Category definition: QSAR Toolbox Helpdesk

A. Basic information

This module provides the user with several means of grouping chemicals into a toxicologically meaningful category that includes the target molecule. The grouping methods allow chemicals to be grouped into chemical categories according to different measures of “similarity”, so that data gaps within a category can be filled by read-across or trend analysis. This is the critical step in the workflow, and several options are available in the Toolbox to assist the user in refining the category definition via subcategorization.

For example, within a large inventory the chemicals can be grouped according to their aquatic toxicity mode of action. Alternatively, starting from a target chemical for which a specific mechanism of action is identified, analogues can be found that bind by the same mechanism and for which experimental results are available. If no specific mechanisms or modes of action relevant for the investigated endpoint are identified for a target chemical, it is recommended to search for chemicals that are structurally similar to the target chemical. The search results can then be refined by eliminating those chemicals that have specific mechanisms or modes of action.

B. Grouping methods

The list of grouping methods coincides with the list of profiling methods (Table 1):

Table 1. List with profiling and grouping methods

Summary background information for some grouping methods is listed in Table 2.

Table 2. Summary information for some grouping methods:

When searching for analogues of a target chemical, the outcome of the profiling determines the most appropriate approach. The following recommendations can be made:

• If specific mechanisms or modes of action relevant for the investigated endpoint are identified for a target chemical, it is recommended to search for chemicals that have the same mechanisms or modes of action. The search results can then be refined by eliminating those chemicals that are structurally most dissimilar.

• If no specific mechanisms or modes of action relevant for the investigated endpoint are identified for a target chemical, it is recommended to search for chemicals that are structurally similar to the target chemical. The search results can then be refined by eliminating those chemicals that have specific mechanisms or modes of action.
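The two recommendations above can be sketched as a small decision routine. This is only an illustration and not how the Toolbox is implemented: the `Chemical` class, the `similarity` score, the `min_similarity` threshold and the rule that an analogue must carry all of the target's mechanisms are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Chemical:
    name: str
    # Endpoint-relevant mechanisms / modes of action (hypothetical labels).
    mechanisms: set = field(default_factory=set)
    # Assumed structural similarity to the target, between 0 and 1.
    similarity: float = 0.0


def find_analogues(target, candidates, min_similarity=0.5):
    """Pick analogues following the two recommendations (illustrative only)."""
    if target.mechanisms:
        # Mechanism found: search by mechanism first,
        # then drop the structurally most dissimilar hits.
        hits = [c for c in candidates if target.mechanisms <= c.mechanisms]
        return [c for c in hits if c.similarity >= min_similarity]
    # No mechanism found: search by structural similarity first,
    # then drop hits that do have a specific mechanism.
    hits = [c for c in candidates if c.similarity >= min_similarity]
    return [c for c in hits if not c.mechanisms]
```

In practice the refinement step is performed interactively through subcategorization rather than with a fixed similarity cut-off.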

It should be kept in mind that the search for analogues is performed only among the chemicals listed in the databases selected under Databases or the inventories selected under Inventories. For example, if only the databases “Skin sensitisation ECETOC” and “Skin sensitisation” are selected, the analogue search will only be performed among those chemicals for which experimental data on skin sensitisation are available in those databases. Similarly, the user can decide to expand the search to chemical inventories. For example, by selecting the databases “Carcinogenicity &Mutagenicity ISSCAN” and “Genotoxicity OASIS” as well as the inventory “Canada DSL”, the Toolbox will query for analogues in those two databases as well as in this specific inventory.

The inventories contain between 5 000 and 100 000 substances. To accelerate the identification of similar analogues, only the databases are profiled and indexed in advance (2D and 3D calculations). Including an inventory in the query will therefore result in longer calculation times.

C. Building categories – principles

Recommendations

When selecting databases and/or inventories to apply category definitions, the following recommendations apply:

• If the aim of the user is to find only analogues for which experimental data are available on specific endpoints, then only those databases that contain results on those endpoints should be selected. No inventory should be selected.

• The first step of the categorization procedure is to find structurally similar analogues using non-endpoint-specific grouping methods. For this purpose it is recommended to use structure-based grouping methods, which define a broader group, such as:

o US EPA Categorization

o OECD Categorization

o Organic functional group

o Structural similarity

o ECOSAR

• The second step is to refine the broader group using the subcategorization procedure. In this step mechanistically based and endpoint-specific grouping methods can be used:

o DNA binding mechanism

o Protein binding mechanism

o Genotoxicity/carcinogenicity

o Cramer rules

o Verhaar rule

o Skin/eye irritation corrosion rules

• The final step of subcategorization is to apply the structure-based grouping methods from the first step of categorization.

D. Defining categories

1. Procedure for defining categories

Creating a category is straightforward: the user selects a grouper (1) and presses the Define button (2) (Figure 1).

Figure 1. Category definition

When the grouping method is executed, it provides the user with all the categories in the selected grouping method, and the categories of the target chemical (1), if any, populate the Target(s) profiles (2) list box (Figure 2).

Figure 2. Target(s) profiles panel

At this stage the user has the opportunity to:

• remove target categories by selecting one of them (1) and moving it to the All profiles (2) panel using the down arrow (Figure 3)

Figure 3

• add categories to the selection of target profiles by selecting one of them (1) from the All profiles panel (2) and moving it to the Target(s) profiles (4) panel by pressing the up arrow (Figure 4)

Figure 4

• Combine profiles logically: this allows the user to choose how to combine the target profiles using the logical AND/OR operators. If AND is chosen, all selected categories have to be present in the searched chemical(s). If OR is chosen, the presence of any one of the selected categories is enough.

By default, AND is selected. (Figure 5)

Figure 5

• Invert results (1): this function searches for chemicals whose profiles differ from the target profiles. In the screenshot below (Figure 6), if the target chemical is classified as Acrylates/Methacrylates (Acute toxicity) and Esters (Acute toxicity) and Invert results is selected, the software will search for chemicals with profiles different from these.

Figure 6

• Strict option (2) (Figure 6): if Strict is checked, a chemical is included only if it has the defined categories and no others.
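The effect of the AND/OR, Invert results and Strict options on a single candidate chemical can be sketched with set operations. This is a minimal illustration, not the Toolbox code: the function name `matches` and its parameters are hypothetical, and the way Strict and Invert results are combined with each other is an assumption.

```python
def matches(chem_profiles, selected, mode="AND", strict=False, invert=False):
    """Return True if a chemical satisfies the category query (illustrative)."""
    chem_profiles, selected = set(chem_profiles), set(selected)
    if mode == "AND":
        # Every selected category must be present in the chemical.
        hit = selected <= chem_profiles
    elif mode == "OR":
        # At least one selected category must be present.
        hit = bool(selected & chem_profiles)
    else:
        raise ValueError("mode must be 'AND' or 'OR'")
    if strict:
        # Strict: the chemical may not carry any category outside the selection.
        hit = hit and chem_profiles <= selected
    # Invert results: look for chemicals that do NOT fit the query (assumed).
    return not hit if invert else hit
```

For instance, with the two categories from Figure 6 selected and AND chosen, a chemical profiled only as an ester does not match, but it does match once OR is chosen.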

After the window for defining the category is closed, the category is built and a Define category name dialog appears (Figure 7). The user can change the name or leave it as it is.

Figure 7

In this example the software identified 24 chemicals from the selected database(s) with the same profiles as those of the target chemical. After the category name is defined, the software automatically starts a gather data action. The user can select specific endpoints (Choose…) or retrieve data for all endpoints (All endpoints), which is the default (Figure 8). If the user has previously selected databases related to the investigated endpoints, both options return the same results.

Figure 8

If the user has selected all databases under the Endpoint section and selects All endpoints, the gather data operation can be very time consuming because of the diversity of endpoints and the size of the databases. The user is therefore advised to always select only those databases and endpoint paths that are related to the investigated endpoints.

After the user confirms which data are to be read from the databases, the Repeating values dialog appears (Figure 9). This window appears when identical measured data are found for chemicals in the Toolbox databases. Data redundancies are identified and the user can select which data to keep and which to filter out. Buttons to select a single data value (1) or all data values (2) are also available.

Figure 9

The only difference between the rows for a given chemical is in the information in some of the metadata fields. In the case study shown in Figure 10, the rows for the first chemical have the same data value, 4.1 × 10⁵ micrograms per liter, and different metadata in the “Age” field (1) (Figure 10).

Figure 10
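The idea behind the Repeating values dialog can be sketched as grouping records that agree on chemical, endpoint, value and unit while differing only in metadata. The record layout, the field names and the sample values below (including the "LC50" endpoint and the age labels) are made up for illustration; the Toolbox's own matching criteria may differ.

```python
from collections import defaultdict


def _key(rec):
    # Assumption: two records "repeat" when chemical, endpoint,
    # value and unit all agree; metadata is ignored.
    return (rec["chemical"], rec["endpoint"], rec["value"], rec["unit"])


def find_repeating(records):
    """Group records by key and return only groups with more than one row."""
    groups = defaultdict(list)
    for rec in records:
        groups[_key(rec)].append(rec)
    return {k: v for k, v in groups.items() if len(v) > 1}


def keep_single(records):
    """Keep the first record of every group (the 'single value' choice)."""
    seen, kept = set(), []
    for rec in records:
        if _key(rec) not in seen:
            seen.add(_key(rec))
            kept.append(rec)
    return kept


# Hypothetical data: same value, different "age" metadata for chem-1.
records = [
    {"chemical": "chem-1", "endpoint": "LC50", "value": 4.1e5, "unit": "ug/L", "age": "juvenile"},
    {"chemical": "chem-1", "endpoint": "LC50", "value": 4.1e5, "unit": "ug/L", "age": "adult"},
    {"chemical": "chem-2", "endpoint": "LC50", "value": 120.0, "unit": "ug/L", "age": "adult"},
]
```

Keeping all rows corresponds to using `records` unchanged, while `keep_single(records)` mirrors selecting a single value per repeated group.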

Finally, after the data are read, the defined category appears in the Defined Categories panel (Figure 11).

Figure 11

2. Categorization of single chemical

Categorization of a single chemical is explained in the section “Procedure for defining categories”.

3. Categorization of set of chemicals

The Toolbox can also categorize a set of chemicals. These sets can be:

• Set of tautomers

• Set of mixtures

• Set of parent and metabolites

Categorization of tautomers set

When the user categorizes a tautomeric set in Set mode (1), all profiles of the tautomeric forms of the given chemical are taken into account (2) (Figure 12).

Figure 12

In the Categorization panel profiles of all tautomers are taken into account (Figure 13):

Figure 13

Categorization of set of mixtures

When the user categorizes a set of mixtures, all profiles of the components of the mixture are taken into account (1) (Figure 14).

Figure 14

In the Categorization panel, all profiles of the components of the mixture(s) are taken into account for the Target(s) profiles (1) (Figure 15).

Figure 15

Categorization of set of parent and metabolites

When the user categorizes a set consisting of a parent chemical and its metabolites, all profiles of the parent and the metabolites are taken into account (1) (Figure 16).

Figure 16

In the Categorization panel profiles of all metabolites along with those of the parent are taken into account (1) (Figure 17)

Figure 17
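For all three kinds of set (tautomers, mixture components, parent with metabolites), "taking all profiles into account" can be read as forming the union of the members' profiles. The sketch below shows that reading only; the function name and the member and profile labels are hypothetical.

```python
def set_profiles(members):
    """Union of the profiles of every member of a set (illustrative).

    `members` maps a member name (tautomer, mixture component,
    parent or metabolite) to the profiles found for it.
    """
    combined = set()
    for profiles in members.values():
        combined |= set(profiles)
    return combined


# Hypothetical parent with two metabolites.
example = {
    "parent": {"Esters"},
    "metabolite 1": {"Alcohols"},
    "metabolite 2": {"Esters", "Carboxylic acids"},
}
```

Here `set_profiles(example)` contains the three distinct profiles, which would then populate the Target(s) profiles list.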

4. Categorization using profiling results of a hierarchical profiling scheme

Profiling results from hierarchical schemes such as Protein and DNA binding give information for Domain, Mechanistic alert and Structural alerts. Profiling results are visualized hierarchically: (Figure 18)

Figure 18

The Toolbox allows a category to be defined using each of these profiling results. If the user applies the category “Domain” (Figure 19) for categorization purposes, the software will search for chemicals that meet the criteria of the “Domain” category (e.g. SN2). SN2 includes the following categories:

Figure 19

The defined category will therefore include chemicals classified under one (or all) of the Mechanistic alerts shown above, depending on the logical operator used in defining the category. The defined category “SN2” will be broader (will include more chemicals) than the category “Nucleophilic substitution at Nitrogen atom” (Mechanistic alert), which includes only three structural alerts (Figure 20):

Figure 20
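Why a Domain-level category is broader than a Mechanistic-alert-level category can be illustrated with a small tree. Only the names “SN2” and “Nucleophilic substitution at Nitrogen atom” come from the text above; the second mechanistic alert, all structural alert names and the chemicals are placeholders, and matching on any alert (OR logic) is an assumption.

```python
# Domain -> Mechanistic alert -> Structural alerts (placeholder content).
HIERARCHY = {
    "SN2": {
        "Nucleophilic substitution at Nitrogen atom": {"alert N1", "alert N2", "alert N3"},
        "Other SN2 mechanistic alert (placeholder)": {"alert X1"},
    },
}


def alerts_under(node, tree=HIERARCHY):
    """Collect all structural alerts below a domain or mechanistic alert."""
    for domain, mechanistic in tree.items():
        if node == domain:
            return set().union(*mechanistic.values())
        if node in mechanistic:
            return set(mechanistic[node])
    return {node}  # treat anything else as a single structural alert


def category(node, chemicals):
    """Chemicals carrying at least one structural alert under `node`."""
    wanted = alerts_under(node)
    return {name for name, alerts in chemicals.items() if set(alerts) & wanted}


chemicals = {"c1": {"alert N1"}, "c2": {"alert X1"}, "c3": {"alert Z"}}
```

With these placeholder data, the “SN2” category contains `c1` and `c2`, whereas the category for the single mechanistic alert contains only `c1`.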

Procedure for categorization

If the user defines a category using a hierarchical grouping method, Domain, Mechanistic alert and Structural alert can be used separately or simultaneously in the categorization procedure (Figure 21).

The user can remove the category related to “Structural alert” by selecting it (1) and moving it down (Figure 21), or leave it as is (the default):

Figure 21

If the software does not identify analogues that meet the criteria of the Domain, Mechanistic alert and Structural alert categories combined by AND, the following message appears (Figure 22):

Figure 22

The user can then expand the category by removing the most specific category (Structural alert) (Figure 23) and using the remaining Domain and Mechanistic alert categories.

Figure 23
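The relaxation described above, dropping the most specific level when the AND query finds nothing, can be sketched as a loop. In the Toolbox the user performs this step manually; the automated loop, the function name and the data layout here are assumptions for illustration.

```python
def search_with_fallback(levels, chemicals):
    """Relax an AND query level by level until analogues are found.

    `levels` is ordered from broadest to most specific, e.g.
    [domain, mechanistic alert, structural alert]; `chemicals` maps a
    chemical name to the set of categories it carries.
    Returns the levels actually used and the matching chemicals.
    """
    levels = list(levels)
    while levels:
        hits = [name for name, cats in chemicals.items() if set(levels) <= set(cats)]
        if hits:
            return levels, hits
        levels.pop()  # drop the most specific remaining level
    return [], []
```

A call with all three levels that finds no analogue falls back to Domain and Mechanistic alert only, which corresponds to the situation in Figure 23.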

E. Subcategorization

The second step, refining the broader category and defining the category of structurally and mechanistically similar analogues, is the subcategorization procedure. It lets the user verify the mechanistic robustness of the analogue approach. If the identification of analogues was performed according to a specific mechanism or mode of action, then the target chemical and the analogues will already have the same relevant mechanisms and modes of action. Nevertheless, the analogues may also have additional mechanisms and modes of action due to additional functional groups in their molecules. The subcategorization procedure is therefore applied to refine the category by eliminating dissimilar structures.

The broader category can be refined when subcategorization is applied. For example, 216 esters are identified by the US-EPA category for the chemical with CAS (1) (Figure 24).

Figure 24

Once the category is defined, the user can check whether all 216 esters have the same mechanism of interaction with DNA; the subcategorization procedure can be used for this check. The Subcategorization panel includes the same grouping/profiling methods as listed in the Profiling section. Metabolism can be accounted for in the subcategorization procedure. After the category is defined, the user has to click the Subcategorize button. In this particular case, with the US-EPA category defined (1), click the Subcategorize button (2) and select one of the DNA binding profilers (3) (Figure 25).

Figure 25

Chemicals/analogues that have a mechanism of interaction different from that of the target chemical are highlighted by a blue background. Those that are not highlighted have at least one category in common with the target.

If the user selects the all categories radio button (1) (Figure 26), then all analogues included in the refined category must have all of the categories, combined by logical conjunction (AND) (2).

Figure 26
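The difference between the default behaviour (an analogue is kept if it shares at least one category with the target) and the all categories option (an analogue must have every target category) can be sketched as follows. The function name, parameter names and sample profiles are hypothetical.

```python
def dissimilar(target_cats, analogues, require_all=False):
    """Names of analogues that would be highlighted as dissimilar.

    `analogues` maps a chemical name to the categories assigned to it
    by the profiler chosen for subcategorization.
    """
    target_cats = set(target_cats)
    flagged = []
    for name, cats in analogues.items():
        cats = set(cats)
        if require_all:
            same = target_cats <= cats        # all target categories present
        else:
            same = bool(target_cats & cats)   # at least one in common
        if not same:
            flagged.append(name)
    return flagged
```

Removing the flagged chemicals from the parent category yields the subcategory.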

After the dissimilar analogues are removed by clicking the Remove button (1), the software defines a new category, which is a subcategory of the parent category (2) (Figure 27).

Figure 27

The subcategory appears as a sub-node of the first category (1) (Figure 28). When the user selects the subcategory, a new data matrix with the chemicals included in it appears (2) (Figure 28).

Figure 28

F. Combine categories

Combining categories defines a new category by logically combining members of already existing categories. The user has to click the Combine button (1); the chemicals from two or more defined categories (2) can then be combined using the logical AND/OR operators (3) (Figure 29).

Figure 29

After the combination logic is selected, the software defines a new category containing the chemicals that meet the combined criteria. (Figure 30)

Figure 30
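Combining categories with AND or OR corresponds to taking the intersection or the union of their member lists. The sketch below assumes each category is simply a collection of chemical identifiers; the function name and the identifiers are hypothetical.

```python
def combine(categories, mode="AND"):
    """Combine the members of two or more categories (illustrative).

    AND keeps chemicals present in every category (intersection);
    OR keeps chemicals present in any category (union).
    """
    sets = [set(c) for c in categories]
    if not sets:
        return set()
    if mode == "AND":
        return set.intersection(*sets)
    if mode == "OR":
        return set.union(*sets)
    raise ValueError("mode must be 'AND' or 'OR'")


esters = {"chem-1", "chem-2", "chem-3"}
sn2_binders = {"chem-2", "chem-3", "chem-4"}
```

With these sample sets, `combine([esters, sn2_binders])` keeps the two shared chemicals, while `mode="OR"` keeps all four.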

G. Clustering of categories

This function distributes a defined category into clusters. A cluster is presented as a sub-category of the general category and includes the chemicals that share a unique combination of profiling results.

How to cluster the category?

Once the category is defined (1), the user has to click the Clustering button (2); a message with the number of generated clusters then appears (3) (Figure 31).

Figure 31

The clusters of the category appear as sub-nodes (1) of the general category (2) (Figure 32).

Figure 32

Clicking on a cluster (1) displays a new matrix with the chemicals of that cluster (2) (Figure 33).

Figure 33
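Clustering by unique combination of profiling results can be sketched by grouping chemicals on the exact set of profiles they carry. The function name and the sample profiles are hypothetical, and which profilers the Toolbox uses to form the combinations is not specified here.

```python
from collections import defaultdict


def cluster(category):
    """Split a category into clusters of identical profile combinations.

    `category` maps a chemical name to its profiling results; chemicals
    end up in the same cluster only if their result sets are equal.
    """
    clusters = defaultdict(list)
    for name, profiles in category.items():
        clusters[frozenset(profiles)].append(name)
    return dict(clusters)


sample = {
    "chem-1": {"Esters", "SN2"},
    "chem-2": {"SN2", "Esters"},
    "chem-3": {"Esters"},
}
```

Here `len(cluster(sample))` gives the number of clusters, which is the figure reported in the message shown after clicking Clustering.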

H. Delete category

1. Delete single category

The Delete button allows the user to delete a selected category. The user has to select the category to be deleted (1) and then click the Delete button (2) (Figure 34).

Figure 34

2. Delete all categories

The user can delete all defined categories simultaneously using the Delete all (1) button. All defined categories (2) are deleted (Figure 35).

Figure 35

After all categories are deleted, only the target chemical remains in the data matrix (Figure 36).

Figure 36
