I. Welcome

1. Foreword

This online help provides essential information on using the OECD QSAR Toolbox for Grouping Chemicals into Categories. The main objective of the Toolbox is to allow the user to apply (Q)SAR methodologies to group chemicals into categories and to fill data gaps by read-across, trend analysis and (Q)SARs. For in-depth background information on the concept of chemical categories, the user is invited to consult the guidance document on the grouping of chemicals published in the Series on Testing and Assessment of the OECD Environment, Health and Safety Publications [OECD (2007); ENV/JM/MONO(2007)28: http://www.oecd.org/officialdocuments/displaydocument/?doclanguage=en&cote=env/jm/mono(2007)28].

Additional guidance and training material are available on the dedicated internet site for the QSAR Toolbox [http://www.qsartoolbox.org], the internet site for the OECD (Q)SAR Project [http://www.oecd.org/env/existingchemicals/qsar] as well as the internet site of the developer of the QSAR Toolbox [http://toolbox.oasis-lmc.org/]. The user is invited to regularly consult these internet sites.

The QSAR Toolbox is a project of the Organisation for Economic Co-operation and Development in collaboration with the European Chemicals Agency. It has been developed by the Laboratory of Mathematical Chemistry.

2. Acknowledgements

The development of the QSAR Toolbox is a large collaborative effort, and many scientific teams and stakeholders are donating their skills and tools to be integrated into the Toolbox [see http://www.oecd.org/env/chemicalsafetyandbiosafety/assessmentofchemicals/donorstotheqsartoolbox.htm].

3. What is the QSAR Toolbox

The OECD QSAR Toolbox is a software application designed to reduce the use of animals in laboratory tests, reduce the cost of testing and increase the number of chemicals that are assessed for their effects on human health and the environment. The OECD QSAR Toolbox provides scientific computational methods and information technologies for applying the category approach to fill gaps in the experimental data that are necessary for hazard and risk assessment. By making use of the system, hazard and risk assessors are able to:

  o Use predefined categories, refine existing categories or build new ones.

  o Identify analogous chemicals (or categories) based on user selected characteristics. Categorize chemicals accounting for their metabolism: rate of disappearance, formation of stable metabolites, formation of highly reactive intermediates, deactivation pathways, etc.

  o Extract all available experimental or pre-calculated data from local and remote (web) based databases, accompanied by information about their reliability: experimental error, analytical or computational method used, replicates, etc.

  o Fill the gaps of missing information within the category by making use of chemometric approaches such as read-across, trend analysis, and (Q)SAR models.

QSAR predictions are accompanied by information concerning their mechanistic background, training chemicals, statistics, applicability domain and validity.

The OECD QSAR Toolbox is an expandable application that navigates the information flows between all of the installed components (modules): computational tools, database managers, (Q)SAR libraries, categorization models, etc.

4. Abbreviations

II. User interface

The interface of the Toolbox is designed to follow the typical workflow for predicting endpoint(s) for a given chemical (called the target chemical). The six main stages of the workflow (Input, Profiling, Endpoint, Category definition, Data gap filling and Reporting) are represented on a toolbar (1) at the top of the application window (Figure 1). Below the stages toolbar there is another toolbar – the actions toolbar (2). It provides the most important actions related to the current stage. The stage options panel (3) is on the left side of the main window. It provides specific content for the current stage and actions related to this content. The largest part of the main window is occupied by the data matrix (4). It is available in all stages except Reporting, and shows the queried data, both experimental and predicted, for the chemicals loaded into the system.

Figure 1

1. Stages toolbar

The stages toolbar is a permanent part of the Toolbox interface. It allows easy navigation between the main stages of the program's workflow. Each stage is represented by a toolbar button, which invokes the interface for that stage. Some examples are provided below (Figures 2 to 4).

Figure 2

The toolbar button (1) and the interface (2) for the stage "Input" (Figure 2).

Figure 3

The toolbar button (1) and the interface (2) for the stage "Category definition" (Figure 3).

Figure 4

The toolbar button (1) and the interface (2) for the stage "Data Gap Filling" (Figure 4).

2. Actions toolbar

The actions toolbar provides the basic actions for the stages of the program's workflow. Each stage has its own specific actions, which is why the content of the toolbar varies between stages. For the user's convenience the actions may be divided into groups. Some examples are provided below (Figure 5):

Figure 5

The basic actions of the stage "Input" (Figure 5).

Figure 6

The basic actions of the stage "Endpoint" (Figure 6).

3. Stage options panel

The stage options panel provides specific content for the current stage and actions related to this content. Each stage has its own functions, which is why the content of the panel differs between stages. Some examples are provided below:

Input: In the Input stage, the stage options panel lists the work documents and their content. It also provides two approaches for multiplication of the target structures – multiplication by tautomerism and multiplication by metabolism.

Metabolism can be applied via 3 observed and 9 simulated metabolism simulators:

• Observed:

    • Observed Liver metabolism

    • Observed Mammalian metabolism

    • Observed Microbial metabolism

• Simulated:

    • Autoxidation simulator

    • Dissociation simulator

    • Hydrolysis (Acidic)

    • Hydrolysis (Basic)

    • Hydrolysis (Neutral)

    • Liver metabolism

    • Microbial metabolism simulator

    • Skin metabolism simulator

Multiplication by Metabolism – select Skin metabolism simulator

To multiply the loaded target structure, the user should right click on the SMILES of the target in the stage options panel (1), select Multiplication from the pop-up menu (2), then select Metabolism (3) and finally select a simulator, for example Skin metabolism simulator (4). (Figure 7)

Figure 7

The same procedure can be applied to multiply the structure by one of the three observed metabolism simulators.

Multiplication by Tautomerism

Follow steps 1-3 above, but in step 3 select Tautomerism instead of Metabolism. (Figure 8)

Figure 8

Double click on the tautomeric/metabolic set (1) to open a window displaying pictures of the set’s constituents. (Figure 9)

Figure 9

4. Data matrix

Below is a snapshot displaying the data matrix window (1). (Figure 10)

Figure 10

The data matrix window has three main parts (Figure 11):
    • Area with the Endpoint tree (1)
    • Area with the selected chemicals (2)
    • Area with data (experimental, predicted) (3)

Figure 11

4.1. Endpoint tree area

  4.1.1. Content of the Endpoint tree area:
       • Nodes of the Endpoint tree
       • Construction
       • Set tree hierarchy
       • Filtering nodes of the Endpoint tree
       • Sorting and filtering data
       • Tips

Nodes of the Endpoint tree

The Endpoint tree has five general nodes:
       • Substance Identity
       • Physical Chemical Properties
       • Environmental Fate and Transport
       • Ecotoxicological Information
       • Human Health Hazard

The Substance Identity node includes sub-nodes that display the identity information of the selected chemical(s), such as CAS number, Chemical IDs, Chemical Names and Structural formula (Figure 12):

Figure 12

Experimental data available in the Toolbox databases are assigned to the other four general nodes and their sub-nodes. These four nodes correspond to four basic sections, depending on the type of experimental data assigned.

For example, results for melting point or partition coefficients are assigned to the nodes Melting/Freezing Point or Partition Coefficient, which are sub-nodes of the node Physical Chemical Properties (1). (Figure 13)

Figure 13

Similarly, data associated with the Ames test or chromosomal aberration are assigned to the nodes Bacterial Reverse Mutation Assay (e.g. Ames Test) and In Vitro Mammalian Chromosome Aberration Test, which are sub-nodes of the node Human Health Hazard (2). (Figure 14)

Figure 14

Construction of the Endpoint tree

The Endpoint tree is constructed in two parts: a predefined part and a dynamic part. The predefined part is rigid and cannot be reordered, while the dynamic part is flexible and can be reordered. This design accommodates the diversity of experimental data available from different databases. To check which part is predefined and which is dynamic, press the Ctrl key; the predefined part of the tree is then underlined.

• Predefined part (Figure 15)

Figure 15

• Dynamic part (Figure 16)

Figure 16

The metadata fields associated with the experimental data are used to build the dynamic part of the endpoint tree. In this example, in vitro and in vivo are values of the metadata field called Type of method (1). (Figure 17)

Figure 17

The next node, Bacterial reverse mutation assay (e.g. Ames test), is the Test type (2). (Figure 18)

Figure 18

The subsequent two nodes Gene mutation and Salmonella typhimurium are associated with the field Type of genotoxicity and the field Test organism (species) (3). (Figure 19)

Figure 19

The last two nodes are associated with the following two fields: Metabolic activation and Strain (4). (Figure 20)

Figure 20
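
The way metadata fields drive the dynamic part of the tree can be pictured with a short Python sketch. It is an illustration only, not Toolbox code: the record layout, the helper `build_tree`, the "Undefined" placeholder and the sample values for Metabolic activation and Strain are assumptions; the field names are those shown in Figures 17-20.

```python
from typing import Dict, List

# Field order taken from the example above; it plays the role of the
# sequence of sub-nodes that defines the dynamic part of the tree.
HIERARCHY: List[str] = [
    "Type of method",
    "Test type",
    "Type of genotoxicity",
    "Test organism (species)",
    "Metabolic activation",
    "Strain",
]


def build_tree(records: List[Dict[str, str]], hierarchy: List[str]) -> dict:
    """Nest records into a tree, one level per metadata field."""
    tree: dict = {}
    for record in records:
        node = tree
        for field in hierarchy:
            # Assumption: records lacking a field go under a placeholder node.
            key = record.get(field, "Undefined")
            node = node.setdefault(key, {})
    return tree


# Hypothetical record; the values are for illustration only.
records = [{
    "Type of method": "in vitro",
    "Test type": "Bacterial reverse mutation assay (e.g. Ames test)",
    "Type of genotoxicity": "Gene mutation",
    "Test organism (species)": "Salmonella typhimurium",
    "Metabolic activation": "With S9",
    "Strain": "TA 100",
}]

tree = build_tree(records, HIERARCHY)
reordered = build_tree(records, list(reversed(HIERARCHY)))
```

Passing the same records with a different field order yields a differently nested tree, which is the effect the Set tree hierarchy functionality has on the dynamic part.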

Each of these fields can be reordered using the Set tree hierarchy functionality. To access it, right click over the node where the hierarchy should be reordered (1) and then click Set tree hierarchy (2) in the context menu. A small blue triangle appears at the level of the node for which a hierarchy is set (3). (Figure 21)

Figure 21

The Set tree hierarchy window appears.

Set tree hierarchy functionality

The Set tree hierarchy window contains two panels: Metadata labels (1) and Sub-nodes (2). (Figure 22)

Figure 22

The Toolbox comes with a default hierarchy. The Metadata labels panel contains a list of the most commonly used fields. To set another field as a sub-node, the user should check the Show all labels box (1); a list of all labels available in the different databases then appears (2). (Figure 23)

Figure 23

The sequence of fields (1) displayed in the Sub-nodes panel specifies the organization of the nodes of the endpoint tree (2). (Figure 24)

Figure 24

The user can add fields as sub-nodes, or remove fields already specified as sub-nodes, using the auxiliary buttons (1). (Figure 25)

Figure 25

The user can reorder the sub-nodes using the Up and Down buttons (2). (Figure 26)

Figure 26

If the user wants to reset the default setting of the endpoint tree then he/she can click the Default button (3). (Figure 27)

Figure 27
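
The add, remove, Up, Down and Default operations amount to editing an ordered list of fields. The Python sketch below illustrates this idea under stated assumptions: the class `HierarchyEditor`, its method names and the default order used here are hypothetical and do not reflect the internal implementation or the actual default hierarchy of the Toolbox.

```python
from typing import List

# Assumed default order for illustration; not the actual Toolbox default.
DEFAULT_SUBNODES: List[str] = ["Type of method", "Test type", "Strain"]


class HierarchyEditor:
    """Holds the ordered list of fields used as sub-nodes."""

    def __init__(self, default: List[str]) -> None:
        self._default = list(default)
        self.subnodes = list(default)

    def add(self, label: str) -> None:
        # A field can appear only once in the hierarchy.
        if label not in self.subnodes:
            self.subnodes.append(label)

    def remove(self, label: str) -> None:
        if label in self.subnodes:
            self.subnodes.remove(label)

    def move(self, label: str, step: int) -> None:
        """Shift a field up (step=-1) or down (step=+1) if there is room."""
        old = self.subnodes.index(label)
        new = old + step
        if 0 <= new < len(self.subnodes):
            self.subnodes[old], self.subnodes[new] = (
                self.subnodes[new], self.subnodes[old])

    def reset(self) -> None:
        # Corresponds to the Default button.
        self.subnodes = list(self._default)


editor = HierarchyEditor(DEFAULT_SUBNODES)
editor.move("Strain", -1)             # like pressing Up once
editor.add("Metabolic activation")    # appended as the last sub-node
```

Calling `editor.reset()` afterwards restores the order the editor was created with.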

The changes in the endpoint hierarchy are confirmed by pressing the OK button.

Filtering nodes of the Endpoint tree

The nodes of the endpoint tree can be filtered using the Filter endpoint tree… functionality. To filter the endpoint tree, the user should type the desired query in the blank field named Filter endpoint tree… (1). The field then changes from white to green (2), indicating that the endpoint tree is filtered (3). For instance, after typing “skin” only the nodes related to skin remain visible. (Figures 28-29)

Figure 28

Figure 29
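
The filtering behaviour described above can be pictured as a simple text match on node paths. The following Python sketch is an illustration only, not Toolbox code: the function name, the `#` path separator, the sample node names and the case-insensitive substring matching are all assumptions.

```python
from typing import List


def filter_endpoint_paths(paths: List[str], query: str) -> List[str]:
    """Keep paths that contain the query; an empty query keeps everything."""
    needle = query.strip().lower()
    if not needle:
        return list(paths)
    return [path for path in paths if needle in path.lower()]


# Hypothetical node paths, written with '#' as an assumed separator.
paths = [
    "Human Health Hazard#Skin Irritation/Corrosion",
    "Human Health Hazard#Skin Sensitisation",
    "Human Health Hazard#Genetic Toxicity",
    "Physical Chemical Properties#Melting/Freezing Point",
]

skin_only = filter_endpoint_paths(paths, "skin")
unfiltered = filter_endpoint_paths(paths, "")
```

In this sketch an empty query returns the full list, mirroring the return to the default tree when the query is deleted.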

When the user deletes the query, the system restores the default settings of the endpoint tree.

Sorting and filtering data assigned to a defined node

The experimental data displayed in a given row of the data matrix can be sorted. The user should right click (1) over the node whose data are to be sorted and then select one of the following options:

        • Sort (targets priority) – the chemicals are sorted by experimental data in descending or ascending order, taking into account the priority of the target chemical (2). This means that the target stays in the first column and the other chemicals are placed after it in descending or ascending order. (Figure 30)

Figure 30

        • Sort – the chemicals are sorted in descending or ascending order without taking into account the priority of the target chemical (3). (Figure 31)

Figure 31

        • Function – displays the minimal, maximal or average value if more than one experimental value is available for a chemical. This functionality works for data on a given row (4). (Figure 32)

Figure 32
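
The difference between the two sort options, and the effect of Function, can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function names `sort_columns` and `aggregate`, the dictionary layout and the sample values are hypothetical and are not taken from the Toolbox.

```python
from statistics import mean
from typing import Dict, List, Optional


def sort_columns(values: Dict[str, float], descending: bool = False,
                 target: Optional[str] = None) -> List[str]:
    """Return chemical names ordered by value; a given target stays first."""
    others = [name for name in values if name != target]
    others.sort(key=lambda name: values[name], reverse=descending)
    if target is not None and target in values:
        return [target] + others
    return others


def aggregate(measurements: List[float], function: str) -> float:
    """Collapse several values for one chemical into a single number."""
    functions = {"min": min, "max": max, "average": mean}
    return functions[function](measurements)


# Hypothetical row of the data matrix (one value per chemical).
row = {"target": 5.0, "analogue A": 2.0, "analogue B": 9.0}

with_priority = sort_columns(row, descending=True, target="target")
plain = sort_columns(row, descending=True)
```

With target priority the target keeps the first position whatever its value; without it the target is ordered like any other chemical.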

Tips related to the Endpoint tree area

Some additional features are available by applying a right mouse click over the area of the endpoint tree:

        • Hidden nodes

        Hidden nodes of the endpoint tree can be displayed. The 2D and 3D parameters are hidden nodes and are listed in two separate nodes. To display the list of parameters, the user should right click over the endpoint tree and select Show hidden (1). (Figure 33)

Figure 33

Then a list with nodes with 2D and 3D parameters appears (1). Hidden nodes are in blue font. (Figure 34)

Figure 34

The parameters are listed under the sub-nodes 2D and 3D. To calculate a parameter, the user should right click over the desired parameter (1) and select one of the available options: Calculate/Extract for all chemicals or Calculate all parameters (2). Both options apply to all chemicals loaded in the data matrix. (Figure 35)

Figure 35

To calculate a parameter for one specific chemical, the user should hover over the cell of that chemical in the row of the desired parameter (1), open the pop-up menu with a right mouse click (2) and select one of the options (3): Calculate ….. or Calculate/Extract all 2D parameters. (Figure 36)

Figure 36

       • Supporting functionalities

          o Collapse all – collapses all expanded nodes of the endpoint tree (1). (Figure 37)

Figure 37

          o Export – exports data from a row of the data matrix (1). (Figure 38)

Figure 38

          o Export CAS list – exports a list of the CAS numbers of the chemicals loaded in the data matrix (1). (Figure 39)

Figure 39

          o Wiki search species – searches for test organisms in Wikipedia (1). (Figure 40)

Figure 40

          o Copy path – copies the endpoint path (1). (Figure 41)

Figure 41

4.2. Area with selected chemicals (2)

The chemicals which are loaded in the system appear in the data matrix ordered in separate columns. There are identification labels for tautomers and mixtures. The identification label for tautomers is “T” (1) (Figure 42), while the mixtures are labeled with “Mix” (2). (Figure 43)

Tautomers label “T”

Figure 42

Mixture label “Mix”

Figure 43

A tautomeric set is indicated by the tautomeric label “T” (1) and the number of tautomers which belong to the tautomeric set (2). (Figure 44)

Figure 44

All tautomers from a given tautomeric set can be displayed by double clicking over the field with the molecular structure (1). A window with all available tautomers opens (2). (Figure 45)


Figure 45

Assigned label “A”

The label “A” (1) on a chemical structure means that the experimental data assigned to the target chemical were extracted from chemicals that have the same structure as the target and for which only one CAS number is assigned in the Toolbox databases, regardless of whether the structure is a component of a mixture, a metabolite or a tautomer.

For example, for the chemical with the label “A” shown in the picture below, a component of the mixture has ecotox data (1). (Figure 46)

Figure 46

MCAS label

The label “MCAS” is assigned to chemicals (1) for which the experimental data come from chemicals that have the same structure but more than one CAS number. (Figure 47)

Figure 47

The user can see the source chemicals of the assigned data by right clicking over the chemical with the MCAS label (1) and selecting Expand by CAS (2) from the context menu that appears. (Figure 48)

Figure 48
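
One way to read the distinction between the “A” and “MCAS” labels is as a rule on the number of distinct CAS numbers that share the structure of the chemical. The Python sketch below encodes that reading only; the function `assigned_label`, the placeholder identifiers "CAS-1" and "CAS-2" and the handling of the empty case are assumptions, and the actual rule applied by the Toolbox may involve further conditions.

```python
from typing import Iterable, Optional


def assigned_label(source_cas_numbers: Iterable[str]) -> Optional[str]:
    """Pick a label from the CAS numbers of same-structure source chemicals."""
    distinct = set(source_cas_numbers)
    if not distinct:
        return None      # assumption: no label when no data are assigned
    if len(distinct) == 1:
        return "A"       # a single CAS number shares the structure
    return "MCAS"        # several CAS numbers share the structure


# Placeholder identifiers, not real CAS numbers.
single = assigned_label(["CAS-1"])
multiple = assigned_label(["CAS-1", "CAS-2"])
```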

A filter option allows specific chemical(s) with experimental data to be ignored. To ignore a chemical, the user should right click somewhere in the area of that chemical, enclosed in red (1). (Figure 49)

Figure 49

After this operation the filtered chemical is shown in black (1) and its experimental data are excluded from the subsequent workflow. (Figure 50)

Figure 50

4.3. Area with data (3)

The following labels are used for data displayed in the data matrix:
   1) “M” – measured data, extracted from databases (1)
   2) “R” – prediction result obtained by read-across analysis (2)
   3) “T” – prediction result obtained by trend analysis (3)
   4) “Q” – prediction result obtained by QSAR prediction (4)
   5) “CI” – prediction result obtained from component based Independent MOA
   6) “CS” – prediction result obtained from component based Similar MOA
   7) “CM” – prediction result obtained from component based Specific MOA
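
When data exported from the matrix are processed outside the Toolbox, the labels can be handled as a small lookup table. The Python sketch below is one possible way to do this: the `DataLabel` enumeration and the two helper functions are hypothetical names, and only the label codes and their meanings come from the list above.

```python
from enum import Enum


class DataLabel(Enum):
    """Label codes shown next to values in the data matrix."""
    M = "measured data, extracted from databases"
    R = "prediction from read-across analysis"
    T = "prediction from trend analysis"
    Q = "prediction from a QSAR model"
    CI = "prediction from component based Independent MOA"
    CS = "prediction from component based Similar MOA"
    CM = "prediction from component based Specific MOA"


def parse_label(code: str) -> DataLabel:
    """Look up a label by its code, ignoring case and surrounding spaces."""
    return DataLabel[code.strip().upper()]


def is_prediction(label: DataLabel) -> bool:
    """Every label except measured data denotes a prediction."""
    return label is not DataLabel.M


label = parse_label(" ci ")
```

An unknown code raises `KeyError`, which makes unexpected labels visible instead of silently ignoring them.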