Systematic Approaches for the Encoding of Chemical Groups: A Case Study
The Supporting Information contains the following material:
Data set with ARN groups downloaded from https://echa.europa.eu/assessment-regulatory-needs in Feb 2023 (S1_2023_02_03_assessment-of-regulatory-needs--arn─export.xlsx)
Curated data set with ARN groups with molecular structures and their quality scores, that was used for building the models (S2_ARN_groups.xlsx)
Descriptive statistics for the 86 substance groups. For each group, we provide the number of substances as in the ARN group, the number of substances matched in DSSTox, the DSSTox substance type and the number of substances with structural information and its quality (S3_ARN_stats.xlsx)
Document with additional figures and explanations referred to in the manuscript (S4_SystematicGroupingSI.docx)
Predicted groups, probabilities and domain assessment for all nonconfidential substances registered under REACH (S5_rf_application_1_results_redacted.xlsx)
Cross-validation scoring results obtained in every iteration of outer and inner grid search for the random forest (RF) model (S6_outer_inner_grid_details_rf.xlsx)
Cross-validation scoring results obtained in every iteration of outer and inner grid search for the nearest neighbor (kNN) model (S7_outer_inner_grid_details_kn.xlsx)
Cross-validation scoring results obtained for the gradient boosting (GB) model. This data set is only provided for completeness because the GB model was evaluated but not used further. Due to the computational cost, we only performed the inner grid search using the optimal fingerprint parameters identified by the outer grid search with kNN and RF (radius 2, length 2,560) (S8_outer_inner_grid_details_gb.xlsx)
All files are provided as a ZIP archive.
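The outer and inner grid searches mentioned above can be pictured as two nested loops: an outer loop over fingerprint parameters and an inner loop over model hyperparameters. The following is a minimal sketch of that structure in Python, not the authors' implementation. The function name `two_level_grid_search`, the grid values, the hyperparameter `k` and the `toy_score` function are all hypothetical placeholders; in practice the score would come from cross-validation on the curated data set.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def two_level_grid_search(
    outer_grid: Dict[str, list],
    inner_grid: Dict[str, list],
    score: Callable[[dict, dict], float],
) -> Tuple[dict, dict, float, List[dict]]:
    """Score every (outer, inner) parameter combination.

    outer_grid: fingerprint settings, e.g. radius and length.
    inner_grid: model hyperparameters.
    score: callback returning one score per combination (higher is better).
    Returns the best outer settings, best inner settings, best score,
    and one record per evaluated combination.
    """
    records: List[dict] = []
    best = None
    for outer_values in product(*outer_grid.values()):
        outer = dict(zip(outer_grid.keys(), outer_values))
        for inner_values in product(*inner_grid.values()):
            inner = dict(zip(inner_grid.keys(), inner_values))
            s = score(outer, inner)
            records.append({**outer, **inner, "score": s})
            # Strict comparison keeps the first combination on ties.
            if best is None or s > best[2]:
                best = (outer, inner, s)
    if best is None:
        raise ValueError("grids must not be empty")
    return best[0], best[1], best[2], records


def toy_score(outer: dict, inner: dict) -> float:
    """Hypothetical stand-in for a cross-validation score (not real results)."""
    return -(
        abs(outer["radius"] - 2)
        + abs(outer["length"] - 2560) / 1024
        + abs(inner["k"] - 3)
    )


# Illustrative grids only; the values actually searched are listed in S6 to S8.
outer_grid = {"radius": [1, 2, 3], "length": [1024, 2560, 4096]}
inner_grid = {"k": [1, 3, 5]}

best_outer, best_inner, best_score, records = two_level_grid_search(
    outer_grid, inner_grid, toy_score
)
```

Restricting `outer_grid` to a single setting, for example radius 2 and length 2,560, reduces this to an inner-only search of the kind described for the GB model.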
Complete Metadata
| bureauCode | [ "020:00" ] |
|---|---|
| identifier | https://doi.org/10.23719/1530990 |
| programCode | [ "020:000" ] |
| references | null |
| rights | null |