Systematic Approaches for the Encoding of Chemical Groups: A Case Study

Published by U.S. EPA Office of Research and Development (ORD) | U.S. Environmental Protection Agency | Metadata Last Checked: August 02, 2025 | Last Modified: 2024-03-01
The Supporting Information (ZIP) contains the following material:
Data set with ARN groups downloaded from https://echa.europa.eu/assessment-regulatory-needs in Feb 2023 (S1_2023_02_03_assessment-of-regulatory-needs--arn─export.xlsx).
Curated data set with ARN groups, their molecular structures and quality scores, which was used for building the models (S2_ARN_groups.xlsx).
Descriptive statistics for the 86 substance groups (S3_ARN_stats.xlsx). For each group, we provide the number of substances in the ARN group, the number of substances matched in DSSTox, the DSSTox substance type, and the number of substances with structural information and its quality.
Document with additional figures and explanations referred to in the manuscript (S4_SystematicGroupingSI.docx).
Predicted groups, probabilities and domain assessment for all nonconfidential substances registered under REACH (S5_rf_application_1_results_redacted.xlsx).
Cross-validation scoring results obtained in every iteration of the outer and inner grid search for the random forest (RF) model (S6_outer_inner_grid_details_rf.xlsx).
Cross-validation scoring results obtained in every iteration of the outer and inner grid search for the nearest neighbor (kNN) model (S7_outer_inner_grid_details_kn.xlsx).
Cross-validation scoring results obtained for the gradient boosting (GB) model (S8_outer_inner_grid_details_gb.xlsx). This data set is provided only for completeness because the GB model was evaluated but not used further. Due to the computational cost, we performed only the inner grid search, using the optimal fingerprint parameters identified by the outer grid search with kNN and RF (radius 2, length 2,560).
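
The outer and inner grid search mentioned for the RF and kNN results can be pictured as two nested loops: an outer loop over fingerprint settings and, for each setting, an inner cross-validated search over model hyperparameters. The following is a minimal sketch of that structure only, not the authors' code. It assumes scikit-learn, and the grid values, the scoring metric, and the `make_features` helper (random bit vectors standing in for real fingerprints) are all hypothetical. The fingerprint radius is left out because the stand-in features have no notion of it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical grids; the source does not list the values that were searched.
OUTER_GRID = [32, 64]  # stand-in for the fingerprint length
INNER_GRID = {"n_estimators": [10, 20], "max_depth": [3, None]}


def make_features(n_samples, length, seed=0):
    """Hypothetical stand-in for fingerprint generation: random bit vectors."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, length))
    # Toy label: 1 when at least two of the first three bits are set.
    y = (X[:, :3].sum(axis=1) >= 2).astype(int)
    return X, y


def nested_grid_search(n_samples=120):
    """Return one row per (outer setting, inner hyperparameter combination)."""
    rows = []
    for length in OUTER_GRID:  # outer loop: feature (fingerprint) settings
        X, y = make_features(n_samples, length)
        # Inner loop: cross-validated search over model hyperparameters.
        search = GridSearchCV(
            RandomForestClassifier(random_state=0),
            INNER_GRID,
            cv=3,
            scoring="balanced_accuracy",  # assumed metric
        )
        search.fit(X, y)
        results = search.cv_results_
        for params, score in zip(results["params"], results["mean_test_score"]):
            rows.append({"length": length, **params, "mean_cv_score": float(score)})
    return rows


rows = nested_grid_search()
best = max(rows, key=lambda r: r["mean_cv_score"])
print(best)
```

Each row in `rows` corresponds to one iteration of the combined search, which is roughly the kind of per-iteration record the S6 and S7 spreadsheets are described as holding. Restricting `OUTER_GRID` to a single setting gives the inner-only search described for the GB model.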
