#################################
Supporting Information
for 
Identifying diverse metal oxide nanomaterials with lethal effects on embryonic zebrafish using Machine Learning
R. L. Marchese Robinson,1 H. Sarimveis,2 P. Doganis,2 X. Jia,1 M. Kotzabasaki,2 C. Gousiadou,2 S.L. Harper,3,4,5 T. A. Wilkins1,*

1. School of Chemical and Process Engineering, University of Leeds, Leeds, LS2 9JT, United Kingdom
2. School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechniou str. Zografou Campus, 15780 Athens, Greece
3. School of Chemical, Biological, and Environmental Engineering, Oregon State University, Corvallis, Oregon, USA
4. Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
5. Oregon Nanoscience and Microtechnologies Institute, Eugene, Oregon, USA
*Corresponding author: T.A.Wilkins@leeds.ac.uk
#################################
**************************************
Copyright (c) 2010-2015 ONAMI & Oregon State University 
Copyright (c) 2019-2020 University of Leeds

This file was derived from experimental data records exported from the NBI Knowledgebase [http://nbi.oregonstate.edu/], developed at Oregon State University, using code written at the University of Leeds.

These data are distributed under the terms of the Creative Commons Attribution License Version 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
**************************************

NBIrawConcDepData_plus.pvalues_supp_norm_doseBasedFilt.csv: an early processed version of the NBI Knowledgebase dataset. It contains the concentration (dose) response data after the automated normalisation and filtering steps, but before LOEL (lowest observed effect level) assignment. It also contains the p-values, testing for statistically significant differences from the zero dose control group, which were used to assign those LOEL values.

***********************
Caveats regarding LOEL assignments
***********************

As explained under "Determination of Statistically Significant LOEL Values" in the manuscript, the algorithm used to detect the LOEL values had both pros and cons. These carry over to the "Toxicity_Status_..." binary classification variables, which were subsequently assigned to indicate whether a LOEL for the relevant endpoint was (value = 1) or was not (value = 0) detected from the dose response data.

In one case (NBI Material Identifier = 214), a LOEL was not detected for excess lethality at 120 hpf, but this is likely a "false negative". No embryos survived at 24 hpf for the highest dose, meaning that a statistically significant response, in terms of excess lethality at 120 hpf, could not be detected at that dose in principle, even though statistically significant responses were observed at lower doses.

However, this single case is not expected to have significantly affected the findings reported herein. Switching this label from 0 to 1 and repeating some of the calculations resulted in small or no changes in the overall performance statistics obtained using the Random Forest multi-descriptor model (mean balanced accuracy = 0.70, median balanced accuracy = 0.67, mean MCC = 0.43, median MCC = 0.33, mean AUC = 0.74, median AUC = 0.62). The Pauling metal atom electronegativity descriptor also remained much more highly ranked than all other descriptors according to the default Random Forest and Cforest variable importance measures.
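
For readers unfamiliar with the statistics quoted above, the following is a minimal Python sketch of the standard definitions of balanced accuracy and the Matthews correlation coefficient (MCC) for binary labels, and of how switching a single label from 0 to 1 changes them. It is not the code used for the manuscript. The function names and the toy labels and predictions are hypothetical, chosen only for illustration, and the values it produces are unrelated to the statistics reported in this file.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 1 = LOEL detected, 0 = LOEL not detected.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0]

# Same data with one suspected "false negative" label switched from 0 to 1.
y_true_switched = list(y_true)
y_true_switched[4] = 1

before = confusion_counts(y_true, y_pred)
after = confusion_counts(y_true_switched, y_pred)

print("before:", balanced_accuracy(*before), mcc(*before))
print("after: ", balanced_accuracy(*after), mcc(*after))
```

With only six toy samples, a single switched label changes the statistics substantially. The effect of one label is diluted as the number of samples grows.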



