NeST VNN
1. Park, S., Silva, E., Singhal, A. et al. A deep learning model of tumor cell architecture elucidates response and resistance to CDK4/6 inhibitors. Nat Cancer (2024). https://doi.org/10.1038/s43018-024-00740-1
Cell feature files
Genetic alteration data: a panel of 718 clinical genes was assembled from the union of genes assessed by FoundationOne CDx, Tempus xT, the PALOMA-3 trial or Project GENIE, each of which assesses mutations and/or copy number aberrations. To compile genotypes for all cell lines, non-synonymous coding mutations and copy number alterations were extracted for the 718 clinical panel genes from the Cancer Cell Line Encyclopedia. [1]
gene2ind.txt: A tab-delimited file in which the 1st column is the gene index and the 2nd column is the gene name.
0 ABCB1
1 ABCC3
2 ABL1
cell2ind.txt: A tab-delimited file in which the 1st column is the cell index and the 2nd column is the cell name (genotype).
0 201T_LUNG
1 22RV1_PROSTATE
2 2313287_STOMACH
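The two index files share the same layout, so one reader can serve both. The following is a minimal sketch, assuming each line holds exactly an integer index and a name separated by a tab; the function name `load_index` is hypothetical and not part of the NeST VNN code.

```python
import csv
import io


def load_index(handle):
    """Read a two-column, tab-delimited index file into a name -> index dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        index, name = row[0], row[1]
        mapping[name] = int(index)
    return mapping


# Example using the first gene2ind.txt rows shown above (tab-separated).
sample = io.StringIO("0\tABCB1\n1\tABCC3\n2\tABL1\n")
gene2ind = load_index(sample)
print(gene2ind["ABL1"])
```

For a file on disk, pass an open handle instead, for example `load_index(open("cell2ind.txt", newline=""))`.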
cell2mutation.txt: A comma-delimited file in which each row holds 718 binary values indicating whether each gene is mutated (1) or not (0). Column order must match the gene indices in gene2ind.txt, and row order must match the cell indices in cell2ind.txt.
0,0,1,0,0,0..
0,0,0,0,1,0..
0,0,0,0,0,0..
cell2cndeletion.txt: A comma-delimited file in which each row holds 718 binary values indicating whether each gene has a copy number deletion (1) or not (0).
0,0,0,0,0,0..
0,1,0,0,0,0..
0,0,0,0,1,0..
cell2amplification.txt: A comma-delimited file in which each row holds 718 binary values indicating whether each gene has a copy number amplification (1) or not (0).
0,0,0,0,0,0..
0,0,0,1,0,0..
0,1,0,0,0,0..
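The three genotype matrices can be checked against the expected gene count before use. The sketch below is an illustration only: the helper names `load_binary_matrix` and `stack_features` are hypothetical, and combining the three files into one three-value feature vector per gene is an assumption about how a model might consume them, not a statement of what NeST VNN does. The toy example uses 6 genes instead of 718.

```python
import io


def load_binary_matrix(handle, n_genes):
    """Parse comma-delimited 0/1 rows, checking row width and value range."""
    rows = []
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        values = [int(v) for v in line.split(",")]
        if len(values) != n_genes:
            raise ValueError(f"line {line_no}: expected {n_genes} values, got {len(values)}")
        if any(v not in (0, 1) for v in values):
            raise ValueError(f"line {line_no}: values must be 0 or 1")
        rows.append(values)
    return rows


def stack_features(mutation, deletion, amplification):
    """Combine three cell x gene matrices into cell x gene x 3 lists.

    Assumption: each gene is described by [mutated, deleted, amplified].
    """
    if not (len(mutation) == len(deletion) == len(amplification)):
        raise ValueError("the three matrices must have the same number of cells")
    return [
        [list(triple) for triple in zip(m_row, d_row, a_row)]
        for m_row, d_row, a_row in zip(mutation, deletion, amplification)
    ]


# Toy data: 2 cells x 6 genes (the real files have 718 columns).
mutation = load_binary_matrix(io.StringIO("0,0,1,0,0,0\n0,0,0,0,1,0\n"), n_genes=6)
deletion = load_binary_matrix(io.StringIO("0,0,0,0,0,0\n0,1,0,0,0,0\n"), n_genes=6)
amplification = load_binary_matrix(io.StringIO("0,0,0,0,0,0\n0,0,0,1,0,0\n"), n_genes=6)

features = stack_features(mutation, deletion, amplification)
print(features[0][2])  # feature vector of gene index 2 in cell index 0
```

Failing early on a wrong row width catches files whose columns have drifted out of step with gene2ind.txt.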
Training
Drug response data were obtained by harmonizing the Cancer Therapeutics Response Portal (CTRP) and Genomics of Drug Sensitivity in Cancer (GDSC) databases. [1]
The data from the two datasets were harmonized as follows. Drug information: each molecule’s published name, synonym or SMILES (Simplified Molecular Input Line Entry System) string was queried using PubChemPy. The associated InChIKey was extracted and used to identify duplicate drugs (within or between datasets). Cell viability data: for CTRP, the vehicle control-normalized average percent viability files were used. [1]
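The duplicate-detection step can be sketched as grouping drug records by their InChIKey. This is a minimal illustration, not the authors' pipeline: `group_by_inchikey` is a hypothetical helper, the lookup is passed in as a callable so the example runs offline, and the keys in the toy table are placeholders rather than real InChIKeys. In practice the callable could wrap a PubChemPy query such as `pubchempy.get_compounds(name, "name")` and read the `inchikey` attribute of the first hit.

```python
from collections import defaultdict


def group_by_inchikey(records, lookup_inchikey):
    """Group (dataset, drug_name) records by InChIKey.

    lookup_inchikey: callable mapping a drug name to an InChIKey, or None
    when the name cannot be resolved.
    Returns (groups, unresolved).
    """
    groups = defaultdict(list)
    unresolved = []
    for dataset, name in records:
        key = lookup_inchikey(name)
        if key is None:
            unresolved.append((dataset, name))
        else:
            groups[key].append((dataset, name))
    return dict(groups), unresolved


# Placeholder lookup table; the keys are NOT real InChIKeys.
toy_table = {
    "drug-a": "PLACEHOLDER-KEY-A",
    "drug-a synonym": "PLACEHOLDER-KEY-A",
    "drug-b": "PLACEHOLDER-KEY-B",
}
records = [
    ("CTRP", "drug-a"),
    ("GDSC", "drug-a synonym"),
    ("GDSC", "drug-b"),
    ("CTRP", "unknown-drug"),
]

groups, unresolved = group_by_inchikey(records, toy_table.get)
# Any key with more than one record is a duplicate within or between datasets.
duplicates = {key: recs for key, recs in groups.items() if len(recs) > 1}
print(duplicates)
print(unresolved)
```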
training_data.txt: A tab-delimited file containing all data points used to train the model. The 1st column is the cell identifier (genotype), the 2nd column is the SMILES string of the drug, the 3rd column is the observed drug response as a floating-point number, and the 4th column is the source from which the data point was obtained.
HS633T_SOFT_TISSUE CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C 0.6695136077442607 GDSC2
KINGS1_CENTRAL_NERVOUS_SYSTEM CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C 0.6444092636032414 GDSC1
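A response file with this four-column layout can be read into typed records as follows. This is a sketch under the assumption that every line has exactly four tab-separated fields; `ResponseRecord` and `read_response_file` are hypothetical names introduced for illustration.

```python
import csv
import io
from typing import NamedTuple


class ResponseRecord(NamedTuple):
    cell: str        # cell identifier (genotype)
    smiles: str      # SMILES string of the drug
    response: float  # observed drug response
    source: str      # dataset the data point came from


def read_response_file(handle):
    """Parse a four-column, tab-delimited drug response file."""
    records = []
    # QUOTE_NONE keeps SMILES strings intact whatever characters they contain.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        cell, smiles, response, source = row
        records.append(ResponseRecord(cell, smiles, float(response), source))
    return records


# Example built from the first training row shown above (tab-separated).
smiles = "CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C"
sample = io.StringIO(f"HS633T_SOFT_TISSUE\t{smiles}\t0.6695136077442607\tGDSC2\n")
records = read_response_file(sample)
print(records[0].cell, records[0].response, records[0].source)
```

A line with a missing or extra field raises a `ValueError` at the unpacking step, which surfaces malformed rows early.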
hierarchy.cx2: The hierarchy, in HCX format, used to construct the visible neural network.
Prediction
test_data.txt: A tab-delimited file containing all data points for which drug response is to be estimated. The 1st column is the cell identifier (genotype), the 2nd column is the SMILES string of the drug, the 3rd column is the observed drug response as a floating-point number, and the 4th column is the source from which the data point was obtained.
EW24_BONE CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C 0.98852067122827 GDSC1
OCILY7_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE CC1=C(C(=O)N(C2=NC(=NC=C12)NC3=NC=C(C=C3)N4CCNCC4)C5CCCC5)C(=O)C 0.2728634745574858 GDSC1
model_final.pt: The trained model.
Annotation
rlipp.out: A file of interpretation scores for the predictions made by the VNN model. The Disease column is optional.
Term     P_rho        P_pval       C_rho        C_pval        RLIPP        Disease
NEST     9.99800e-01  0.00000e+00  9.33000e-01  4.10702e-147  1.07150e+00  Leukemia
NEST:6   7.71750e-01  7.47000e-64  7.58600e-01  1.36101e-61   1.01750e+00  Leukemia
NEST:58  6.44850e-01  1.44552e-38  6.62900e-01  1.62600e-40   9.73000e-01  Leukemia
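To rank subsystems by their interpretation score, the file can be parsed and sorted on the RLIPP column. The sketch below assumes a header line followed by whitespace-separated columns in the order shown above, with Disease as an optional last column; `read_rlipp` and `top_terms` are hypothetical helper names.

```python
import io


def read_rlipp(handle):
    """Parse rlipp.out into a list of dicts, one per hierarchy term."""
    header = None
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split into at most 7 fields so a multi-word disease name stays whole.
        fields = line.split(None, 6)
        if header is None:
            header = fields  # first non-blank line names the columns
            continue
        row = {"Term": fields[0]}
        for name, value in zip(header[1:6], fields[1:6]):
            row[name] = float(value)
        row["Disease"] = fields[6] if len(fields) > 6 else None
        rows.append(row)
    return rows


def top_terms(rows, n=10):
    """Return the n terms with the highest RLIPP score."""
    return sorted(rows, key=lambda r: r["RLIPP"], reverse=True)[:n]


sample = io.StringIO(
    "Term P_rho P_pval C_rho C_pval RLIPP Disease\n"
    "NEST:58 6.44850e-01 1.44552e-38 6.62900e-01 1.62600e-40 9.73000e-01 Leukemia\n"
    "NEST 9.99800e-01 0.00000e+00 9.33000e-01 4.10702e-147 1.07150e+00 Leukemia\n"
    "NEST:6 7.71750e-01 7.47000e-64 7.58600e-01 1.36101e-61 1.01750e+00 Leukemia\n"
)
rows = read_rlipp(sample)
for row in top_terms(rows, n=2):
    print(row["Term"], row["RLIPP"])
```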
hierarchy.cx2: The hierarchy, in HCX format, to be annotated with the interpretation results, which help determine the importance of each subsystem in the hierarchical network.