Training and testing multiple models
Jupyter Notebook Example
The notebook example provided in the GitHub repository demonstrates how to use cellmaps_vnn with five distinct training datasets. Each trained model is used to generate predictions, and the resulting system importance scores are aggregated and visualized on the hierarchy.
Script Example
Use case: You want to train and test models for p drugs with k configurations each, that is, one model per drug and configuration pair.
Example: You have two drugs and want to train and test models with five different configurations for each. Create one directory per drug and place that drug's configuration files in it. Here, the data directory contains the hierarchy used to build the VNN model, along with two directories, drug1 and drug2, that hold the configuration files.
Inputs:
data/
├── hierarchy.cx2
├── data_for_drug1/
├── data_for_drug2/
├── drug1/
│   ├── config_file1.yaml
│   ├── config_file2.yaml
│   ├── config_file3.yaml
│   ├── config_file4.yaml
│   └── config_file5.yaml
└── drug2/
    ├── config_file1.yaml
    ├── config_file2.yaml
    ├── config_file3.yaml
    ├── config_file4.yaml
    └── config_file5.yaml
The Bash script below automates training and testing for every drug and configuration pair by submitting Slurm jobs, with each test job waiting for its training job to finish successfully. It organizes outputs in a new directory structure with one subdirectory per drug.
#!/bin/bash

DATA_DIR="data"          # hierarchy.cx2 should be placed in this directory
RESULTS_DIR="results"
DRUGS=("drug1" "drug2")  # List of drugs and directory names inside DATA_DIR
# Config file names, expected inside each drug directory.
# Only two are listed here; add the remaining files (e.g. config_file3.yaml) as needed.
CONFIG_FILES=("config_file1.yaml" "config_file2.yaml")

mkdir -p "$RESULTS_DIR"

# Loop through drugs
for drug in "${DRUGS[@]}"; do
    echo "Processing $drug..."
    mkdir -p "$RESULTS_DIR/$drug"

    # Loop through configuration files
    for i in "${!CONFIG_FILES[@]}"; do
        config_file="${CONFIG_FILES[$i]}"
        config_index=$((i + 1))

        # Define output directories
        train_outdir="$RESULTS_DIR/$drug/${drug}_${config_index}_train"
        test_outdir="$RESULTS_DIR/$drug/${drug}_${config_index}_test"

        # Training command
        echo "Training $drug with $config_file..."
        cellmaps_vnncmd.py train "$train_outdir" --inputdir "$DATA_DIR" --config_file "$DATA_DIR/$drug/$config_file" --slurm --use_gpu
        train_job_name="${drug}_${config_index}_train"
        train_job_id=$(sbatch --parsable --job-name="$train_job_name" "$train_outdir/cellmapvnntrainjob.sh")

        # Testing command (dependent on training): runs only if the training job exits successfully
        echo "Testing $drug with $config_file..."
        cellmaps_vnncmd.py predict "$test_outdir" --inputdir "$train_outdir" --config_file "$DATA_DIR/$drug/$config_file" --slurm --use_gpu
        test_job_name="${drug}_${config_index}_test"
        sbatch --dependency="afterok:$train_job_id" --job-name="$test_job_name" "$test_outdir/cellmapvnnpredictjob.sh"

        echo "Submitted jobs for $drug configuration $config_index."
    done
done

echo "All training and testing processes initiated!"
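The script above hard-codes the configuration file names in CONFIG_FILES. As an alternative, the list could be discovered from each drug directory. The following is a minimal sketch, assuming every .yaml file directly inside a drug directory is a configuration file to run; the helper name list_configs is hypothetical and not part of cellmaps_vnn.

```shell
#!/bin/bash
# Hypothetical helper (not part of cellmaps_vnn): print the base names of all
# .yaml files found directly inside the given drug directory, one per line.
list_configs() {
    local drug_dir="$1"
    local path
    for path in "$drug_dir"/*.yaml; do
        # If nothing matches, the pattern is left unexpanded, so skip non-files.
        [ -f "$path" ] || continue
        basename "$path"
    done
}

# Usage sketch: iterate over whatever configuration files each drug has.
DATA_DIR="data"
for drug in drug1 drug2; do
    list_configs "$DATA_DIR/$drug" | while IFS= read -r config_file; do
        echo "Would train and test $drug with $config_file"
    done
done
```

With this approach, adding a sixth configuration file for a drug would not require editing the script, at the cost of running every .yaml file in the directory whether or not it was intended as a configuration.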
Outputs:
results/
├── drug1/
│   ├── drug1_1_train/
│   ├── drug1_1_test/
│   ├── drug1_2_train/
│   ├── drug1_2_test/
│   └── ...
└── drug2/
    ├── drug2_1_train/
    ├── drug2_1_test/
    ├── drug2_2_train/
    ├── drug2_2_test/
    └── ...
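Once the script has run, it can be useful to confirm that every expected output directory exists. The sketch below relies only on the naming pattern used in the script (results/<drug>/<drug>_<index>_train and _test); the helper name missing_outputs is hypothetical. It checks that the directories exist and nothing more: it does not tell you whether the Slurm jobs have finished or succeeded.

```shell
#!/bin/bash
# Hypothetical helper: print each expected output directory that is missing.
# Arguments: results directory, drug name, number of configurations.
missing_outputs() {
    local results_dir="$1"
    local drug="$2"
    local n_configs="$3"
    local i=1
    local stage dir
    while [ "$i" -le "$n_configs" ]; do
        for stage in train test; do
            dir="$results_dir/$drug/${drug}_${i}_${stage}"
            [ -d "$dir" ] || echo "$dir"
        done
        i=$((i + 1))
    done
}

# Usage sketch: two drugs with two configurations each, as in the script.
for drug in drug1 drug2; do
    missing_outputs "results" "$drug" 2
done
```

An empty result means all expected directories are present; any printed path points at a drug and configuration pair whose output directory was never created.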