Defining a Cancer Dependency Map

Supplemental data

The underlying shRNA log-fold changes, final gene dependency scores, and all other data referenced in the paper can be downloaded from this other dataset on the Achilles portal.

Cell line characterization data used (mutations, expression, copy-number) can be downloaded from the CCLE portal.

Manuscript Supplementary Tables

Table S1: Sample Info

Metadata about the cell lines screened and their fingerprints. The columns in the SampleInfo sheet are:

Name - cell line name
Primary Disease, Subtype, Primary Site, Primary/Metastasis - all describe the tumor origin of the cell line.
Culture Medium, Conditions - describe the growth conditions and media used for culturing during the screen.
shRNA library - which shRNA library (55k/98k) was used for each cell line screen.
Passage number, Days in culture - the timing of each screen, in passages and in days, respectively
Observed infection rate - rate of lentiviral infection for the screen
Observed cell representation - number of infected cells per replicate
Doubling time (hrs) - Doubling time of each cell line in hours
Mean 75th percentile - 75th percentile of shRNA depletion scores, averaged over replicates of a cell line. Used for quality control.
RNASeq mutation rate - Mutation rate from RNASeq data
SNP6_CNfile, RNASeq_EXPfile - yes/no indicating whether each data type exists for each line
Published - Previous reference for each cell line screen
DEMETER batch - Indicates which DEMETER batch the cell line was a member of.
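The "Mean 75th percentile" QC column can be illustrated with a minimal sketch. It assumes each replicate is a plain list of per-shRNA depletion scores and that the percentile uses linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`); the exact percentile convention used in the screen pipeline is not stated here, and the function name is hypothetical.

```python
import statistics


def mean_75th_percentile(replicates: list[list[float]]) -> float:
    """Average, over replicates, of the 75th percentile of depletion scores.

    Assumption: linear-interpolation percentiles (method="inclusive");
    the original pipeline's convention may differ.
    """
    per_replicate = []
    for scores in replicates:
        # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
        q3 = statistics.quantiles(scores, n=4, method="inclusive")[2]
        per_replicate.append(q3)
    return statistics.mean(per_replicate)


# Hypothetical scores for one cell line screened in two replicates.
print(mean_75th_percentile([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]))  # 6.0
```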

The FingerPrints sheet contains reference SNP genotypes for the indicated SNPs, identified by rsID (Reference SNP cluster ID). Genotypes come either from Affymetrix SNP6.0 array birdseed calls (for cell lines run on SNP6.0 arrays) or from calls on pre-screen reference samples made with Sequenom or Fluidigm assays. Calls are 0, 1, or 2, representing homozygous AA, heterozygous AB, and homozygous BB, respectively, for each SNP.
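One plausible way to use these 0/1/2 calls is to decode them and compare a screened sample against its reference fingerprint. The sketch below is only an illustration: the concordance measure (fraction of SNPs called in both samples with identical genotypes), the function names, and the placeholder rsIDs are assumptions, not the identity-check procedure used in the paper.

```python
# Genotype codes as described for the FingerPrints sheet.
GENOTYPE_CODES = {0: "AA", 1: "AB", 2: "BB"}


def decode(calls: dict[str, int]) -> dict[str, str]:
    """Translate numeric calls keyed by rsID into genotype labels."""
    return {rsid: GENOTYPE_CODES[code] for rsid, code in calls.items()}


def concordance(ref: dict[str, int], query: dict[str, int]) -> float:
    """Hypothetical identity check: fraction of shared SNPs with matching calls.

    SNPs missing from either sample are treated as uncalled and skipped.
    """
    shared = [rsid for rsid in ref if rsid in query]
    if not shared:
        raise ValueError("no SNPs called in both samples")
    matches = sum(ref[rsid] == query[rsid] for rsid in shared)
    return matches / len(shared)


# Placeholder rsIDs, not real SNPs.
ref = {"rs_1": 0, "rs_2": 1, "rs_3": 2, "rs_4": 1}
query = {"rs_1": 0, "rs_2": 1, "rs_3": 2, "rs_4": 2, "rs_5": 0}
print(concordance(ref, query))  # 0.75
```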

Table S2: shRNA Performance

shRNA performance metrics computed by DEMETER. Columns are:

shRNAID - Sequence (ID) of each shRNA construct
Gene.symbol - Target gene symbol 
Seed1.seq, Seed2.seq - Sequences of seed1 and seed2
Gene.sol.Rsquared - R2 of the gene solution's contribution to explaining the variance of this shRNA
Seed1.sol.Rsquared, Seed2.sol.Rsquared - R2 of each individual seed's solution contribution to explaining the variance of this shRNA
Seeds.sol.Rsquared - R2 of both seeds' contribution to explaining the variance of this shRNA
Other.gene.sol.Rsquared - R2 of the contribution of genes targeted by this shRNA other than the gene in the Gene.symbol column
Total.Rsquared - R2 of the full model explaining the variance of this shRNA
Alpha.1 - The coefficient on the first seed effect for this shRNA
Alpha.2 - The coefficient on the second seed effect for this shRNA
Beta - The coefficient on the gene effect for this shRNA, representing the relative strength of this shRNA
Other.gene.symbol - A list of other genes which are also targeted by this shRNA
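The Alpha and Beta columns suggest that each shRNA's depletion profile is modeled as a weighted sum of a gene effect and two seed effects. The sketch below assumes the simplified form Beta × gene effect + Alpha.1 × seed1 effect + Alpha.2 × seed2 effect, omits the other-gene terms, and scores a fitted profile with the ordinary R2 = 1 - SS_res / SS_tot. How DEMETER actually attributes R2 to each component is not specified here, so the function names and the per-component scoring are assumptions.

```python
def fitted_profile(beta, gene_effect, alpha1, seed1_effect, alpha2, seed2_effect):
    """Hypothetical fitted depletion profile of one shRNA across cell lines.

    Simplified model: beta * gene + alpha1 * seed1 + alpha2 * seed2
    (other-gene terms omitted).
    """
    return [beta * g + alpha1 * s1 + alpha2 * s2
            for g, s1, s2 in zip(gene_effect, seed1_effect, seed2_effect)]


def r_squared(observed, fitted):
    """Ordinary coefficient of determination; observed must not be constant."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_res / ss_tot


# Hypothetical effects across four cell lines.
gene = [-1.0, 0.0, -2.0, 0.5]
seed1 = [0.0, -1.0, 0.0, 0.0]
seed2 = [0.0, 0.0, 0.0, -0.5]
observed = fitted_profile(1.0, gene, 0.5, seed1, 1.0, seed2)
gene_only = fitted_profile(1.0, gene, 0.0, seed1, 0.0, seed2)
print(round(r_squared(observed, gene_only), 3))  # 0.771
```

Here the "observed" profile is generated from the full model, so the gene term alone explains part, but not all, of its variance.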

Table S3: Dependency Correlations

Pairs of genes with significantly correlated dependency profiles are listed. Pearson correlation coefficients were computed for all pairs among the 6,305 analyzed genes, and pairs with z-scored correlation coefficients greater than 3 are listed. Columns are:

Gene dependency - Gene dependency profile being correlated
Correlated gene dependency - Gene dependency profile that correlates with the interrogated gene dependency profile
Correlation (r) - Pearson correlation coefficient
z_score - Z-score of the Pearson correlation
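The filtering behind this table can be sketched as follows, assuming each correlation is z-scored against the mean and population standard deviation of all pairwise coefficients (the exact normalization is not stated here). The function names and toy profiles are hypothetical; for 6,305 genes there are roughly 19.9 million pairs, so a real implementation would use vectorized matrix operations rather than this pure-Python loop.

```python
import itertools
import math
import statistics


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(profiles, z_cutoff=3.0):
    """Return (gene, other gene, r, z) for pairs whose z-scored r exceeds z_cutoff.

    Assumption: z-scores are taken over the distribution of all pairwise r.
    """
    pairs = list(itertools.combinations(sorted(profiles), 2))
    r = [pearson(profiles[a], profiles[b]) for a, b in pairs]
    mu, sd = statistics.fmean(r), statistics.pstdev(r)
    if sd == 0:
        return []
    hits = []
    for (a, b), ri in zip(pairs, r):
        z = (ri - mu) / sd
        if z > z_cutoff:
            hits.append((a, b, ri, z))
    return hits


# Toy dependency profiles over three cell lines (hypothetical genes A, B, C).
toy = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1]}
print(correlated_pairs(toy, z_cutoff=1.0))
```

With only three pairs the z-scores cannot reach 3, which is why the toy call lowers the cutoff.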

Table S4: Per Gene Summary

Information about each gene for which DEMETER solutions were produced. Columns are:

Gene dependency - The gene whose dependency is being described 
is.analyzed - Indicates whether each gene is part of the 6,305 genes analyzed further
is.six.sigma - Indicates whether each gene is a 6σ dependency
six.sigma.count - The number of cell lines in which each gene is a 6σ dependency
targetMin - The minimum gene dependency score across all cell lines
h98.lines.per.gene, h55.lines.per.gene - The number of cell lines in which each gene was screened using the 98k and 55k libraries, respectively
shrna.98.per.gene, shrna.55.per.gene - The number of shRNAs designed to target each gene in the 98k and 55k libraries, respectively
is.druggable - Indicates whether each gene was annotated as druggable by either DGIdb or IUPHAR/BPS Guide to Pharmacology
Unbiased - Indicates whether each gene had a significant predictive model based on all features
Related features - Indicates whether each gene had a significant predictive model based on related features
Mutation driven - Indicates whether each gene is classified as a mutation driven dependency
Expression driven - Indicates whether each gene is classified as an expression driven dependency
CYCLOPS - Indicates whether each gene is classified as a CYCLOPS dependency
Paralog dependency - Indicates whether each gene is classified as a paralog dependency
Best.MDP.Rsquared - The weighted R2 of the significant model with the highest R2 
Best.MDP.FDR - The FDR of the significant model with the highest R2
Best.MDP.class - The MDP class of the significant model with the highest R2
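The targetMin, six.sigma.count, and is.six.sigma columns can be illustrated with a short sketch. The precise definition of a 6σ dependency is given in the paper, not in this table; the sketch assumes that a cell line counts when its score lies more than six standard deviations below the mean of that gene's scores across cell lines. The function name and the dictionary keys, which simply mirror the column names, are hypothetical.

```python
import statistics


def six_sigma_summary(scores: list[float], k: float = 6.0) -> dict:
    """Per-gene summary under an assumed k-sigma rule (k = 6 by default).

    A cell line is counted when its score is below mean - k * sd, where
    mean and sd are taken over this gene's scores (an assumption).
    """
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    count = sum(1 for s in scores if s < mu - k * sd)
    return {
        "targetMin": min(scores),
        "six.sigma.count": count,
        "is.six.sigma": count > 0,
    }


# Hypothetical gene: 99 unaffected lines and one strongly dependent line.
print(six_sigma_summary([0.0] * 99 + [-1.0]))
```

With n cell lines a single score can sit at most about sqrt(n - 1) standard deviations from the per-gene mean, so under this assumption a 6σ call requires a reasonably large panel.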

Table S5: Mutation Enrichment Analysis

Results from mutation enrichment analysis. Columns are:

Gene - HUGO symbol of the gene whose DEMETER dependency values and RNA-seq missense mutations are used to create a 2-by-2 contingency table
DEMETER Threshold - Most negative DEMETER value that yields a Fisher's exact test p-value < 0.001 when the dependency dimension of the contingency table is discretized at that threshold
Odds Ratio - Odds of a DEMETER value below the threshold in cell lines with a missense mutation, compared with cell lines without one
Empirical p value - Based on how often more negative thresholds are observed in a global null distribution built from 10 million permutations of the mutation labels
Q value - Benjamini-Hochberg multiple hypothesis correction of the p values
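The threshold search and the Q-value correction described above can be sketched in pure Python. Several details are assumptions rather than the paper's exact procedure: the Fisher test is two-sided, a line counts as dependent when its score is at or below the candidate threshold, the odds ratio is the sample ratio (a·d)/(b·c), and the 10-million-permutation empirical p-value step is omitted. All function names are hypothetical.

```python
import math


def _table_prob(a, row1, col1, n):
    """Hypergeometric probability of cell a given the table margins."""
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = _table_prob(x, row1, col1, n)
        # Tables no more likely than the observed one count as at least as extreme.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


def enrichment_threshold(scores, mutated, alpha=1e-3):
    """Most negative threshold with Fisher p < alpha, as (threshold, odds, p)."""
    for t in sorted(set(scores)):
        # Rows: mutant / wild type; columns: dependent (score <= t) / not.
        a = b = c = d = 0
        for s, is_mut in zip(scores, mutated):
            dependent = s <= t
            if is_mut and dependent:
                a += 1
            elif is_mut:
                b += 1
            elif dependent:
                c += 1
            else:
                d += 1
        p = fisher_exact_two_sided(a, b, c, d)
        if p < alpha:
            if b * c:
                odds = (a * d) / (b * c)
            else:
                odds = math.inf if a * d else math.nan
            return t, odds, p
    return None


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (Q values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical data: 10 mutant lines that are all more dependent than 30 wild-type lines.
scores = [-3.0, -2.9, -2.8, -2.7, -2.6, -2.5, -2.4, -2.3, -2.2, -2.1]
scores += [0.1 * i for i in range(30)]
mutated = [True] * 10 + [False] * 30
print(enrichment_threshold(scores, mutated))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Scanning thresholds from the most negative value upward means the first significant threshold found is the most negative one, matching the column definition.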

Table S6: MDP Classes

For each MDP class, the list of genes that met the criteria. Columns are:

MDP feature group - Set of features used as input to obtain predictive model
Gene dependency - Gene whose dependency is being predicted 
Rsquared - The weighted R2 of the model for this gene
FDR - The FDR for the model
Pvalue - The p-value of the model
is six sigma - "TRUE" if this dependency is a 6σ dependency

Table S7: Curated Associations

Table of known associations, used to verify that known relationships are present in this dataset. Columns are:

Marker feature type - Copy number, expression, or mutation
Gene dependency/marker - The gene tested as a highly correlated marker for its own dependency
Direction - Direction of the correlation
Rank - Rank of the correlation
Number of cell lines with mutation - number of cell lines with a mutation in the marker gene

Software used in analysis

DEMETER v2.20.2
DEMETER deconvolves gene effects from off-target seed effects in RNAi screens. This is the version of DEMETER used in this paper.
ATLANTIS v0.5
ATLANTIS is an R package for building random-forest-like models via the "party" package for biomarker/dependency discovery. This is the version of ATLANTIS used in this paper.