The underlying shRNA log-fold changes, the final gene dependency scores, and all other data referenced in the paper can be downloaded from this other dataset on the Achilles portal.
Cell line characterization data used (mutations, expression, copy-number) can be downloaded from the CCLE portal.
Metadata about the cell lines screened and their fingerprints. The columns in the SampleInfo sheet are:
Name - Cell line name
Primary Disease, Subtype, Primary Site, Primary/Metastasis - All describe the tumor origin of the cell line
Culture Medium, Conditions - Describe the growth conditions and media used for culturing during the screen
shRNA library - Which shRNA library (55k/98k) was used for each cell line screen
Passage number, Days in culture - Both indicate the time of each screen, in passages and days
Observed infection rate - Rate of lentiviral infection for the screen
Observed cell representation - Number of infected cells per replicate
Doubling time (hrs) - Doubling time of each cell line in hours
Mean 75th percentile - 75th percentile of shRNA depletion scores, averaged over replicates of a cell line. Used for quality control.
RNASeq mutation rate - Mutation rate from RNASeq data
SNP6_CNfile, RNASeq_EXPfile - Yes/no, whether each data type exists for each line
Published - Previous reference for each cell line screen
DEMETER batch - Indicates which DEMETER batch the cell line was a member of
The FingerPrints sheet contains reference SNP genotypes for the indicated SNPs, identified by rsid (Reference SNP cluster ID). Genotypes come from Affymetrix SNP6.0 array birdseed calls (for cell lines run on SNP6.0 arrays) or from calls on pre-screen reference samples using either the Sequenom or the Fluidigm assay. Calls are 0, 1, or 2, representing homozygous allele A, heterozygous allele AB, and homozygous allele B, respectively, for each SNP.
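As an illustration of how the 0/1/2 calls can be used, the sketch below decodes calls into genotype labels and computes the fraction of matching calls between a reference fingerprint and a screened sample. The function names, the rsids, and the call values are hypothetical, and the source does not state which matching criterion was actually used for identity checks.

```python
# Genotype calls as described for the FingerPrints sheet:
# 0 = homozygous A, 1 = heterozygous AB, 2 = homozygous B.
CALL_LABELS = {0: "AA", 1: "AB", 2: "BB"}


def decode_calls(calls):
    """Map numeric calls to genotype labels; unknown values become None."""
    return {rsid: CALL_LABELS.get(call) for rsid, call in calls.items()}


def concordance(reference, sample):
    """Fraction of shared SNPs with valid calls in both samples that agree."""
    shared = [
        rsid for rsid in reference
        if rsid in sample
        and reference[rsid] in CALL_LABELS
        and sample[rsid] in CALL_LABELS
    ]
    if not shared:
        raise ValueError("no SNPs with valid calls in common")
    matches = sum(reference[rsid] == sample[rsid] for rsid in shared)
    return matches / len(shared)


# Hypothetical rsids and calls, for illustration only.
reference = {"rs0001": 0, "rs0002": 1, "rs0003": 2, "rs0004": 1}
sample = {"rs0001": 0, "rs0002": 1, "rs0003": 2, "rs0004": 2}

print(decode_calls(reference))
print(concordance(reference, sample))  # 3 of 4 calls agree -> 0.75
```

A low concordance between a screened sample and its reference would suggest a mislabeled or contaminated line; any cutoff for that decision is left to the reader.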
shRNA performance metrics computed by DEMETER. Columns are:
shRNAID - Sequence (ID) of each shRNA construct
Gene.symbol - Target gene symbol
Seed1.seq, Seed2.seq - Sequences of seed1 and seed2
Gene.sol.Rsquared - R² of the gene solution's contribution to explaining the variance of this shRNA
Seed1.sol.Rsquared, Seed2.sol.Rsquared - R² of the individual seeds' solution contribution to explaining the variance of this shRNA
Seeds.sol.Rsquared - R² of both seeds' contribution to explaining the variance of this shRNA
Other.gene.sol.Rsquared - R² of the other genes targeted by this shRNA, other than the gene in the Gene.symbol column
Total.Rsquared - R² of the full model explaining the variance of this shRNA
Alpha.1 - The coefficient on the first seed effect for this shRNA
Alpha.2 - The coefficient on the second seed effect for this shRNA
Beta - The coefficient on the gene effect for the shRNA, representing the relative strength of this shRNA
Other.gene.symbol - A list of other genes which are also targeted by this shRNA
Pairs of genes with significantly correlated dependency profiles. Pearson correlation coefficients were computed for all pairs of the 6,305 analyzed genes, and pairs with z-scored correlation coefficients greater than 3 are listed. Columns are:
Gene dependency - Gene dependency profile being correlated
Correlated gene dependency - Gene dependency profile that correlates with the interrogated gene dependency profile
Correlation (r) - Pearson correlation coefficient
z_score - Z-score of the Pearson correlation
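The filtering described for this table can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes z-scores are taken against the distribution of all pairwise correlation coefficients using the sample standard deviation, which the description does not specify. The gene names and profiles are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(profiles, z_cutoff=3.0):
    """Return (gene, correlated gene, r, z) for pairs whose z-score exceeds the cutoff.

    Assumption: z-scores are computed over all pairwise r values at once.
    """
    pairs = [
        (g1, g2, pearson(profiles[g1], profiles[g2]))
        for g1, g2 in combinations(sorted(profiles), 2)
    ]
    rs = [r for _, _, r in pairs]
    mean = sum(rs) / len(rs)
    sd = math.sqrt(sum((r - mean) ** 2 for r in rs) / (len(rs) - 1))
    scored = [(g1, g2, r, (r - mean) / sd) for g1, g2, r in pairs]
    return [row for row in scored if row[3] > z_cutoff]


# Hypothetical dependency profiles (one value per cell line).
profiles = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [1, -1, 1, -1, 1],
    "D": [5, 1, 4, 2, 3],
}

# Four genes give only six pairs, too few for any z-score to reach 3,
# so the demonstration uses a lower cutoff.
for g1, g2, r, z in correlated_pairs(profiles, z_cutoff=1.2):
    print(g1, g2, round(r, 3), round(z, 3))
```

With 6,305 genes there are roughly 19.9 million pairs, so a real run would use a vectorized correlation matrix rather than this pairwise loop.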
Information about each gene for which a DEMETER solution was produced. Columns are:
Gene dependency - The gene whose dependency is being described
is.analyzed - Indicates whether each gene is part of the 6,305 genes analyzed further
is.six.sigma - Indicates whether each gene is a 6σ dependency
six.sigma.count - The number of cell lines for which each gene is a 6σ dependency
targetMin - The minimum gene dependency score across all cell lines
h98.lines.per.gene, h55.lines.per.gene - The number of cell lines each gene was screened in, using the 98k library and the 55k library, respectively
shrna.98.per.gene, shrna.55.per.gene - The number of shRNAs designed to target each gene in the 98k library and the 55k library, respectively
is.druggable - Indicates whether each gene was annotated as druggable by either DGIdb or the IUPHAR/BPS Guide to Pharmacology
Unbiased - Indicates whether each gene had a significant model based on all features
Related features - Indicates whether each gene had a significant predictive model based on related features
Mutation driven - Indicates whether each gene is classified as a mutation driven dependency
Expression driven - Indicates whether each gene is classified as an expression driven dependency
CYCLOPS - Indicates whether each gene is classified as a CYCLOPS dependency
Paralog dependency - Indicates whether each gene is classified as a paralog dependency
Best.MDP.Rsquared - The weighted R² of the significant model with the highest R²
Best.MDP.FDR - The FDR of the significant model with the highest R²
Best.MDP.class - The MDP class of the significant model with the highest R²
Results from mutation enrichment analysis. Columns are:
Gene - HUGO symbol of the gene for which DEMETER dependency values and RNA-seq missense mutations are used to create a 2-by-2 contingency table
DEMETER Threshold - Most negative DEMETER value that results in a Fisher's exact test p-value < 0.001 when the dependency dimension of the contingency table is discretized using the threshold value
Odds Ratio - Odds of a DEMETER value below the threshold in cell lines with a missense mutation, compared to those without
Empirical p value - Based on observations of more negative thresholds given a global null distribution of 10 million permutations of mutation labels
Q value - Benjamini & Hochberg multiple hypothesis correction of the p-values
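The columns above rest on three standard building blocks: a 2-by-2 contingency table, Fisher's exact test, and Benjamini & Hochberg correction. The sketch below shows them for a single fixed threshold. It is an illustration only: it assumes a one-sided test for enrichment of dependent lines among mutants (the sidedness is not stated in the description), it omits the threshold scan and the permutation-based empirical p-value, and all names and values are hypothetical.

```python
from math import comb


def contingency(scores, mutated, threshold):
    """Counts (a, b, c, d): mutant below / mutant not below /
    wild-type below / wild-type not below the threshold."""
    a = b = c = d = 0
    for line, score in scores.items():
        below = score < threshold
        if line in mutated:
            a, b = (a + 1, b) if below else (a, b + 1)
        else:
            c, d = (c + 1, d) if below else (c, d + 1)
    return a, b, c, d


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(X >= a) under the hypergeometric null."""
    n_mut, n_below, n = a + b, a + c, a + b + c + d
    upper = min(n_mut, n_below)
    tail = sum(
        comb(n_mut, k) * comb(n - n_mut, n_below - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_below)


def odds_ratio(a, b, c, d):
    """Odds of being below threshold in mutants relative to wild-type lines."""
    return (a * d) / (b * c) if b * c else float("inf")


def benjamini_hochberg(pvalues):
    """Benjamini & Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical DEMETER scores for ten cell lines and a set of mutant lines.
scores = {"L1": -3.1, "L2": -2.6, "L3": -2.2, "L4": -0.4, "L5": -2.5,
          "L6": 0.1, "L7": -0.3, "L8": 0.4, "L9": -0.9, "L10": 0.2}
mutated = {"L1", "L2", "L3", "L4"}

table = contingency(scores, mutated, threshold=-2.0)
print(table)                   # (3, 1, 1, 5)
print(fisher_greater(*table))  # 25/210, about 0.119
print(odds_ratio(*table))      # 15.0
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Exact integer arithmetic via `math.comb` keeps the hypergeometric tail free of rounding error until the final division.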
For each MDP class, the list of genes that met the criteria. Columns are:
MDP feature group - Set of features used as input to obtain the predictive model
Gene dependency - Gene whose dependency is being predicted
Rsquared - The weighted R² of the model for this gene
FDR - The FDR for the model
Pvalue - The p-value of the model
is six sigma - "TRUE" if this dependency is a 6σ dependency
Table of known associations, used to verify that known relationships are present in this dataset. Columns are:
Marker feature type - Either copy number, expression, or mutation
Gene dependency/marker - The gene tested for itself as a highly correlated marker for dependency
Direction - Direction of correlation
Rank - Rank of correlation
Number of cell lines with mutation - Number of cell lines with a mutation in the marker gene