Frequently Asked Questions
How is the probability of dependency different from the gene effect score?
The “gene effect” file contains the corrected CERES scores, which measure the effect size of knocking out a gene, normalized against the distributions of non-essential and pan-essential genes. The probabilities assess, given a gene's score, how likely that gene is to belong to the non-essential distribution or the common-essential distribution in that cell line. The key difference between a fixed threshold on CERES score and a threshold on the probabilities is that the probabilities take into account screening quality, which varies from line to line.
So which one should I use?
It depends on the question you want to ask. If you are interested in potentially subtle variation in the strength of killing, for example when computing co-dependency correlations, the CERES scores make sense. If you are only interested in the binary question of which lines are killed and which are not, for example when looking for biomarkers that classify lines as sensitive or insensitive, the dependency probabilities may be the better choice.
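As an illustration of the co-dependency case, here is a minimal sketch in Python with pandas. It assumes the gene effect matrix has been loaded with cell lines as rows and genes as columns; the function name, the toy gene names, and the values are hypothetical and only serve to show the computation.

```python
import pandas as pd

def codependency(gene_effect: pd.DataFrame, gene: str) -> pd.Series:
    """Pearson correlation of one gene's effect profile with every other gene.

    Assumes rows are cell lines and columns are genes (an assumption about
    how the matrix was loaded, not a statement about the file layout).
    """
    target = gene_effect[gene]
    others = gene_effect.drop(columns=gene)
    # corrwith aligns on the row index and skips missing values pairwise.
    return others.corrwith(target).sort_values(ascending=False)

# Toy data: GENE_B tracks GENE_A exactly, GENE_C is its mirror image.
toy_effect = pd.DataFrame(
    {
        "GENE_A": [-1.0, -0.5, 0.0, -0.2],
        "GENE_B": [-2.0, -1.0, 0.0, -0.4],
        "GENE_C": [1.0, 0.5, 0.0, 0.2],
    },
    index=["LINE_1", "LINE_2", "LINE_3", "LINE_4"],
)
toy_corr = codependency(toy_effect, "GENE_A")
```

Because the correlation uses the continuous scores, small differences in effect size between lines contribute to the result, which a binary sensitive/insensitive call would discard.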
I’m a computationalist and I want gene scores with no copy number corrections or other fancy processing. How can I get them?
Starting with the matrix logfold_change, you can use guide_gene_map to group rows (guides) by gene and summarize by median, mean, or other function. Then, group the columns (replicates) by cell line using replicate_map and summarize by mean or median again.
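The two grouping steps can be sketched in Python with pandas as follows. This is a rough sketch under stated assumptions: the column names `sgrna`, `gene`, `replicate_ID`, and `DepMap_ID` are guesses at the file headers, so check them against the files you download, and the toy values are invented. The choice of median over guides and mean over replicates is just one of the options mentioned above.

```python
import pandas as pd

def naive_gene_scores(logfold_change, guide_gene_map, replicate_map,
                      guide_col="sgrna", gene_col="gene",
                      replicate_col="replicate_ID", line_col="DepMap_ID"):
    """Collapse a guides x replicates matrix to genes x cell lines.

    All column names are assumptions; pass the real ones as arguments.
    """
    # Attach gene labels to guides. A guide mapped to several genes
    # contributes to each; guides missing from the map are dropped.
    lfc = logfold_change.rename_axis(index=guide_col).reset_index()
    merged = guide_gene_map[[guide_col, gene_col]].merge(lfc, on=guide_col)
    # Step 1: summarize guides per gene (median here).
    gene_by_replicate = merged.drop(columns=guide_col).groupby(gene_col).median()
    # Step 2: summarize replicates per cell line (mean here).
    # Replicates absent from the map get no group and are dropped.
    rep_to_line = replicate_map.set_index(replicate_col)[line_col]
    return gene_by_replicate.T.groupby(rep_to_line).mean().T

# Toy inputs (hypothetical names and values).
toy_lfc = pd.DataFrame(
    {"repA1": [-1.0, -3.0, 0.2, 9.0],
     "repA2": [-2.0, -4.0, 0.4, 9.0],
     "repB1": [0.0, 0.5, -1.0, 9.0]},
    index=["g1", "g2", "g3", "g4"],  # g4 has no entry in the guide map
)
toy_guide_map = pd.DataFrame({"sgrna": ["g1", "g2", "g3"],
                              "gene": ["GENE1", "GENE1", "GENE2"]})
toy_replicate_map = pd.DataFrame({"replicate_ID": ["repA1", "repA2", "repB1"],
                                  "DepMap_ID": ["LINE_A", "LINE_A", "LINE_B"]})
toy_scores = naive_gene_scores(toy_lfc, toy_guide_map, toy_replicate_map)
```

The result has genes as rows and cell lines as columns. Swapping `median()` or `mean()` for another aggregation changes the summary function without altering the structure.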
What about guides that are in raw_readcounts but aren’t in the guide_gene_map?
Beginning in 19Q4, we will include all guides up through the logfold_change matrix, and provide an extended mapping file covering the full Avana library.
What thresholds should I use to decide if a gene is really having a significant effect on a cell line?
Although it depends on the risk of false positives you’re willing to tolerate, for most applications a gene dependency probability of 0.5 or greater is a sensible cutoff. For gene effect, a score less than -0.5 represents noteworthy depletion in most cell lines.
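Applied to matrices, these cutoffs might look like the sketch below, written in Python with pandas. It assumes both matrices are loaded as cell lines by genes; the function name and toy values are hypothetical, and the cutoffs are parameters so you can adjust them to your tolerance for false positives.

```python
import pandas as pd

def call_dependencies(gene_dependency, gene_effect,
                      prob_cutoff=0.5, effect_cutoff=-0.5):
    """Return two boolean matrices: calls by probability and by gene effect."""
    # Probability at or above the cutoff counts as a dependency.
    by_probability = gene_dependency >= prob_cutoff
    # Gene effect strictly below the cutoff counts as noteworthy depletion.
    by_effect = gene_effect < effect_cutoff
    # Missing values compare as False, so they are never called dependent.
    return by_probability, by_effect

# Toy data (invented values).
toy_prob = pd.DataFrame({"GENE1": [0.9, 0.5], "GENE2": [0.1, float("nan")]},
                        index=["LINE_A", "LINE_B"])
toy_effect_scores = pd.DataFrame({"GENE1": [-1.2, -0.5], "GENE2": [0.1, -0.7]},
                                 index=["LINE_A", "LINE_B"])
calls_by_prob, calls_by_effect = call_dependencies(toy_prob, toy_effect_scores)
```

The two calls can disagree for the same gene and line, because the probability accounts for the screen quality of that line while the fixed gene effect cutoff does not.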
What does a positive CERES score mean?
It indicates that when you knock out the gene, the cell line grows faster. For example, TP53 has a positive score in most p53-wt cell lines. However, positive scores should be interpreted with considerable caution. We’ve found that many outgrowths in CRISPR data appear to be random. For example, in some cases outgrowth occurs for only one guide in one replicate, or occurs for unexpressed genes. Any event that grants a fitness advantage can cause clonal outgrowth and may have nothing to do with the targeted gene.
Why are some cell lines not showing up in the results when certain genes are searched in the combined RNAi dataset?
Cell lines were screened with different shRNA libraries, so the set of genes with available scores differs between lines. Most notably, cell lines screened using only the Novartis DRIVE libraries will only have gene scores for around half of genes. Additional constraints on the set of shRNAs targeting a given gene can also influence whether a score is available for that gene. See the DEMETER2 paper for more details.
Where can I find the raw BAM/FASTQ files for sequencing?
Most of the unfiltered RNAseq BAM files are available on the GDC legacy portal. You should be able to download those and convert them back to FASTQ if that is what you are looking for.
Are you able to provide detailed documentation on your bioinformatic pipelines used for computing the copy number ratios, gene expression, and mutation calls?
Can I request cell lines that the Broad has used in DepMap?
DepMap does not distribute cell lines. However, the sample_info file lists sources for most of our cell lines, and you can contact those sources about obtaining them.
How can I find information about cell line conditions and media used?
What are the best papers to cite when using DepMap data?
Citation information can be found for each file in the sidebar under the heading “Data Usage”.
Is there a license for the data found in the DepMap portal?
Generally DepMap-generated data are made available under the CC BY 4.0 license. When clicking a file to download, the data license is specified at the bottom of the right panel that appears.