Frequently Asked Questions


Genetic Perturbation

How is the probability of dependency different from the gene effect score?
The “gene effect” file contains the corrected CERES scores, which measure the effect size of knocking out a gene, normalized against the distributions of non-essential and pan-essential genes. The probabilities assess, given a gene's score, how likely that score is to have come from the non-essential distribution versus the common essential distribution in that cell line. The key difference between a fixed threshold on the CERES score and a threshold on the probabilities is that the probabilities take into account the screening quality, which varies from line to line.

So which one should I use?

It depends on the question you want to ask. If you are interested in potentially subtle variation in the strength of killing, such as when computing co-dependency correlations, the CERES scores make sense. If you are only interested in the binary question of which lines are killed and which are not, for example when looking for biomarkers that classify lines as sensitive or insensitive, then the dependency probabilities may be the better choice.
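As a rough illustration of the co-dependency case, the sketch below computes a Pearson correlation between the gene effect profiles of two genes across the same cell lines. The scores are made up, and the helper name pearson is hypothetical; it is not part of any DepMap tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up gene effect scores for two genes across the same four cell lines.
gene_a = [-1.2, -0.8, -0.1, 0.0]
gene_b = [-1.0, -0.9, -0.2, 0.1]

r = pearson(gene_a, gene_b)
print(round(r, 3))
```

A correlation like this uses the full range of the scores, which is why the continuous gene effect is the natural input here rather than a thresholded probability.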

I’m a computationalist and I want gene scores with no copy number corrections or other fancy processing. How can I get them?
Starting with the matrix logfold_change, you can use guide_gene_map to group rows (guides) by gene and summarize by median, mean, or other function. Then, group the columns (replicates) by cell line using replicate_map and summarize by mean or median again.
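The two grouping steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the DepMap pipeline: the matrix is represented as nested dictionaries, the two maps as simple dictionaries (one gene per guide, one cell line per replicate), and all names and values below are hypothetical toy data rather than the actual file layout.

```python
from collections import defaultdict
from statistics import median


def naive_gene_scores(lfc, guide_to_gene, replicate_to_line, summarize=median):
    """Collapse guide x replicate log-fold changes to gene x cell line scores."""
    # Step 1: group rows (guides) by gene, separately for each replicate.
    per_gene_rep = defaultdict(lambda: defaultdict(list))
    for guide, row in lfc.items():
        gene = guide_to_gene.get(guide)
        if gene is None:
            continue  # guide absent from the map: skipped in this sketch
        for replicate, value in row.items():
            per_gene_rep[gene][replicate].append(value)

    # Step 2: summarize each gene per replicate, then group the
    # columns (replicates) by cell line and summarize again.
    scores = {}
    for gene, by_rep in per_gene_rep.items():
        per_line = defaultdict(list)
        for replicate, values in by_rep.items():
            per_line[replicate_to_line[replicate]].append(summarize(values))
        scores[gene] = {line: summarize(vals) for line, vals in per_line.items()}
    return scores


# Toy inputs, made up for illustration only.
lfc = {
    "g1": {"A_rep1": -1.0, "A_rep2": -2.0, "B_rep1": 0.0},
    "g2": {"A_rep1": -3.0, "A_rep2": -2.0, "B_rep1": 1.0},
    "g3": {"A_rep1": 0.5, "A_rep2": 0.5, "B_rep1": 0.5},
}
guide_to_gene = {"g1": "GENE1", "g2": "GENE1", "g3": "GENE2"}
replicate_to_line = {"A_rep1": "LINE_A", "A_rep2": "LINE_A", "B_rep1": "LINE_B"}

scores = naive_gene_scores(lfc, guide_to_gene, replicate_to_line)
print(scores)
```

Passing a different function as summarize (for example statistics.mean) switches the summary used at both steps.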

What about guides that are in raw_readcounts but aren’t in the guide_gene_map?
Beginning in 19Q4, we will include all guides up through the logfold_change matrix, and provide an extended mapping file covering the full Avana library.

What thresholds should I use to decide if a gene is really having a significant effect on a cell line?
It depends on the risk of false positives you’re willing to tolerate, but for most applications a gene dependency probability of 0.5 or greater is a sensible cutoff. For gene effect, a score less than -0.5 represents noteworthy depletion in most cell lines.
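Applied to one gene, these two cutoffs might look like the sketch below. The values are made up, the function names are hypothetical, and treating the probability boundary as inclusive (>= 0.5) is an assumption of this example.

```python
PROBABILITY_CUTOFF = 0.5   # gene dependency probability, from the answer above
GENE_EFFECT_CUTOFF = -0.5  # gene effect score, from the answer above


def dependent_lines(prob_by_line, cutoff=PROBABILITY_CUTOFF):
    """Cell lines whose dependency probability meets the cutoff."""
    return sorted(line for line, p in prob_by_line.items() if p >= cutoff)


def depleted_lines(effect_by_line, cutoff=GENE_EFFECT_CUTOFF):
    """Cell lines whose gene effect score is below the cutoff."""
    return sorted(line for line, s in effect_by_line.items() if s < cutoff)


# Made-up values for a single gene across three cell lines.
probability = {"LINE_A": 0.92, "LINE_B": 0.5, "LINE_C": 0.10}
gene_effect = {"LINE_A": -1.1, "LINE_B": -0.5, "LINE_C": 0.05}

print(dependent_lines(probability))
print(depleted_lines(gene_effect))
```

Raising the probability cutoff or lowering the gene effect cutoff trades sensitivity for fewer false positives.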

What does a positive CERES score mean?
It indicates that when you knock out the gene, the cell line grows faster. For example, TP53 has a positive score in most p53-wt cell lines. However, considerable caution should be used when interpreting positive scores. We’ve found that many outgrowths in CRISPR data appear to be random. For example, in some cases outgrowth occurs for only one guide in one replicate, or occurs for unexpressed genes. Any event that grants a fitness advantage can cause clonal outgrowth and may have nothing to do with the targeted gene.

Why are some cell lines not showing up in the results when certain genes are searched in the combined RNAi dataset?
Different cell lines were screened with different shRNA libraries, so the set of genes with available scores differs between lines. Most notably, cell lines screened using only the Novartis DRIVE libraries will only have gene scores for around half of genes. Additional constraints on the set of shRNAs targeting a given gene can also influence whether a score is available for that gene. See the DEMETER2 paper for more details.

Omics

Where can I find the raw bam/fastQ files for sequencing?
Most of the unfiltered RNAseq BAMs are available on the GDC legacy portal. You should be able to download them and convert them back to FASTQ if that is what you are looking for.

Where can I find detailed documentation on your bioinformatic pipelines used for computing the copy number ratios, gene expression, and mutation calls?
You can find more details on the methods, including pipelines, in the Supplementary Information section of the CCLE papers. Older files are described in the original CCLE manuscript and newer files are described in the new CCLE manuscript. Additional information can also be found in the README.txt file for each DepMap Release.

What is relative copy number/copy number ratio?
Since we do not have matched normals, the output is a “copy ratio”, or relative copy number: the copy number of a region relative to the rest of the genome for that cell line. For example, if a cell line is tetraploid, that would not be visible in the relative copy number.
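The tetraploid case can be illustrated with a toy calculation. The sketch below assumes one simple way to model "relative to the rest of the genome", namely dividing each segment's copy number by the length-weighted genome average. That normalization, the function name, and the numbers are assumptions for illustration and are not a description of the actual pipeline.

```python
def relative_copy_number(segment_copies, segment_lengths):
    """Copy ratio of each segment relative to the length-weighted genome mean."""
    total_length = sum(segment_lengths)
    weighted_sum = sum(c * l for c, l in zip(segment_copies, segment_lengths))
    genome_mean = weighted_sum / total_length
    return [c / genome_mean for c in segment_copies]


# Made-up segments: the third segment is amplified two-fold over the rest.
lengths = [100, 100, 50]
diploid = [2, 2, 4]      # diploid background
tetraploid = [4, 4, 8]   # same profile after doubling the whole genome

diploid_ratio = relative_copy_number(diploid, lengths)
tetraploid_ratio = relative_copy_number(tetraploid, lengths)

print(diploid_ratio)
print(tetraploid_ratio)
```

Both profiles give the same ratios, so under this model the doubling of the whole genome leaves no trace in the relative values.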

What is the Variant_annotation column in CCLE_mutations.csv, and how are mutations shown in the portal?

The Variant_annotation column in the CCLE_mutations.csv MAF file

We have added a Variant_annotation column in the DepMap mutation data, CCLE_mutations.csv, which groups mutations using more inclusive definitions. The Variant_annotation column labels a mutation as "damaging", "other non-conserving", "other conserving" or "silent" using the Variant_Classification column and the definitions below. "Hotspot" is not a label in Variant_annotation, but can be obtained from the isTCGAhotspot and isCOSMIChotspot columns.

Coloring mutations in portal visualizations

The portal colors mutations with the priority order of hotspot > damaging > other non-conserving > other conserving. For instance, if a gene in a cell line has both hotspot and damaging mutations, it will be colored as hotspot. Silent mutations are not colored. These mutation categories are defined below.

The mutation dataset is also available to plot as 0 or 1 on an axis in data explorer. Note that this uses a binarized definition of mutation, which includes any "hotspot", "damaging", or "other non-conserving" mutation.

Hotspot

  • Is a hotspot in TCGA
  • Is a hotspot in COSMIC
  • Is not silent

Damaging

  • Start_Codon_SNP
  • Start_Codon_Del
  • Start_Codon_Ins
  • Splice_Site
  • Frame_Shift_Del
  • Frame_Shift_Ins
  • Nonsense_Mutation
  • De_novo_Start_OutOfFrame

Other non-conserving

  • Missense_Mutation
  • In_Frame_Del
  • In_Frame_Ins
  • Nonstop_Mutation
  • Stop_Codon_Del
  • Stop_Codon_Ins

Other conserving

  • 5'Flank
  • Intron
  • IGR
  • 3'UTR
  • 5'UTR

Silent

  • Silent
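The category definitions above can be written down as a small lookup, shown here as a sketch. Each mutation is assumed to be a dictionary keyed by the column names mentioned above, with the hotspot flags already converted to booleans. A mutation is treated as a hotspot if it is flagged in either TCGA or COSMIC and is not silent; that "either" reading is an assumption. All function names are hypothetical.

```python
DAMAGING = {
    "Start_Codon_SNP", "Start_Codon_Del", "Start_Codon_Ins", "Splice_Site",
    "Frame_Shift_Del", "Frame_Shift_Ins", "Nonsense_Mutation",
    "De_novo_Start_OutOfFrame",
}
OTHER_NON_CONSERVING = {
    "Missense_Mutation", "In_Frame_Del", "In_Frame_Ins",
    "Nonstop_Mutation", "Stop_Codon_Del", "Stop_Codon_Ins",
}
OTHER_CONSERVING = {"5'Flank", "Intron", "IGR", "3'UTR", "5'UTR"}

# Priority order used for coloring; silent mutations are never colored.
PRIORITY = ["hotspot", "damaging", "other non-conserving", "other conserving"]
BINARY_MUTATED = {"hotspot", "damaging", "other non-conserving"}


def variant_annotation(variant_classification):
    """Map a Variant_Classification value to its Variant_annotation label."""
    if variant_classification in DAMAGING:
        return "damaging"
    if variant_classification in OTHER_NON_CONSERVING:
        return "other non-conserving"
    if variant_classification in OTHER_CONSERVING:
        return "other conserving"
    if variant_classification == "Silent":
        return "silent"
    return None  # classification not covered by the lists above


def category(mutation):
    """Category of one mutation, with hotspot taking precedence."""
    annotation = variant_annotation(mutation["Variant_Classification"])
    # Assumption: a hotspot flag in either TCGA or COSMIC is enough.
    flagged = mutation.get("isTCGAhotspot") or mutation.get("isCOSMIChotspot")
    if flagged and annotation != "silent":
        return "hotspot"
    return annotation


def portal_color(mutations):
    """Highest-priority category among a gene's mutations in one cell line."""
    found = {category(m) for m in mutations}
    for name in PRIORITY:
        if name in found:
            return name
    return None  # only silent or unrecognized mutations: not colored


def is_mutated(mutations):
    """Binarized call: 1 if any mutation is in BINARY_MUTATED, else 0."""
    return int(any(category(m) in BINARY_MUTATED for m in mutations))


# Made-up mutations for one gene in one cell line.
example = [
    {"Variant_Classification": "Missense_Mutation", "isTCGAhotspot": True},
    {"Variant_Classification": "Frame_Shift_Del"},
]
print(portal_color(example), is_mutated(example))
```

With this example the gene would be colored as a hotspot, since hotspot outranks damaging, and it counts as mutated in the binarized view.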

Which gene alignment is being used for each data file?
Dataset       18Q4   19Q1   19Q2   19Q3        19Q4   20Q1
CRISPR        hg19   hg19   hg19   hg38        hg38   hg38
Copy Number   hg19   hg19   hg19   hg38 lift   hg38   hg38
Expression    hg19   hg19   hg19   hg38        hg38   hg38
Fusions       hg19   hg19   hg19   hg38        hg38   hg38
Mutations     hg19   hg19   hg19   hg19        hg19   hg19

Cell lines

Can I request cell lines that the Broad has used in DepMap?
DepMap does not distribute cell lines. However, the sample_info file contains sources for most of our cell lines and you can inquire with them about obtaining cell lines.

May I submit cell lines for screening?
Yes, please see our Call for Cell Line Models page for more information.

How can I find information about cell line conditions and media used?
This information can be found in the sample_info file.

Citations/Licenses

What are the best papers to cite when using DepMap data?
Citation information can be found for each file in the sidebar under the heading “Data Usage”.

Is there a license for the data found in the DepMap portal?
Generally, DepMap-generated data are made available under the CC BY 4.0 license. When you click a file to download, the data license is specified at the bottom of the right panel that appears.



More questions?
Let us know other questions you might have about the portal.