
I generated a coordinate-sorted VCF file from a CRAM file using the following commands:

samtools sort -@ 10 -o /output/sorted.cram

samtools index -@ 10 /output/sorted.cram

bcftools mpileup -f reference.fa -r chrz:zzzz-zzzzx -a INFO/AD,FORMAT/DP --threads 10 -O v -o /output/mpileup.vcf /input/sorted.cram

I am trying to annotate the coordinate-sorted VCF file (reference genome hg38) with SnpSift, using the following command:

java -jar SnpSift.jar annotate -v /dbsnp/file.vcf.gz /input/mpileup.vcf > /output/annotated.vcf

I downloaded the dbSNP VCF file and its tabix index from here: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/GATK/
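One quick consistency check, independent of SnpSift, is whether both files name chromosomes the same way (for example `chr1` versus `1`); records on a contig name that never appears in the dbSNP file have nothing to match against. This is a minimal sketch: `vcf_contigs` and `missing_contigs` are hypothetical helper names, and scanning a genome-wide dbSNP file this way is slow. For the indexed download, `tabix -l /dbsnp/file.vcf.gz` is a faster way to list its contigs.

```shell
#!/bin/sh
# Print each distinct chromosome name used in a VCF's data lines.
# Handles plain-text VCFs and gzip/bgzip-compressed (*.gz) VCFs.
vcf_contigs() {
  case $1 in
    *.gz) gzip -cd "$1" ;;
    *) cat "$1" ;;
  esac | awk -F'\t' '!/^#/ && !seen[$1]++ { print $1 }'
}

# Print chromosome names that occur in the query VCF ($1) but never in the
# database VCF ($2); records on those contigs cannot be annotated.
missing_contigs() {
  db_list="${TMPDIR:-/tmp}/db_contigs.$$"
  vcf_contigs "$2" > "$db_list"
  # -x: whole-line match, -F: fixed strings, -v: keep names NOT in the database
  vcf_contigs "$1" | grep -vxF -f "$db_list"
  rm -f "$db_list"
}

# Example (paths from the question):
# missing_contigs /input/mpileup.vcf /dbsnp/file.vcf.gz
```

Empty output means every contig in the query VCF also appears in the dbSNP file.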

However, only 0.52% of my VCF is being annotated, which seems strange. Additionally, when I try to annotate my VCF with the Ensembl VEP web interface (https://useast.ensembl.org/Multi/Tools/VEP?db=core), I get the error "invalid input". This leads me to believe something is wrong with my VCF file. I am only trying to annotate one gene; is it normal for only 0.52% of a gene to be annotated by dbSNP? Thank you in advance for any assistance!
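One factor worth checking: `bcftools mpileup` on its own writes a record for every covered position, with `<*>` standing in for unobserved alleles, so most records in `mpileup.vcf` are not variant sites and would not be expected to match a dbSNP entry. A minimal sketch for separating the two cases, assuming SnpSift wrote the matched rs IDs into the ID column (column 3) of `annotated.vcf`; `annotation_rate` is a hypothetical helper name:

```shell
#!/bin/sh
# Count annotated records (ID column not ".") overall and among records whose
# ALT field contains an allele other than mpileup's "<*>" placeholder.
annotation_rate() {
  awk -F'\t' '
    /^#/ { next }
    {
      total++
      hasid = ($3 != ".")
      ids += hasid
      # strip "<*>" and commas; anything left is an observed ALT allele
      alt = $5
      gsub(/<\*>/, "", alt)
      gsub(/,/, "", alt)
      if (alt != "" && alt != ".") { vtotal++; vids += hasid }
    }
    END {
      printf "all records:     %d/%d annotated\n", ids, total
      printf "records w/ ALT:  %d/%d annotated\n", vids, vtotal
    }' "$1"
}

# Example: annotation_rate /output/annotated.vcf
```

If the second line shows a much higher fraction than the first, the low overall percentage mostly reflects the non-variant positions in the denominator.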


Update: if I use `bcftools mpileup | bcftools call --variants-only`, the Ensembl tool works. It also artificially increases the percentage of SNPs annotated.
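The pipeline from the update can be sketched as a small shell function. This is a hedged sketch, not a tested recipe: `call_variants` is a hypothetical helper name, and the `-m` (multiallelic caller) choice is an assumption, since `bcftools call` expects a calling model (`-m` or `-c`) and the update only mentions `--variants-only` (`-v`). The output stays a plain-text VCF, as in the original `mpileup` command, so it can be passed to the same SnpSift command.

```shell
#!/bin/sh
# Arguments: reference FASTA, region, input CRAM, output VCF (all placeholders).
call_variants() {
  ref=$1 region=$2 cram=$3 outvcf=$4
  # mpileup emits likelihoods for every covered position; -Ou streams them as
  # uncompressed BCF, which is cheaper to pass through the pipe.
  # call: -m multiallelic caller, -v keep variant sites only, -Ov plain VCF.
  bcftools mpileup -f "$ref" -r "$region" -a INFO/AD,FORMAT/DP -Ou "$cram" |
    bcftools call -mv -Ov -o "$outvcf"
}

# Example (placeholders from the question):
# call_variants reference.fa chrz:zzzz-zzzzx /input/sorted.cram /output/calls.vcf
```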

Timur Shtatland
  • I do not see any errors on your image. It says: `Errors (bad references) : 0` – Usagi Miyamoto Apr 28 '20 at 07:10
  • @UsagiMiyamoto Thank you for your reply! I thought "Errors (bad references)" was an error message, but it is not. I am still concerned that only 0.52% of my vcf is being annotated by dbsnp. I have clarified the post above. I welcome further comments! – Code_Aelita Apr 28 '20 at 13:31
