I generated a coordinate-sorted VCF file from a CRAM file using the following commands:
samtools sort -@ 10 -o /output/sorted.cram
samtools index -@ 10 /output/sorted.cram
bcftools mpileup -f reference.fa -r chrz:zzzz-zzzzx -a INFO/AD,FORMAT/DP --threads 10 -O v -o /output/mpileup.vcf /input/sorted.cram
I am trying to annotate the coordinate-sorted VCF file (reference genome hg38) with SnpSift. I am using the following command:
java -jar SnpSift.jar annotate -v /dbsnp/file.vcf.gz /input/mpileup.vcf > /output/annotated.vcf
I downloaded the dbSNP VCF file and its tabix index from here: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/GATK/
However, only 0.52% of the VCF records are being annotated, which seems strange. Additionally, when I try to use the Ensembl VEP web interface (https://useast.ensembl.org/Multi/Tools/VEP?db=core) to annotate my VCF, I get the error "invalid input". This leads me to believe something is wrong with my VCF file. I am only trying to annotate one gene; is it normal for only 0.52% of a gene to be annotated by dbSNP? Thank you in advance for any assistance!
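For anyone who wants to check a file like this, here is a rough sketch of how the annotated fraction could be counted with awk. It assumes the VCF is uncompressed, that SnpSift writes matched rsIDs into the ID column (column 3), and that bcftools mpileup marks positions with no observed alternate allele with a placeholder ALT such as `<*>` (or `<X>` in older versions). The function name `summarize_vcf` is made up for this example and is not part of bcftools or SnpSift.

```shell
# Hypothetical helper (not part of bcftools or SnpSift).
# Usage: summarize_vcf /output/annotated.vcf
summarize_vcf() {
    awk -F'\t' '
        /^#/ { next }                               # skip header lines
        { total++ }                                 # every data record
        $3 != "." { annotated++ }                   # ID column filled in (e.g. an rsID)
        $5 == "<*>" || $5 == "<X>" || $5 == "." {   # assumed placeholder for "no alternate allele seen"
            nonvariant++
        }
        END {
            pct = (total > 0) ? 100 * annotated / total : 0
            printf "total=%d annotated=%d nonvariant=%d pct_annotated=%.2f\n",
                   total, annotated, nonvariant, pct
        }
    ' "$1"
}
```

If `nonvariant` is close to `total`, most records are reference-only pileup positions rather than variant calls, which would be one possible explanation for a very low annotation percentage.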
Update! If I use bcftools mpileup | bcftools call --variants-only, then the Ensembl tool works. Additionally, this artificially increases the percentage of SNPs annotated.
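For reference, the piped version described in the update looks roughly like the sketch below. The `-m` (multiallelic caller) flag is an assumption on my part, since bcftools call needs a calling model and the update only mentions --variants-only. The function name `call_and_annotate`, its two arguments, and the `calls.vcf` file name are made up for this example. reference.fa, the region, SnpSift.jar and the dbSNP path are the same placeholders as in the commands above.

```shell
# Hypothetical wrapper: $1 = coordinate-sorted, indexed CRAM; $2 = output directory.
call_and_annotate() {
    cram=$1
    outdir=$2
    # Pileup streamed as uncompressed BCF (-O u) straight into the caller,
    # which keeps only sites with a called variant (--variants-only).
    # -m selects the multiallelic caller (an assumption, see above).
    bcftools mpileup -f reference.fa -r chrz:zzzz-zzzzx -a INFO/AD,FORMAT/DP \
            --threads 10 -O u "$cram" \
        | bcftools call -m --variants-only -O v -o "$outdir/calls.vcf" \
        && java -jar SnpSift.jar annotate -v /dbsnp/file.vcf.gz "$outdir/calls.vcf" \
            > "$outdir/annotated.vcf"
}
```

It would be called as `call_and_annotate /input/sorted.cram /output`.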