Questions tagged [gatk]

Genome Analysis Toolkit - Variant Discovery in High-Throughput Sequencing Data

A genomic analysis toolkit focused on variant discovery.

The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is now expanding to include somatic short variant calling, and to tackle copy number (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK also includes many utilities to perform related tasks such as processing and quality control of high-throughput sequencing data, and bundles the popular Picard toolkit.

These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. And although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy.

18 questions
3
votes
3 answers

Snakemake: integrate multiple command lines in a rule

The output of my first command line "bcftools query -l {input.invcf} | head -n 1" prints the name of the first individual in the VCF file (i.e. IND1). I want to use that output in GATK SelectVariants as the -sn IND1 option. How is it possible to integrate…
user3224522
  • 1,119
  • 8
  • 19
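The question above asks how to feed the first sample name of a VCF into the -sn option. As a minimal sketch (not the accepted answer), the sample name can also be read straight from the VCF header in Python instead of calling bcftools. The function names `first_sample_name` and `select_variants_command`, the file names `in.vcf` and `out.vcf`, and the `-V`/`-O` arguments are assumptions made for illustration; only `-sn` comes from the question.

```python
import io


def first_sample_name(vcf_lines):
    """Return the first sample name from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-9 are fixed (CHROM..FORMAT); sample names start at column 10.
            if len(fields) < 10:
                raise ValueError("VCF header lists no samples")
            return fields[9]
        if not line.startswith("#"):
            # Reached the records without seeing a #CHROM line.
            break
    raise ValueError("no #CHROM header line found")


def select_variants_command(invcf, outvcf, sample):
    """Build (but do not run) a hypothetical SelectVariants call for one sample."""
    return ["gatk", "SelectVariants", "-V", invcf, "-sn", sample, "-O", outvcf]


# Tiny in-memory header standing in for a real VCF file.
header = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tIND1\tIND2\n"
)
sample = first_sample_name(header)
print(select_variants_command("in.vcf", "out.vcf", sample))
```

Inside a Snakemake shell directive, the same effect is usually obtained with shell command substitution around the bcftools pipeline; the Python version is only meant to make the logic explicit.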
2
votes
2 answers

GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

I can't seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I'm writing. For whatever reason, I cannot get GATK to see there is more than…
dthorbur
  • 155
  • 1
  • 11
2
votes
1 answer

Snakemake: multiple parameters for multiple inputs and a single output. CombineGVCFs gatk problem

I have written a rule for CombineGVCFs in gatk4. The rule is as follows: all_gvcf = get_all_gvcf_list() rule cohort: input: all_gvcf_list = all_gvcf, ref="/data/refgenome/hg38.fa", interval_list = prefix+"/bedfiles/hg38.interval_list", …
Shafayet Rahat
  • 234
  • 3
  • 13
1
vote
1 answer

Combine a directory of GVCF files with gatk CombineGVCFs

I've produced a set of about 400 GVCF files with gatk HaplotypeCaller, using the -ERC GVCF option. I'd now like to combine them for downstream genotyping and variant recalibration. I believe I can combine them with gatk CombineGVCFs. gatk CombineGVCFs…
Mike
  • 921
  • 7
  • 26
1
vote
0 answers

What does 'NEGATIVE_TRAIN_SITE' in VQSR mean?

I can't find anywhere what 'NEGATIVE_TRAIN_SITE' means in VCF data after VQSR. (I have searched everywhere on GATK.) I thought it meant that the variant is considered to be not on the truth site due to bad VQSLOD scores, and should be filtered…
JEJI
  • 49
  • 5
0
votes
0 answers

GATK Variant Filtration file Anopheles gambiae s.s

I am trying to find an open source vcf of known variants for Anopheles gambiae s.s, which I can use for filtering with GATK VQSR (GATK Variant Filtration), instead of using hard filtering. Is this available anywhere? I am also looking for this…
smossy
  • 1
  • 1
0
votes
0 answers

New: Fatal error compiling: java.lang.NoClassDefFoundError: com/sun/tools/javac/main/OptionName: com.sun.tools.javac.main.OptionName

I am trying to build GATK 3.4. I downloaded the 3.4 tagged source from GitHub. I have JAVA_HOME set to a Java 8 JDK, and that JDK's java is first on my PATH. 1: Code doesn't compile. …
Greg Dougherty
  • 3,281
  • 8
  • 35
  • 58
0
votes
0 answers

BaseRecalibration by GATK

Does someone know why the size of the BAM file becomes twice as large after using BaseRecalibrator? I am using GATK_4.1.9. Thanks!
Anna
  • 53
  • 6
0
votes
0 answers

HaplotypeCaller reports more variants than expected

I used HaplotypeCaller for variant calling on a WES picard.sorted.MarkedDup.bam file with GATK 4.2.6.1, using the standard HaplotypeCaller command line. Apparently, everything worked well and I received a standard .vcf file. But the number of identified…
Alireza
  • 3
  • 2
0
votes
0 answers

How do I use this python function within the params section of my Snakemake rule?

I'm trying to figure out how to extract the read-group lane information from a fastq file, and then use this string within my GATK AddOrReplaceReadGroups Snakemake rule (below). I've written a short Python function (at the top of the rule) to do…
0
votes
1 answer

Snakemake first genotype of a vcf file as wildcard in output

In the second rule I would like to select, from the vcf file containing bob, clara and tim, only the first genotype of the dictionary (i.e. bob), in order to get bob.dn.vcf as the output of the second rule. Is this possible in snakemake? d = {"FAM1":…
user3224522
  • 1,119
  • 8
  • 19
0
votes
1 answer

How to run ensembl-vep in conda

I’ve installed it like so: conda install ensembl-vep=105.0-0 And then installed the human cache like this: vep_install -a cf -s homo_sapiens -y GRCh38 -c /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep --CONVERT But I can’t get it to run…
Mike
  • 921
  • 7
  • 26
0
votes
1 answer

Error running gatk HaplotypeCaller with allele specific annotations

I've got HaplotypeCaller working nicely in standard mode, like so: # Run HaplotypeCaller gatk --java-options "-Xmx4g" HaplotypeCaller \ --intervals "$INTERVALS" \ -R "$REF" \ -I…
Mike
  • 921
  • 7
  • 26
0
votes
1 answer

Snakemake: create multiple wildcards for the same argument

I am trying to run GenotypeGVCFs on many vcf files. The command line wants every single vcf file to be listed as: java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \ -R my.fasta \ -V bob.vcf \ -V smith.vcf \ -V kelly.vcf \ -o {output.out} How to do…
user3224522
  • 1,119
  • 8
  • 19
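The question above needs one -V flag per input file. A minimal sketch of one way to build that argument string in plain Python follows; the helper names `variant_flags` and `genotype_gvcfs_command` and the output name `cohort.vcf` are hypothetical, and the command layout simply mirrors the one quoted in the question.

```python
def variant_flags(vcf_paths):
    """Turn a list of VCF paths into a repeated '-V path' argument string."""
    return " ".join(f"-V {path}" for path in vcf_paths)


def genotype_gvcfs_command(ref, vcf_paths, out):
    """Assemble the GenotypeGVCFs command line quoted in the question as a string."""
    return (
        f"java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R {ref} "
        f"{variant_flags(vcf_paths)} -o {out}"
    )


# File names taken from the question; the output name is made up.
print(genotype_gvcfs_command(
    "my.fasta", ["bob.vcf", "smith.vcf", "kelly.vcf"], "cohort.vcf"
))
```

In a Snakemake rule, a helper like this would typically be called from a params function that receives the rule's input files, so that the shell command can reference the resulting string; that wiring is an assumption here and is not shown in the question.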
0
votes
1 answer

gatk VariantRecalibrator positional argument error

I'm trying to recalibrate my vcf using gatk VariantRecalibrator, but keep getting the error "Illegal argument value: Positional arguments were provided". But I don't know what this means, or how to correct it! Here's my call: gatk…
Mike
  • 921
  • 7
  • 26