I've produced a set of about 400 GVCF files with gatk HaplotypeCaller, using the -ERC GVCF option. I'd now like to combine them for downstream genotyping and variant recalibration. I believe I can combine them with gatk CombineGVCFs:

gatk CombineGVCFs \
   -R reference.fasta \
   --variant sample1.g.vcf.gz \
   --variant sample2.g.vcf.gz \
   -O cohort.g.vcf.gz

What I don't know is how to pass all 400 of my GVCF files to CombineGVCFs. I've heard this can be done with the --arguments_file option, but I don't know how to build such a file.

Any help gratefully received!

Mike

1 Answer

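First, create a text file listing all the GVCFs you want to combine, one path per line. The file name should end in .list so that GATK treats it as a list of inputs rather than as a single variant file: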

ls gvcfs/*.g.vcf.gz > gvcfs.list

Then pass that list to CombineGVCFs through --variant:

gatk --java-options "-Xmx180G -XX:ParallelGCThreads=36" CombineGVCFs \
   -R "$ref" \
   --variant gvcfs.list \
   --dbsnp "$DBSNP" \
   -O combined_gvcf.vcf
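Since the question asked specifically about --arguments_file, here is a minimal sketch that builds both kinds of input file from one directory. The function name build_gvcf_inputs and the output names gvcfs.list and gvcfs.args are made up for illustration, and the layout of the arguments file (one "--variant PATH" pair per line) is my understanding of what GATK expects rather than something I have checked against your version, so try it on a handful of samples first.

```shell
# Write gvcfs.list (one path per line) and gvcfs.args (one "--variant PATH"
# per line) for every *.g.vcf.gz file directly inside the directory given as $1.
# Hypothetical helper: the function and file names are illustrative only.
build_gvcf_inputs() {
    local dir="$1"
    # Sort so the file order is reproducible between runs.
    find "$dir" -maxdepth 1 -name '*.g.vcf.gz' | sort > gvcfs.list
    # Prefix each path with --variant to form the arguments file.
    # Assumption: paths contain no whitespace.
    sed 's/^/--variant /' gvcfs.list > gvcfs.args
}

# Usage (not run here; assumes gatk is on PATH and reference.fasta exists):
#   build_gvcf_inputs gvcfs
#   gatk CombineGVCFs -R reference.fasta --variant gvcfs.list -O cohort.g.vcf.gz
#   gatk CombineGVCFs -R reference.fasta --arguments_file gvcfs.args -O cohort.g.vcf.gz
```

Either form should keep the command line short with 400 samples. The .list route is the one used in the command above, so that is the safer choice if the arguments file gives you trouble.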
ZygD
Vincent