Questions tagged [vcftools]

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.

This toolset can be used to perform the following operations on VCF files:

  • Filter out specific variants
  • Compare files
  • Summarize variants
  • Convert to different file types
  • Validate and merge files
  • Create intersections and subsets of variants
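
As a rough illustration of the first and third operations above (filtering and summarizing), here is a minimal pure-Python sketch. It does not call VCFtools and is not how VCFtools is implemented; the function names `filter_by_qual` and `count_by_chrom`, the quality threshold, and the sample records are all hypothetical, chosen only to show what filtering and summarizing VCF records mean in practice.

```python
from collections import Counter

# Hypothetical in-memory VCF content (tab-separated, 8 fixed columns).
# Real VCF files are usually far larger and often carry sample columns.
SAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t1000\trs1\tA\tG\t50\tPASS\t.
1\t2000\trs2\tC\tT\t10\tPASS\t.
2\t1500\trs3\tG\tA\t.\tPASS\t.
2\t3000\trs4\tT\tC\t99\tPASS\t.
"""


def filter_by_qual(lines, min_qual):
    """Yield header lines unchanged and records whose QUAL is >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column-header lines pass through untouched.
            yield line
            continue
        qual = line.split("\t")[5]  # QUAL is the sixth fixed column
        # A missing QUAL is written as "." in VCF; this sketch drops such records.
        if qual != "." and float(qual) >= min_qual:
            yield line


def count_by_chrom(lines):
    """Count data records per chromosome, ignoring header lines."""
    return Counter(
        line.split("\t")[0] for line in lines if not line.startswith("#")
    )


# Assumed threshold of 30, purely for illustration.
kept = list(filter_by_qual(SAMPLE_VCF.splitlines(), min_qual=30))
summary = count_by_chrom(kept)
print(summary)
```

For real data, the VCFtools command line or a dedicated VCF parsing library is the better choice; this sketch ignores compressed input, multi-allelic sites, and per-sample genotype fields.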

Links:

  1. Home page

  2. Documentation

  3. GitHub

42 questions
6
votes
2 answers

How to convert vcf file to ped file using plink?

I am trying to convert a .vcf file to a .ped file using plink. I have read some manuals and posts online, but it seems that no one specifically mentions how to convert vcf to ped. I am hoping that there may be some experts here who have experience…
NeverBe
  • 107
  • 1
  • 1
  • 7
5
votes
1 answer

Bash script for pairwise comparisons

I would like to write a bash script to do pairwise calculations on my files. I have a fixed file in a directory and a series of files that I want to use for pairwise comparisons. For example: The name of the fixed file is: Genome.vcf The…
Homap
  • 2,142
  • 5
  • 24
  • 34
4
votes
0 answers

VCF4.2 file not recognised by GATK

I've seen a lot of people having the same problem, but I haven't found a solution yet. I have supplied 24 VCF4.1 files (http://evs.gs.washington.edu/EVS/) to GATK's CombineVariants. I get this error: ##### ERROR MESSAGE: Invalid command line: No tribble type was…
2
votes
2 answers

How can I filter by read depth using vcftools?

I am trying to build a workflow to analyse my scRNA-seq data. I am using a combination of GATK and samtools, vcftools, bcftools. I would like to filter my .vcf file such that it removes all entries that have fewer than 10 reads. It looks like…
Cora_olpe
  • 43
  • 7
2
votes
1 answer

loop within a loop vcftools bash

I am trying to use the vcftools package to calculate Weir and Cockerham's Fst. I would like to loop over two pairs of populations in the first instance and then loop these populations across all variants from the 1000 Genomes project: each…
DeanR
  • 21
  • 1
2
votes
1 answer

Snakemake: unknown output/input files after splitting by chromosome

To speed up a certain snakemake step I would like to: split my bamfile per chromosome using bamtools split -in sample.bam --reference (this results in files named sample.REF_{chromosome}.bam); perform variant calling on each, resulting in e.g.…
2
votes
1 answer

Perl script cannot access Tabix folder

I'm running a Perl script from the EMBL (found here https://github.com/EMBL-EBI-GCA/reseqtrack/blob/master/scripts/variation_data/calculate_allele_frq_from_vcf.pl). Under Ubuntu 16.10 I have installed VCFtools and Tabix as required, and both have…
Svencken
  • 479
  • 6
  • 14
2
votes
1 answer

Extrapolating variance components from Weir-Fst on Vcftools

vcftools --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf --weir-fst-pop POP1.txt --weir-fst-pop POP2.txt --out fst.POP1.POP2 The above script computes Fst distances on 1000 Genomes population data using Weir and…
2
votes
1 answer

Choose higher values from two columns after extracting the number, R

I have a data frame (451 obs of 8 variables) that has two columns (6&7) that look like this: Major Minor C:726 T:2 A:687 G:41 T:3 C:725 I want to create one column that summarises this. To do this, I don't care about…
cianius
  • 2,272
  • 6
  • 28
  • 41
2
votes
6 answers

Linux Makefile: undefined reference to 'gzbuffer' (where LIB = -lz)

I'm trying to install a program (vcftools), for which the Makefile reads as follows: # Make file for vcftools # Author: Adam Auton # ($Revision: 230 $) # Compiler CPP = g++ # Output executable EXECUTABLE = vcftools # Flag used to turn on…
jme6f4
  • 43
  • 2
  • 6
2
votes
2 answers

Bash troubleshooting: Not a valid identifier

Beginner here trying to get a pipeline working in bash. If somebody can see why when I run the following I get: -bash: `$i': not a valid identifier, that would be really helpful. Also if there are other mistakes please let me know for $i in…
user964689
  • 812
  • 7
  • 20
  • 40
1
vote
0 answers

How to configure vcftools in Ubuntu during installation?

I'm trying to install VCFtools (https://github.com/vcftools/vcftools) on my Ubuntu system and am getting an error when trying to configure it. The readme file recommends: cd vcftools ./autogen.sh ./configure make make install To begin, I don't have a configure…
1
vote
1 answer

Problems getting two output files in Nextflow

Hello all! I'm trying to write a small Nextflow pipeline that runs vcftools commands on 300 VCFs. The pipeline takes four inputs: vcf, pop1, pop2 and a .txt file, and has to generate two outputs: a .log.weir.fst and a .log.log file. When I run…
1
vote
1 answer

Match columns 1, 2, 5 of file1 with columns 1, 2, 3 of file2 respectively and output the matched rows from file2; the second file is gzipped (.gz)

file1 3 1234581 A C rs123456 file2 zipped file .gz 1 1256781 rs987656 T C 3 1234581 rs123456 A C 22 1792471 rs928376 G T output 3 1234581 rs123456 A C I tried zcat file2.gz | awk 'NR==FNR{a[$1,$2,$5]++;next}…
rij
  • 159
  • 7
1
vote
1 answer

How to extract genotype information for each sample as a string from a VCF file using htslib?

I am using htslib for extracting all the information contained in a VCF file in C++. Currently, thanks to the VCF specification and the documentation in the file vcf.h, I have successfully extracted all the metadata information in the header…