
Data:

CHROM 1                 2                 3
ABC   0/1:PASS:123:4214 0/1:PASS:546:4656 0/0:PASS:444:865
DEF   0/1:PASS:123:4214 0/1:PASS:546:4656 0/0:PASS:444:865
GFD   0/1:PASS:123:4214 0/1:PASS:546:4656 0/0:PASS:444:865

Desired output:

CHROM 1   2   3
ABC   0/1 0/1 0/0
DEF   0/1 0/1 0/0
GFD   0/1 0/1 0/0

I have tried:

sub(':*', '', data)
gsub(':*', '', data)

But this gives me a list as output rather than a table, and I'm not sure what I am doing wrong.

Your help would be much appreciated

Many thanks

zx8754
tacrolimus
  • `df[] <- lapply(df, function(x) sub(":.*", "", as.character(x)))` – Wiktor Stribiżew Sep 28 '21 at 10:52
  • thanks that worked after I converted the output into a data frame (was a list initially) – tacrolimus Sep 28 '21 at 11:02
  • If it is a VCF file, then use bcftools to extract GT field. – zx8754 Sep 28 '21 at 11:14
  • I have! I have transposed it with datamash in Linux to get the sample IDs as a column. Does bcftools allow you to get a clean output with samples on the left and then GT in subsequent columns, with each column being a variant? @zx8754 – tacrolimus Sep 28 '21 at 11:18
  • Yes, use the right tool for the job. See the example in my answer for extracting certain columns from a VCF: https://stackoverflow.com/a/59104151/680068 In your case you would need `%GT` – zx8754 Sep 28 '21 at 11:32
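
For anyone landing here later, a minimal sketch of the `sub()` approach from the first comment, run on toy data shaped like the question. The data frame `df`, the column names `X1` to `X3`, and the result name `gt` are assumptions made up for this example; the real table may be laid out differently. It also shows why the pattern `':*'` from the question does nothing useful: it means "zero or more colons", so `sub()` matches an empty string at the start and leaves the value unchanged.

```r
# Toy data mirroring the question (column names X1..X3 are hypothetical)
df <- data.frame(
  CHROM = c("ABC", "DEF", "GFD"),
  X1 = rep("0/1:PASS:123:4214", 3),
  X2 = rep("0/1:PASS:546:4656", 3),
  X3 = rep("0/0:PASS:444:865", 3),
  stringsAsFactors = FALSE
)

# ':*' matches zero or more colons, so the first match is the empty
# string at position 0 and the value comes back unchanged
unchanged <- sub(":*", "", df$X1[1])

# ':.*' matches the first colon and everything after it
first_field <- sub(":.*", "", df$X1[1])

# lapply() on its own returns a plain list; assigning into gt[] keeps
# the data frame structure and column names
gt <- df
gt[] <- lapply(df, function(x) sub(":.*", "", as.character(x)))

print(gt)
```

Assigning with `gt[] <-` rather than `gt <-` is what avoids the list output mentioned above, so no conversion back to a data frame is needed afterwards. The `CHROM` column passes through untouched because its values contain no colon.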
