I have a list of SNPs, for example (let's call it file1):

SNP_ID      chr position   
rs9999847   4   182120631
rs999985    11  107192257
rs9999853   4   148436871
rs999986    14  95803856
rs9999883   4   870669
rs9999929   4   73470754
rs9999931   4   31676985
rs9999944   4   148376995
rs999995    10  78735498
rs9999963   4   84072737
rs9999966   4   5927355
rs9999979   4   135733891

I have another list of SNPs with the corresponding P-value (P) and BETA (as shown below) for different phenotypes; only one phenotype is shown here (let's call it file2):

CHR SNP         BP      A1 TEST NMISS   BETA    SE      L95     U95     STAT    P
1   rs3094315   742429  G   ADD 1123    0.1783  0.2441  -0.3    0.6566  0.7306  0.4652
1   rs12562034  758311  A   ADD 1119    -0.2096 0.2128  -0.6267 0.2075  -0.9848 0.3249
1   rs4475691   836671  A   ADD 1111    -0.006033 0.2314 -0.4595 0.4474 -0.02608 0.9792
1   rs9999847   878522  A   ADD 1109    -0.2784 0.4048  -1.072  0.5149  -0.6879 0.4916
1   rs999985    890368  C   ADD 1111    0.179   0.2166  -0.2455 0.6034  0.8265  0.4087
1   rs9999853   908247  C   ADD 1110    -0.02015 0.2073 -0.4265 0.3862  -0.09718 0.9226
1   rs999986    918699  G   ADD 1111    -1.248  0.7892  -2.795  0.2984  -1.582  0.114

Now I want to make two files named file3 and file4 such that:

file3 should contain:

SNPID    Pvalue_for_phenotype1   Pvalue_for_phenotype2   Pvalue_for_phenotype3 and so on....
rs9999847 0.9263                 0.00005                 0.002                ..............

The first column (SNP IDs) in file3 is fixed (all the SNPs on my chip are listed there). I want to write a program that matches the SNP IDs in file3 against file2, fetches the P-value for each matching SNP ID from file2, and puts it in file3.

file4 should contain:

SNPID     BETAvalue_for_phenotype1   BETAvalue_for_phenotype2   BETAvalue_for_phenotype3 .........
rs9999847 0.01812                    -0.011                     0.22

The first column (SNP IDs) in file4 is fixed (all the SNPs on my chip are listed there). I want to write a program that matches the SNP IDs in file4 against file2, fetches the BETA for each matching SNP ID from file2, and puts it in file4.

ghoti
Ismeet Kaur
  • I'm sorry, I don't clearly understand your problem. Could you please clarify it? I understand file1 and file2 are given and file3 and file4 are to be generated. If so, where does for example the value 0.9263 in file3 come from? I can't find it anywhere in file1 or file2. – Balint Jun 14 '12 at 16:15
  • Also, what is your question?? – Graham Jun 14 '12 at 18:17
  • Hi! Let's put the question this way: file3 is a list of IDs, called "SNP" IDs here. I want to match the "SNP" list in file3 against file2 and print the corresponding "P" value from file2 into file3 (next to the matching IDs). The same exercise applies to file4, using the "BETA" value from file2. – Ismeet Kaur Jul 10 '12 at 11:20

1 Answer

This is a simple exercise along the lines of "How to transfer the data of columns to rows (with awk)?"

file2 to file3.

I assume that you have a machine with plenty of RAM, because I expect file2 to contain millions of lines.

You could save this code as column2row.awk:

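#!/usr/bin/awk -f

BEGIN {
    snp = 2     # column holding the SNP ID
    val = 12    # column holding the value to collect (12 = P)
}

# Skip header lines, whose SNP column holds the literal text "SNP"
$snp == "SNP" { next }

{
    # Test membership with "in": a plain truth test on vector[$snp]
    # fails when the stored value is 0, which would overwrite it.
    if ($snp in vector)
        vector[$snp] = vector[$snp] "," $val
    else
        vector[$snp] = $val
}

END {
    # "for (... in ...)" visits the keys in no particular order
    for (id in vector)
        print id "," vector[id]
}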

Here snp is column 2 and val is column 12 (the P-value). Now you can run the script:

/usr/bin/awk -f column2row.awk file2 > file3
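The question asks for a fixed first column that lists every SNP on the chip, and the `for (... in ...)` loop in END does not guarantee any order. A minimal sketch of one way to get that, assuming one result file per phenotype in the file2 layout: read the SNP list first, then fill one column per result file. The `toy_*` file names, the second phenotype's values, and the `NA` placeholder for missing SNPs are assumptions made for illustration.

```shell
#!/bin/sh
# Toy inputs in the layouts of file1 and file2 (toy_pheno2 values are made up).
cat > toy_file1 <<'EOF'
SNP_ID chr position
rs9999847 4 182120631
rs999985 11 107192257
rs9999853 4 148436871
EOF

cat > toy_pheno1 <<'EOF'
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
1 rs9999847 878522 A ADD 1109 -0.2784 0.4048 -1.072 0.5149 -0.6879 0.4916
1 rs999985 890368 C ADD 1111 0.179 0.2166 -0.2455 0.6034 0.8265 0.4087
EOF

cat > toy_pheno2 <<'EOF'
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
1 rs9999847 878522 A ADD 1109 0.0181 0.2 -0.37 0.41 0.09 0.9263
1 rs9999853 908247 C ADD 1110 -0.02015 0.2073 -0.4265 0.3862 -0.09718 0.9226
EOF

# val=12 selects P (file3); val=7 would select BETA (file4).
awk -v val=12 '
    NR == FNR {                        # first file: the fixed SNP list
        if (FNR > 1) order[++n] = $1   # keep the IDs in their original order
        next
    }
    FNR == 1 { k++; next }             # each result-file header starts a new column
    { value[$2, k] = $val }            # the SNP ID is column 2 in the result files
    END {
        for (i = 1; i <= n; i++) {
            line = order[i]
            for (j = 1; j <= k; j++)
                line = line "," (((order[i], j) in value) ? value[order[i], j] : "NA")
            print line
        }
    }
' toy_file1 toy_pheno1 toy_pheno2 > toy_file3

cat toy_file3
```

Every SNP from the list gets exactly one row, in list order, with one column per result file given on the command line. This still keeps all collected values in memory, so it suits the large-RAM case.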

If you have little RAM, you could split the work per SNP:

while read -r s rest; do grep -w "$s" file2 > "$s.snp"; /usr/bin/awk -f column2row.awk "$s.snp" >> file3; done < file1

For each line of file1, this reads the first field into $s (the SNP name), searches file2 for $s, and writes the matching lines to a small file named after that SNP. The awk script then turns that small file into one row appended to file3.
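The per-SNP loop rescans file2 once for every SNP in file1. As a sketch of an alternative that is also light on memory, assuming a single result file: keep only the IDs from file1 in an awk array and stream file2 once, printing the rows whose SNP column matches. The `toy_*` file names and the `wanted` array name are hypothetical.

```shell
#!/bin/sh
# Toy inputs in the layouts of file1 and file2.
cat > toy_ids <<'EOF'
SNP_ID chr position
rs9999847 4 182120631
rs999986 14 95803856
EOF

cat > toy_assoc <<'EOF'
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
1 rs3094315 742429 G ADD 1123 0.1783 0.2441 -0.3 0.6566 0.7306 0.4652
1 rs9999847 878522 A ADD 1109 -0.2784 0.4048 -1.072 0.5149 -0.6879 0.4916
1 rs999986 918699 G ADD 1111 -1.248 0.7892 -2.795 0.2984 -1.582 0.114
EOF

awk '
    # First file: remember the IDs only (skipping the header), nothing else.
    NR == FNR { if (FNR > 1) wanted[$1]; next }
    # Second file: exact match on the SNP column, then print ID and P.
    ($2 in wanted) { print $2 "," $12 }
' toy_ids toy_assoc > toy_pvalues

cat toy_pvalues
```

Rows come out in file2 order rather than file1 order, and the match is made against column 2 only, whereas `grep -w` matches the ID anywhere on the line.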

file2 to file4.

For file4, change val in the awk script from column 12 (P) to column 7 (BETA).
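Hard-coded column numbers break if the result file gains or loses a column. A hedged sketch of one way around that is to look the columns up by name in the header line. The helper name `extract_column` and the `toy_*` files are hypothetical; the header names `SNP`, `P` and `BETA` are the ones shown in file2.

```shell
#!/bin/sh
# Toy result file in the layout of file2.
cat > toy_result <<'EOF'
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
1 rs9999847 878522 A ADD 1109 -0.2784 0.4048 -1.072 0.5149 -0.6879 0.4916
1 rs999985 890368 C ADD 1111 0.179 0.2166 -0.2455 0.6034 0.8265 0.4087
EOF

# extract_column NAME FILE: print "SNP,<NAME>" for every data row.
# (Hypothetical helper, not part of the original answer.)
extract_column() {
    awk -v name="$1" '
        FNR == 1 {
            # Locate the SNP column and the requested column by header text.
            for (i = 1; i <= NF; i++) {
                if ($i == "SNP") snp = i
                if ($i == name)  val = i
            }
            if (!snp || !val) {
                print "column not found: " name > "/dev/stderr"
                exit 1
            }
            next
        }
        { print $snp "," $val }
    ' "$2"
}

extract_column P    toy_result > toy_p      # values for file3
extract_column BETA toy_result > toy_beta   # values for file4

cat toy_p toy_beta
```

The same function then serves both outputs, and a misspelled column name fails loudly instead of silently printing the wrong field.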
