I work with genetic data. I just found a supercomputer to help with genetic analysis, but I need to convert the data to exactly the format the supercomputer wants: two columns, one with chromosome information and one with the p-value. The p-value column must not contain any letters, but some of my data is in scientific notation, like so:

rs191895619 1.052e-05
rs140779862 0.4406
rs11127542 0.9771
rs112183333 0.02569
rs191067167 0.427
rs111321342 1.042e-05

which puts several E's in the column that must not have letters in it.

I tried to use grep to move those lines into their own file, first with grep "*e*" filename.txt > outputfilename.txt and then with grep "*e-05" filename.txt > outputfilename.txt, but both gave me a blank output file. Even if all 5000 lines in scientific notation had moved into their own file, I don't know how to change the data to decimal notation except by editing each line individually, which would take several days for each file.

Is there a command I can give to plink so that the data it gives me is not in scientific notation in the first place? Or a command I can use in plink or Unix to convert the scientific notation I have into decimal notation?

Martin Prikryl
Arcadia

1 Answer

You can use awk to convert scientific notation to decimal notation:

awk '{printf "%s %f\n", $1, $2}' file

Outputs:

rs191895619 0.000011
rs140779862 0.440600
rs11127542 0.977100
rs112183333 0.025690
rs191067167 0.427000
rs111321342 0.000010

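You can adjust the precision by changing the %f part of the printf format. By default %f keeps six digits after the decimal point, which is why 1.052e-05 is rounded to 0.000011 above.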
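If six decimal places lose too much of a small p-value, a precision can be written between the % and the f. The following is only a sketch: the file names pvalues.txt and pvalues_decimal.txt are made up for illustration, and eight decimal places is an assumption that happens to fit the sample data, not a general recommendation.

```shell
# Recreate a small sample of the question's data (hypothetical file name).
cat > pvalues.txt <<'EOF'
rs191895619 1.052e-05
rs140779862 0.4406
rs111321342 1.042e-05
EOF

# %.8f prints eight digits after the decimal point instead of the default six.
awk '{printf "%s %.8f\n", $1, $2}' pvalues.txt > pvalues_decimal.txt
cat pvalues_decimal.txt
```

With this sample, 1.052e-05 comes out as 0.00001052 rather than 0.000011. A value smaller than the chosen precision would still print as all zeros, so pick a digit count that covers the smallest p-value in your file.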

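As for the blank grep output: grep patterns are regular expressions, not shell-style wildcards. In a basic regular expression a * at the very start of the pattern is treated as a literal asterisk, so "*e*" and "*e-05" only match lines that contain an actual * character, and your data has none. A rough sketch of a pattern that does find the scientific-notation lines, and of a check that none remain after the awk step, is below. The file names are hypothetical, and the pattern assumes the p-value is the last thing on each line.

```shell
# Recreate a small sample of the question's data (hypothetical file name).
cat > pvalues.txt <<'EOF'
rs191895619 1.052e-05
rs140779862 0.4406
rs11127542 0.9771
rs111321342 1.042e-05
EOF

# Keep only lines whose p-value ends in an exponent such as e-05.
grep -E '[0-9][eE][-+]?[0-9]+$' pvalues.txt > scientific.txt
cat scientific.txt

# Convert with awk, then report whether any exponent survived the conversion.
awk '{printf "%s %.8f\n", $1, $2}' pvalues.txt > pvalues_decimal.txt
if grep -q -E '[0-9][eE][-+]?[0-9]+$' pvalues_decimal.txt; then
    echo "scientific notation still present"
else
    echo "no scientific notation left"
fi
```

There is no need to split the file first, since awk leaves values that are already in decimal notation numerically unchanged. The grep step is mainly useful as a sanity check.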

Community
codeforester