0

This is the issue: I gave as an input to AWK a comma-delimited table (and speficying FS=","), take the average of the 2-3rd column, the same for 4-5th column and print the first column value \t average1 \t average2 \n BUT the first column have genenames, and some of them looks like dates AND when I print, these names change, for example "Sept15" changed to "15-Sep", and I want to avoid this

awk 'BEGIN{FS=",";OFS="\t"}{if(NR==1){next}{print $1,($2+$3)/2,($4+$5)/2}}' DESeqResults.csv | grep Sep

Even when using printf(%s)

awk 'BEGIN{FS=",";OFS="\t"}{if(NR==1){next}{printf("%s\t%d\t%d\n",$1,($2+$3)/2,($4+$5)/2) }}' DESeqResults.csv | grep Sep

I thought that using printf instead of just print could work, but it didn't. And I'm pretty sure something is going on when reading the value and preprocessing it, because I printed it out only that column and anything else (using both print and printf) and the value is already changed.

awk 'BEGIN{FS=",";OFS="\t"}{if(NR==1){next}{print $1}}' DESeqResults.csv | grep Sep

AWK Version: GNU Awk 3.1.7 Copyright (C) 1989, 1991-2009 Free Software Foundation.

Here is a sample of the table:

genes,Tet2/3_Ctrtl_A,Tet2/3_Ctrtl_B,Tet2/3_DKO_A,Tet2/3_DKO_B,baseMean,baseMean_Tet2/3_Ctrtl,baseMean_Tet2/3_DKO,foldChange(Tet2/3_DKO/Tet2/3_Ctrtl),log2FoldChange,pval,padj
Sep15,187.0874494,213.5411848,289.6434172,338.0423229,1376.196203,926.4220733,1825.970332,1.970991824,0.978921792,5.88E-05,0.003018514
Psmb2,399.4650982,355.9642309,557.3871013,632.1236546,1462.399465,983.7201408,1941.078789,1.973202244,0.980538833,6.00E-05,0.003071175
Sept1,144.2402924,114.9623101,52.39183843,18.11079498,386.2494712,579.8722584,192.6266841,0.332188135,-1.58992755,0.000418681,0.014756367
Psmd8,101.3085151,68.51270408,140.650979,154.2588735,627.727588,396.4360624,859.0191136,2.166854116,1.115602027,0.000421417,0.014825295
Sepw1,388.2193716,337.7605508,209.8232326,155.9087497,639.6596557,787.1262578,492.1930536,0.625303817,-0.677370771,0.004039946,0.080871288
Cks1b,265.8259249,287.954538,337.1108392,408.0547432,865.5821999,642.8510296,1088.31337,1.692948008,0.759537668,0.004049464,0.0809765
Sept2,358.4252141,302.9219723,393.3509343,394.2208442,4218.71214,3392.272118,5045.152161,1.48724866,0.572645878,0.004380269,0.085547008
Tuba1a,19.47153869,11.1692256,40.09945086,28.7539846,142.1610148,75.37000403,208.9520256,2.772349933,1.47110937,0.004381599,0.085547008
Sepx1,14.5941944,15.37680483,53.70015607,105.5523799,157.8475412,40.73526884,274.9598136,6.749920191,2.754870444,0.010199249,0.153896056
Apc,10.90608004,13.56070852,6.445046152,4.536589807,363.4471652,466.2312058,260.6631245,0.559085538,-0.838859068,0.010251083,0.154555416
Sephs2,38.20092337,29.90249614,41.38713976,60.29027195,328.8398211,228.5362706,429.1433717,1.877791086,0.909036565,0.088470061,0.590328676
2310008H04Rik,12.72162335,13.98659226,17.77340283,16.88409867,175.2157133,133.5326829,216.8987437,1.624312033,0.699828804,0.088572283,0.590803249
Sepn1,16.26472482,11.00430796,7.219301889,7.109776037,119.8773488,144.9435253,94.81117235,0.654124923,-0.612361911,0.129473781,0.719395557
Fancc,6.590254663,5.520421849,8.969058939,8.394987722,111.479883,79.97866541,142.9811007,1.787740518,0.838137351,0.129516654,0.719423355
Sept7,170.6589676,187.3808346,185.8091089,158.0134115,1444.411676,1313.631233,1575.192119,1.199112871,0.261967464,0.189661613,0.852792911
Obsl1,1.400612677,0.51329399,0.299847728,0.105245908,10.77805777,17.15978377,4.396331776,0.256199719,-1.964659203,0.189677412,0.852792911
Sepp1,136.2725767,142.7392758,137.5079558,135.5576156,1055.39992,948.5532274,1162.246613,1.225283494,0.293115585,0.193768055,0.862790863
Tom1l2,6.079259794,5.972711213,4.188234003,1.879086398,93.62018078,115.620636,71.61972551,0.619437221,-0.690970019,0.193795263,0.862790863
Sept10,5.07506603,4.240574236,7.415271602,7.245735277,56.38191446,38.04292126,74.72090766,1.964121187,0.973883947,0.202050794,0.874641256
Jag2,0.531592511,1.753353521,0.106692242,0.099863326,7.812876603,14.01922398,1.606529221,0.114594732,-3.125387366,0.202074037,0.874641256
Sept9,25.71885843,9.170659969,29.98187141,23.5519093,333.6707351,231.1780024,436.1634678,1.8866997,0.915864812,0.227916377,0.920255208
Mad2l2,22.00853798,17.42180189,30.74357865,21.99530555,98.71951578,74.31522721,123.1238044,1.656777608,0.72837996,0.227920237,0.920255208
Sept8,3.128945597,4.413675869,1.658838722,1.197769008,38.73123291,52.59586062,24.8666052,0.472786354,-1.080739698,0.237101573,0.929055595
BC018465,1.974718423,2.171073663,0.264221349,0.123654833,5.802858162,10.40514412,1.200572199,0.115382563,-3.115502877,0.237135522,0.929055595
Sept11,51.69299305,57.36531814,51.69117677,51.61623861,915.6234052,837.2625097,993.9843007,1.187183576,0.247543039,0.259718041,0.949870478
Ccnc,11.42168015,13.32308428,14.76060133,12.19352385,173.0536821,146.6301746,199.4771895,1.36041023,0.444041759,0.259794956,0.949870478
Sept12,0,5.10639021,0,0.158638685,5.07217061,9.738384198,0.405957022,0.041686281,-4.584283515,0.388933297,1
Gclc,24.79641294,20.9904856,13.36470176,15.92090715,146.8502169,163.0012707,130.6991632,0.801829106,-0.318633307,0.3890016,1
Sept14,0.15949349,1.753526538,0,0,2.425489894,4.850979788,0,0,#NAME?,0.396160673,1
Slc17a1,0.131471208,1.445439884,0,0,2.425489894,4.850979788,0,0,#NAME?,0.396160673,1
Sept6,34.11050622,30.16102302,28.2562382,14.56889172,602.5658704,661.8163161,543.3154247,0.820945951,-0.284640854,0.416246976,1
Unc119,6.098478253,9.710512531,4.558282355,1.738214353,23.04654843,30.90026472,15.19283214,0.491673203,-1.024228366,0.416259755,1
Sept4,2.305246374,2.534467513,1.18972284,0.618652085,8.87244411,12.13933481,5.605553408,0.461767757,-1.114760654,0.560252893,1
Ddb2,11.25366078,17.32172888,10.50269513,6.025122118,71.81085298,83.53254996,60.089156,0.719350194,-0.475233821,0.560482212,1
Sephs1,20.92060935,15.48240612,15.94132159,11.57137656,288.7538099,298.3521103,279.1555094,0.935657902,-0.095946952,0.568672243,1
BC021785,0.135120133,0.891334456,0.108476095,0.101533002,5.825443635,9.241093439,2.409793832,0.260769339,-1.939153843,0.568713405,1
Sepsecs,7.276880132,6.154194955,5.055549522,3.680417498,35.9322246,39.77711194,32.08733726,0.806678406,-0.309934458,0.673968316,1
Osbpl7,10.51628336,5.69720028,7.157857243,5.382675661,86.65916873,88.67338952,84.64494794,0.954569893,-0.06707726,0.674000752,1
Sept3,0.113880577,0.250408482,0.228561799,0.042786507,2.505996654,2.619498342,2.392494966,0.913340897,-0.13077466,1,1
Sept5,0.126649979,0,0.203352303,0,0.609528347,0.424441516,0.794615178,1.872142914,0.904690571,1,1
Serpina11,0,0,0.14524189,0,0.198653794,0,0.397307589,Inf,Inf,1,1
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Edahi
  • 59
  • 7
  • `for example "Sep15" changed to "15-Sep",`: That does not happen for me for any of the lines in your sample input. Please try running it again. If the problem occurs again, show exactly which line it occurs on. – John1024 Mar 03 '15 at 00:03
  • It occurs in any date-looking name. Sept15 for example (note Sep15 corrected to Sept15). By the way, I also modified the format of the Sample input lines, if you copied directly in terminal as the previous format, they are readed as a one-liner (at least for macs). thanks! – Edahi Mar 03 '15 at 00:47
  • 1
    pretty sure awk doesn't do any automagical parsing of dates, Excel it is not – qwwqwwq Mar 03 '15 at 00:55
  • No; `awk` does not do that by default, even on a Mac. At least, not with the standard `awk` on Mac OS X 10.10.2 Yosemite, which I tested with, nor would I expect it to do so with any other variant of `awk` I've ever seen on a Mac. – Jonathan Leffler Mar 03 '15 at 00:56
  • What is the best way to add the dataset to this question? so I don't do it again. Thank you! – Edahi Mar 03 '15 at 01:09
  • @Edahi To summarize the above comments, `awk` does not do what you suspect it of doing. Any date reformatting is coming from somewhere else. Try running it again and carefully match up input lines and output lines. – John1024 Mar 03 '15 at 01:09
  • @Edahi: the best way to handle the input is to have the data with one line per line of data, and paste that into the question. Then select all the data and press the **`{}`** button above the edit box to indent it all by four spaces, which converts it to 'code'. Your lines are a bit long, but that's preferable to a single line. – Jonathan Leffler Mar 03 '15 at 01:12
  • 1
    Somewhat to my surprise, I have `gawk` installed and it is `GNU Awk 3.1.7 Copyright (C) 1989, 1991-2009 Free Software Foundation`. (I knew I had `gawk` installed, but I was expecting it to be a 4.x version.) Given your data on my Mac, the output of your first `awk` (`gawk`) command on the data you gave does no mapping whatsoever on the first column. If you subsequently import the data into a spreadsheet, the spreadsheet could do all sorts of transformations, but that isn't `awk`'s fault. – Jonathan Leffler Mar 03 '15 at 01:16
  • I am amazed! How is possible that pasting the sample input I provided and using exactly the same awk command I am getting what I want, but using the original *csv file, doesn't work!?!? That is just amazing... for sure something about this excel saving files things... – Edahi Mar 03 '15 at 01:34
  • 1
    You mentioned Mac in one of your early comments; are you using Mac OS X? If so, which version, and if not, what are you using? If you want to send me your CSV file (send it in a format such as a zip file or a gzipped tar file, to ensure that it doesn't get corrupted), then I'll look and see if I can make head or tail of what you've got and what's going on — see my profile. But I am not expecting to find any problem outside the spreadsheet. If the data is imported to a spreadsheet, then I won't be surprised to find the 'date-like' values in column 1 are reformatted. _[…continued…]_ – Jonathan Leffler Mar 03 '15 at 03:38
  • _[…continuation…]_ I tried importing the CSV from the data in the question into LibreOffice (4.4.1.2, or 4.4.1002, depending on where you look for the version number), and no transformation occurred on the data in column 1. I'm busy updating Numbers and Excel to see what happens with them — I've been lax, it seems, about updating them recently. – Jonathan Leffler Mar 03 '15 at 03:43
  • 1
    So, the results are in. On my Mac runing OS X 10.10.2 Yosemite, LibreOffice 4.4.1.2 and Numbers 3.5.2 both leave the keys starting 'Sep' alone, but MS Excel (for Mac 2011, version 14.4.8 — 150116) translates them to a date (so `Sep15` becomes `15-Sep`, for example). Unfortunately, even embedding the column in double quotes does not help. I don't have a good solution other than "do not use MS Excel". There probably is a way to suppress the behaviour, but you need to ask a question tagged [tag:excel] and [tag:csv] rather than [tag:awk] [tag:printing] [tag:printf]. – Jonathan Leffler Mar 03 '15 at 15:49
  • @JonathanLeffler thank you very much for the follow up. Yes, I am using Mac OS X Yosemite. I have all the info in a spreadsheet and I saved only the sheet of interest as a *csv file, then imported this to one of the nodes where I usually work and check if the file was fine. i.e. grepping for date-like genenames, and they looked identical in terminal as in the sheet, but after using AWK, they don't, that's the reason why I though that AWK was responsible of this. I re-saved the sheet, but in tab-delimited and it worked... How and why in the world happened this... I don't know.. by the way: – Edahi Mar 04 '15 at 04:57
  • @JonathanLeffler how can I give all the possible stars by all of your following up?? You went all the way through the problem and find a solution (or at least what caused the problem). How can I give you all the possible stars? Thanks ---- Well, given that I though the problem was caused by awk (as I explained above) I marked this as AWK-related problem, If I knew this was something related to excel I wouldn't be even here. Thank you again! – Edahi Mar 04 '15 at 04:59

1 Answers1

1

Transferring extensive comments into an answer

No; awk does not convert strings such as Sep15 to a date 15-Sep by default, even on a Mac. At least, not with the standard awk on Mac OS X 10.10.2 Yosemite, which I tested with, nor would I expect it to do so with any other variant of awk I've ever seen on a Mac.

[…time passed…] Somewhat to my surprise, I have gawk installed and it is GNU Awk 3.1.7 Copyright (C) 1989, 1991-2009 Free Software Foundation. (I knew I had gawk installed, but I was expecting it to be a 4.x version.) Given your data on my Mac, the output of your first awk (gawk) command on the data you gave does no mapping whatsoever on the first column. If you subsequently import the data into a spreadsheet, the spreadsheet could do all sorts of transformations, but that isn't awk's fault.

You mentioned Mac in one of your early comments; are you using Mac OS X? I am not expecting to find any problem outside the spreadsheet. If the data is imported to a spreadsheet, then I won't be surprised to find the 'date-like' values in column 1 are reformatted.

I tried importing the CSV from the data in the question into LibreOffice (4.4.1.2, or 4.4.1002, depending on where you look for the version number), and no transformation occurred on the data in column 1. Similarly, Numbers 3.5.2 and OpenOffice 4.1.1 both leave the keys starting 'Sep' alone.

Unfortunately, MS Excel (for Mac 2011, version 14.4.8 — 150116) translates such column values to a date (so Sep15 becomes 15-Sep, for example). Even embedding the column in double quotes does not help. I don't have a good solution other than "do not use MS Excel".

There probably is a way to suppress the behaviour, but you need to ask a question tagged and rather than and and .

Incidentally, a Google search on 'excel csv import force text' turns up Stop Excel from automatically converting certain text values to dates? Some of the techniques outlined there (notably the "rename the file from .csv to .txt" technique) work.

Community
  • 1
  • 1
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278