1

Input where the unique identifier is split across two rows (rows 1-2):

L1_I                L1_I                C-14               <---|  unique identifier
WWPTH               WWPT                WWPTH              <---|  on two rows
1                   2                   3

Goal: how to concatenate the rows?

L1_IWWPTH           L1_IWWPT            C-14WWPTH          <--- unique identifier
1                   2                   3

P.s. I will accept the simplest and most elegant solution.

hhh
  • Basically: read rows 1-2, transpose, concatenate, and transpose back to the original position. It is a super easy task; now I am just trying to find some succinct tool to do this. [Transpose](http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash), hmmm... there must be some ready, easy solution to this without reinventing the wheel... taking some time to come up with it. – hhh Nov 11 '14 at 04:30
  • What's the structure of your input? – Mazdak Nov 11 '14 at 04:53
  • @Kasra You are free to play with the large ragged case [here](https://dl.dropboxusercontent.com/u/96742826/Mathematica/henris_data_s3.csv). Q here simplified. You can see there that sometimes the col entry has no value. – hhh Nov 11 '14 at 04:57
  • I am starting to think this is a super easy task in [Emacs](https://www.gnu.org/software/emacs/manual/html_node/emacs/Transpose.html): C-x C-t for transpose and then concatenate and then C-x C-t. Perhaps that is the easiest solution here -- I wish some Emacs/Vim guru would see this: somehow nowrap, tabstop=20, transpose, concatenate, transpose, done :D – hhh Nov 11 '14 at 05:22

3 Answers

2

Assuming that the input is in a file called file:

$ awk 'NR==1{for (i=1;i<=NF;i++) a[i]=$i;next} NR==2{for (i=1;i<=NF;i++) printf "%-20s",a[i] $i;print"";next} 1' file
L1_IWWPTH           L1_IWWPT            C-14WWPTH           
1                   2                   3

How it works

  • NR==1{for (i=1;i<=NF;i++) a[i]=$i;next}

    For the first line, save all the column headings in the array a. Then, skip over the rest of the commands and jump to the next line.

  • NR==2{for (i=1;i<=NF;i++) printf "%-20s",a[i] $i;print"";next}

    For the second line, print all the column headings, merging together the ones from the first and second rows. Then, skip over the rest of the commands and jump to the next line.

  • 1

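    1 is awk's cryptic shorthand for "print the line as is": it is a pattern that is always true, and the default action for a pattern without a block is to print. This is done for all lines after the second.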
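
For comparison, the three rules above can be sketched in Python. This is only an illustration, not part of the awk solution: the function name `merge_header_rows` and the `width` parameter are hypothetical, and it assumes whitespace-separated headers with no empty cells and at least two input lines.

```python
def merge_header_rows(lines, width=20):
    """Merge the first two header rows column by column; keep the rest as is."""
    lines = list(lines)
    first = lines[0].split()    # rule 1: remember the first header row
    second = lines[1].split()   # rule 2: pair it with the second header row
    # zip stops at the shorter row, so both rows are assumed to have equal length
    merged = "".join((a + b).ljust(width) for a, b in zip(first, second))
    return [merged] + lines[2:]  # rule 3: remaining lines pass through unchanged


sample = [
    "L1_I                L1_I                C-14",
    "WWPTH               WWPT                WWPTH",
    "1                   2                   3",
]
for line in merge_header_rows(sample):
    print(line)
```

Like the awk one-liner, this pads every merged heading to 20 characters, which only lines up if no heading is longer than that.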

Tab-separated columns with possible missing columns

If columns are tab-separated:

awk -F'\t' 'NR==1{for (i=1;i<=NF;i++) a[i]=$i;next} NR==2{for (i=1;i<=NF;i++) printf "%s\t",a[i] $i;print"";next} 1' file
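
A rough Python counterpart of the tab-separated case, in case it helps to see the handling of empty cells spelled out. The function name `merge_tab_headers` and the sample data are made up for illustration. It assumes every column is delimited by a single tab and that an empty cell is simply two adjacent tabs.

```python
from itertools import zip_longest


def merge_tab_headers(lines):
    """Concatenate the first two tab-separated rows cell by cell."""
    lines = [line.rstrip("\n") for line in lines]
    first = lines[0].split("\t")    # split("\t") keeps empty cells as ""
    second = lines[1].split("\t")
    # zip_longest pads the shorter row with "" so no column is dropped
    merged = [a + b for a, b in zip_longest(first, second, fillvalue="")]
    return ["\t".join(merged)] + lines[2:]


# Hypothetical sample: the middle cell of the second row is empty
sample = ["L1_I\tL1_I\tC-14", "WWPTH\t\tWWPTH", "1\t2\t3"]
for line in merge_tab_headers(sample):
    print(line)
```

Joining with `"\t".join` avoids the trailing tab that the `printf "%s\t"` loop in the awk command leaves at the end of the merged line.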
John1024
  • This has a mistake with input where the second row has an empty value. – hhh Nov 11 '14 at 04:52
  • If the second row has an empty value, how do we assign items to columns? Is it a fixed format of 20 characters per column? – John1024 Nov 11 '14 at 05:03
  • Separator is TAB and empty value specified by TAB. 20 is due to my tabstop=20 settings in Vim :) – hhh Nov 11 '14 at 05:06
  • @hhh I added a version for tab-separated input. It can handle missing columns. – John1024 Nov 11 '14 at 05:21
0

If you plan to use Python, you can use zip in the following way:

rows = [['L1_I', 'L1_I', 'C-14'], ['WWPTH', 'WWPT', 'WWPTH'], [1, 2, 3]]
# zip pairs up the entries of the first two rows column by column
output = [[i + j for i, j in zip(rows[0], rows[1])]] + rows[2:]
print(output)

output:

[['L1_IWWPTH', 'L1_IWWPT', 'C-14WWPTH'], [1, 2, 3]]
venpa
  • I love list-comprehensions +1, Python's zip is transpose. I am still expecting that there must be some simpler *ix tool for this so waiting other answers :D – hhh Nov 11 '14 at 04:48
  • Are you looking for something like [`itertools.izip_longest`](https://docs.python.org/2/library/itertools.html#itertools.izip_longest)? – venpa Nov 11 '14 at 05:07
  • I don't know yet. I provided the large CSV datadump above if you want to play with this tool and real ragged data case. I cannot yet fully understand the command, will look for it :) – hhh Nov 11 '14 at 05:10
0
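#!/usr/bin/awk -f
NR == 1 {
  # Save the first header row; n is the number of columns
  n = split($0, a)
  next
}
NR == 2 {
  # Loop by index: "for (b in a)" visits elements in unspecified order
  for (b = 1; b <= n; b++)
    printf "%-20s", a[b] $b
  print ""
  next
}
1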
Zombo
  • hmmm...awk is not run automatically by the first line after execution? I made it executable, chmod +x yourCode.sh. My bad -- some time since used awk last time, taking longer time to test this. Perhaps you can add short instructions for as dumb people as me :/ – hhh Nov 11 '14 at 05:00
  • I wish I remembered where awk located in OSX, anyway the same err with your newest line. Sure it works? – hhh Nov 11 '14 at 05:04
  • Sure, that is the location! Something still not working in this: `$ ./.ttt.sh .test /usr/bin/awk: syntax error at source line 5 source file ./.ttt.sh context is >>> baz[NR][ <<< /usr/bin/awk: illegal statement at source line 5 source file ./.ttt.sh /usr/bin/awk: illegal statement at source line 5 source file ./.ttt.sh` – hhh Nov 11 '14 at 05:32
  • This messes up the order of the columns, which is bad. Output with the large datadump starts with `C-14WWITH K1_PRODWOPT L1_PRODWOPT L1_INJWOPT` when it should start with `TIME...`. And I feel this has the same mistake as John1024 had at the start: you use the 20 characters as a specification, which is error-prone. I recommend taking a peek at John's solution, it is very nice. – hhh Nov 11 '14 at 05:45
  • Still order messed up :/ – hhh Nov 11 '14 at 05:53