I have a file of the form:

#some header text
a    1       1234
b    2       3333
c    2       1357

#some header text 
a    4       8765
b    1       1212
c    7       9999
...

with repeated data in n-row chunks separated by a blank line (possibly with some other header text). I'm only interested in the third column, and would like to use some grep, cut, awk, sed, paste magic to turn it into this:

a   1234    8765   ...
b   3333    1212
c   1357    9999

where the third column of each subsequent n-row chunk is tacked on as a new column. I guess you could call it a transpose, just n lines at a time, and only for a specific column. The leading (a b c) column label isn't essential... I'd be happy if I could just grab the data in the third column.

Is this even possible? It must be. I can get things chopped down to only the interesting columns with grep and cut:

cat myfile | grep -A2 ^a\  | cut -c13-15

but I can't figure out how to take these n-row chunks and sed/paste/whatever them into repeated n-row columns.

Any ideas?

kris
  • possible duplicate of [How to merge two files using AWK?](http://stackoverflow.com/questions/5467690/how-to-merge-two-files-using-awk) – tripleee Jun 11 '14 at 16:29
  • But this is not merging 2 files, it is just processing a single file, or did I read it wrong? – anubhava Jun 11 '14 at 16:31

4 Answers

This awk does the job:

awk 'NF<3 || /^(#|[[:blank:]]*$)/{next} !a[$1]{b[++k]=$1; a[$1]=$3; next} 
        {a[$1] = a[$1] OFS $3} END{for(i=1; i<=k; i++) print b[i], a[b[i]]}' file
a 1234 8765
b 3333 1212
c 1357 9999
anubhava
  • And what if it has more labels than a, b, c? – rpax Jun 11 '14 at 16:29
  • OP wrote: `The leading (a b c) column label isn't essential... I'd be happy if I could just grab the data in the third column` – anubhava Jun 11 '14 at 16:30
  • You're an awk wizard! This works perfectly, and generally (my actual test file was much more of a mess than the (a b c) example). Much appreciated. – kris Jun 11 '14 at 16:45

awk '/#/ || !NF {next} {a[$1] = a[$1] $3 "\t"} END{for(i in a){print i, a[i]}}' file

Would produce

a 1234  8765
b 3333  1212
c 1357  9999

You can change "\t" to a different output separator like " " if you like.

sub(/\t$/, "", a[i]); may be inserted before print if you don't like having trailing tabs. Another solution is to check whether a[$1] already has a value and use that to decide whether to append to the previous value or start a new one. It complicates the code a bit, though.
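The second approach, checking whether a label has already been seen before appending, might look roughly like the sketch below. The function name join_col3 and the array names seen, row and order are illustrative assumptions, not part of the original one-liner; the sketch reads from standard input and also keeps the labels in first-seen order.

```shell
# join_col3: hypothetical helper that reads the chunked data on stdin
join_col3() {
  awk '
    /^#/ || NF < 3 { next }                        # skip header, blank and short lines
    !($1 in seen)  { seen[$1] = 1; order[++n] = $1 # first occurrence of this label:
                     row[$1] = $3; next }          # start its row with no separator
                   { row[$1] = row[$1] "\t" $3 }   # later chunks: separator, then value
    END            { for (i = 1; i <= n; i++) print order[i] "\t" row[order[i]] }
  '
}

# Example run on the sample data from the question
printf '%s\n' \
  '#some header text' 'a    1       1234' 'b    2       3333' 'c    2       1357' \
  '' \
  '#some header text' 'a    4       8765' 'b    1       1212' 'c    7       9999' |
  join_col3
```

Because the separator is only added in front of the second and later values, no trailing tab is produced and no sub() cleanup is needed.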

konsolebox

Using bash 4.0 or later (needed for associative arrays):

declare -A array
while read -r line
do
   if [[ $line && $line != \#* ]];then
       c=$( echo $line | cut -f 1 -d ' ')
       value=$( echo $line | cut -f 3 -d ' ')
       array[$c]="${array[$c]} $value"
   fi
done < myFile.txt

for k in "${!array[@]}"
do
    echo "$k ${array[$k]}"
done

Will produce:

a  1234 8765
b  3333 1212
c  1357 9999

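It stores the letter as the key of the associative array and, in each iteration, appends the corresponding value to it. Note that bash does not guarantee the order in which the keys of an associative array are listed, so the rows may come out in a different order.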

rpax

$ awk -v RS= -F'\n' '
    { for (i=2; i<=NF; i++) { split($i, f, /[[:space:]]+/); map[f[1]] = map[f[1]] " " f[3] } }
    END { for (key in map) print key map[key] }
  ' file
a 1234 8765
b 3333 1212
c 1357 9999
Ed Morton