grep, cut, sed, awk a file for 3rd column, n lines at a time, then paste into repeated columns of n rows?

Question

I have a file of the form:

#some header text
a    1       1234
b    2       3333
c    2       1357

#some header text 
a    4       8765
b    1       1212
c    7       9999
...

with repeated data in n-row chunks separated by a blank line (possibly followed by some other header text). I'm only interested in the third column, and would like to do some grep, cut, awk, sed, paste magic to turn it into this:

a   1234    8765   ...
b   3333    1212
c   1357    9999

where the third column of each subsequent n-row chunk is tacked on as a new column. I guess you could call it a transpose, just n lines at a time, and only of a specific column. The leading (a b c) column label isn't essential; I'd be happy if I could just grab the data in the third column.

Is this even possible? It must be. I can get things chopped down to only the interesting columns with grep and cut:

grep -A2 '^a ' myfile | cut -c13-15

but I can't figure out how to take these n-row chunks and sed/paste/whatever them into repeated n-row columns.

Any ideas?
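The paste route described in the question can be finished by splitting the extracted column into one file per chunk and pasting those files side by side. This is a minimal sketch that assumes every chunk has exactly n data rows and that data lines are the ones with at least three fields whose first field does not start with #. The function name columns_by_chunk is hypothetical.

```shell
# Hypothetical helper: columns_by_chunk FILE N
# Assumes every chunk has exactly N data rows.
columns_by_chunk() {
  local file=$1 n=$2 dir
  dir=$(mktemp -d) || return 1
  # Keep only data lines (3+ fields, not a # header) and print field 3,
  # then cut that stream into one file per N-row chunk: chunk_aa, chunk_ab, ...
  awk 'NF >= 3 && $1 !~ /^#/ { print $3 }' "$file" |
    split -l "$n" - "$dir/chunk_"
  # The glob expands chunk_aa, chunk_ab, ... in input order, so paste
  # puts the first chunk in column 1, the second in column 2, and so on.
  paste "$dir"/chunk_*
  rm -rf "$dir"
}
```

For example, columns_by_chunk myfile 3 prints the third-column values tab-separated, one column per chunk, without the a b c labels.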

possible duplicate of [How to merge two files using AWK?](http://stackoverflow.com/questions/5467690/how-to-merge-two-files-using-awk) — tripleee, Jun 11 '14 at 16:29
But this is not merging two files; it is just processing a single file, or did I read it wrong? — anubhava, Jun 11 '14 at 16:31

score 1 · Accepted Answer · answered Jun 11 '14 at 16:27

This awk does the job:

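awk 'NF<3 || /^(#|[[:blank:]]*$)/ {next}          # skip headers, blank and short lines
     !($1 in a) {b[++k]=$1; a[$1]=$3; next}       # first time a label is seen: remember its order
     {a[$1] = a[$1] OFS $3}                       # later chunks: append column 3
     END{for(i=1; i<=k; i++) print b[i], a[b[i]]}' file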
a 1234 8765
b 3333 1212
c 1357 9999
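The same first-seen-order technique can be parameterized on the column number so it is not tied to the third column. A sketch only: the function name collect_column and its optional second argument (defaulting to 3) are hypothetical.

```shell
# Hypothetical helper: collect_column FILE [COLUMN]
collect_column() {
  local file=$1 col=${2:-3}
  awk -v col="$col" '
    NF < col || /^(#|[[:blank:]]*$)/ { next }                 # skip headers, blank and short lines
    !($1 in vals) { order[++k] = $1; vals[$1] = $col; next }  # first time this label appears
    { vals[$1] = vals[$1] OFS $col }                          # later chunks: append the value
    END { for (i = 1; i <= k; i++) print order[i], vals[order[i]] }' "$file"
}
```

For example, collect_column myfile 2 would collect the second column instead, still one row per label in the order the labels first appear.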

anubhava

And what if it has more than a, b, c? – rpax Jun 11 '14 at 16:29

OP wrote: `The leading (a b c) column label isn't essential... I'd be happy if I could just grab the data in the third column` – anubhava Jun 11 '14 at 16:30

You're an awk wizard! This works perfectly, and generally (my actual test file was much more of a mess than the (a b c) example). Much appreciated. – kris Jun 11 '14 at 16:45
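If the labels are not unique, or are unreliable (rpax's question above), another option is to key each value by its row position inside the chunk instead of by its label. A minimal sketch, assuming chunks are separated by blank or # header lines and all have the same number of rows; by_position is a hypothetical name, and the labels are dropped, which the OP said is acceptable.

```shell
# Hypothetical helper: by_position FILE
by_position() {
  awk '
    NF < 3 || $1 ~ /^#/ { r = 0; next }   # blank, short or header line: a new chunk starts
    {
      r++                                 # row number inside the current chunk
      if (r in row) row[r] = row[r] OFS $3
      else          row[r] = $3
      if (r > max) max = r
    }
    END { for (i = 1; i <= max; i++) print row[i] }' "$1"
}
```

Note that a data line with fewer than three fields is also treated as a chunk boundary, and chunks of different lengths would leave the columns misaligned.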

konsolebox · Answer 2 · 2014-06-11T16:44:20.163

awk '/#/ || !NF {next} {a[$1] = a[$1] $3 "\t"} END{for(i in a){print i, a[i]}}' file

This would produce (the row order may vary, since for (i in a) visits keys in no particular order):

a 1234  8765
b 3333  1212
c 1357  9999

You can change "\t" to a different output separator like " " if you like.

sub(/\t$/, "", a[i]); may be inserted before print if you don't want the trailing separator. Another solution is to check whether a[$1] already has a value and only add the separator when appending to an existing one. It complicates the code a bit, though.
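Putting the sub() suggestion into the program and sorting the result (because for (i in a) has no defined order) might look like the sketch below. The function name collect_tab_sorted is hypothetical, and sorting by label is an assumption about the desired row order.

```shell
# Hypothetical helper: collect_tab_sorted FILE
collect_tab_sorted() {
  awk '/#/ || NF < 3 { next }             # skip header and blank lines
       { a[$1] = a[$1] $3 "\t" }          # append the value plus a tab
       END {
         for (i in a) {
           sub(/\t$/, "", a[i])           # drop the trailing tab
           print i, a[i]
         }
       }' "$1" | LC_ALL=C sort            # impose a stable order by label
}
```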

rpax · Answer 3 · 2014-06-11T17:06:27.887

Using bash 4.0 or later (needed for associative arrays):

declare -A array
# read splits on whitespace: field 1 -> c, field 3 -> value (the _ fields are ignored)
while read -r c _ value _
do
   # skip blank lines and '#' header lines
   if [[ $c && $c != \#* ]]; then
       array[$c]="${array[$c]} $value"
   fi
done < myFile.txt

for k in "${!array[@]}"
do
    echo "$k ${array[$k]}"
done

Will produce:

a  1234 8765
b  3333 1212
c  1357 9999

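It stores the letter as the key of the associative array and, on each iteration, appends the corresponding value to it. Bash does not guarantee the iteration order of associative array keys, so the rows may be printed in a different order.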
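If the rows should come out in the order the labels first appear, the bash version can keep a separate indexed array of keys. A sketch assuming bash 4.0 or later; the function name collect_ordered and the arrays vals and order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect_ordered FILE (requires bash 4.0+)
collect_ordered() {
  local -A vals=()      # label -> accumulated values
  local -a order=()     # labels in first-seen order
  local c value
  while read -r c _ value _; do
    # skip blank lines and '#' header lines
    [[ -z $c || $c == \#* ]] && continue
    # remember the label the first time it appears
    if [[ -z ${vals[$c]+set} ]]; then
      order+=("$c")
    fi
    vals[$c]+=" $value"
  done < "$1"
  for c in "${order[@]}"; do
    printf '%s%s\n' "$c" "${vals[$c]}"
  done
}
```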

score 0 · Answer 4 · answered Jun 11 '14 at 17:20

$ awk -v RS= -F'\n' '{ for (i=2;i<=NF;i++) {split($i,f,/[[:space:]]+/); map[f[1]] = map[f[1]] " " f[3]} } END{ for (key in map) print key map[key]}' file
a 1234 8765
b 3333 1212
c 1357 9999
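This answer relies on awk's paragraph mode: with RS set to the empty string, each blank-line-separated chunk is one record, and -F'\n' makes each line of the chunk a field, so the loop starts at field 2 to skip the header. Below is a commented variant that also remembers first-seen label order, since for (key in map) gives no order guarantee. A sketch; by_paragraph is a hypothetical name, and it assumes every chunk starts with exactly one header line and the separating lines are truly empty (a line holding only spaces does not end a paragraph).

```shell
# Hypothetical helper: by_paragraph FILE
by_paragraph() {
  # RS= (empty) turns on paragraph mode: records are separated by blank lines.
  # -F'\n' makes each line of a record one field, so $1 is the header line.
  awk -v RS= -F'\n' '
    {
      for (i = 2; i <= NF; i++) {              # field 1 is the header, skip it
        split($i, f, /[[:space:]]+/)           # f[1] = label, f[3] = value
        if (!(f[1] in map)) keys[++k] = f[1]   # remember first-seen order
        map[f[1]] = map[f[1]] " " f[3]
      }
    }
    END { for (j = 1; j <= k; j++) print keys[j] map[keys[j]] }' "$1"
}
```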

Ed Morton