
I want to transpose and group my data. The input looks like this:

APOC2   GO:0006629
APOC2   GO:0006869
APOC2   GO:0008047
APOC2   GO:0042627
APOC2   GO:0043085
CRYAB   GO:0005212
SERPINA1    GO:0005615
DMD GO:0001954
DMD GO:0002162
DMD GO:0003779
DMD GO:0005200
DMD GO:0005886

But I need the data in this simple tab-delimited format: each value in $1 should appear only once, followed on the same row by all of its GO values (taken from $2 of the input file). For the records above, the output would be:

APOC2   GO:0006629  GO:0006869  GO:0008047  GO:0042627  GO:0043085
CRYAB   GO:0005212
SERPINA1    GO:0005615
DMD GO:0001954  GO:0002162  GO:0003779  GO:0005200  GO:0005886

A solution is given in questions/17853218 on this forum, but my data file is too large for MS Excel to handle. How can I do the same task in Linux or R? Thanks.

M.sh
    Possible duplicate of [Collapse text by group in data frame](http://stackoverflow.com/questions/22756372/collapse-text-by-group-in-data-frame) – jogo May 23 '16 at 10:44

1 Answer

awk '$1 == key { data = data "\t" $2; next } NR > 1 { print key "\t" data } { key = $1; data = $2 } END { if (NR) print key "\t" data }' awkdata.txt
Michael Vehrs
  • Thanks `Michael Vehrs`, the script works and produces the desired output, but there is a small issue: $1 of my data file contains `1352` unique values (counted with `sort | uniq | wc -l`), but your script creates `3679` lines, meaning that some values in $1 are grouped more than once. – M.sh May 25 '16 at 19:45
  • Well, your question does not include the requirement that values need to be unique. – Michael Vehrs May 26 '16 at 05:08
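
The line-count mismatch in the comments is what one would expect if rows for the same $1 value are not always adjacent in the file: the one-liner in the answer only merges consecutive rows with the same key. A minimal sketch of one way around that, assuming whitespace-separated input with the key in $1 and the GO term in $2, is to collect rows in an awk associative array. The function name `group_by_key` is made up for illustration.

```shell
# Collapse "key value" lines into one tab-separated row per key.
# Rows for the same key need not be adjacent; keys are printed in the
# order they first appear. Reads the named files, or stdin if none given.
group_by_key() {
    awk '
        # Skip blank or incomplete lines.
        NF < 2 { next }
        # First time a key is seen: record its position and start its row.
        !($1 in rows) { order[++n] = $1; rows[$1] = $1 }
        # Append the value to the row for its key.
        { rows[$1] = rows[$1] "\t" $2 }
        # Emit one row per key, in first-seen order.
        END { for (i = 1; i <= n; i++) print rows[order[i]] }
    ' "$@"
}
```

It could be run as `group_by_key awkdata.txt > grouped.txt`. This keeps every row in memory until the end of the input. Sorting the file on the first column (`sort -k1,1`) before piping it into the original one-liner should give the same grouping, at the cost of losing the original key order.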