
I have a CSV file with 177 columns and 54248 rows, as below:

SNP_Name,Chr,Coordinate,R921B12,R921C12,R921D12,...

CL635944_160.1,0,0,CC,CC,CC,...
CR_594.1,0,0,TT,TT,TT,...
CR_816.1,0,0,CC,TT,TT,...

I need to have a tab-delimited file with 54284 columns and 177 rows like:

R921B12 C C T T C C ...
R921C12 C C T T T T ...
R921D12 C C T T T T ...

The following command lets me transpose a single column (number 3) into one row:

awk -F, '{ printf("%s ", $3) } END { printf("\n") }' a.csv

but how can I do this for all of them?

kvantour
mary
  • Would an array of arrays work for you? – Edward Minnix Jul 17 '18 at 13:47
  • How are the two formats related? It doesn't seem like one is the transpose of the other. – karakfa Jul 17 '18 at 18:48
  • I need to make a PED-format file to run in PLINK. This produces the part of the file that comes after the 6th column of the PED file. – mary Jul 17 '18 at 19:18
  • I use "python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < input > output" and it works, but I also need to separate the columns, as " C C T T G G" instead of " CC TT GG". So I then use " sed 's/ \+//g' test.txt | awk '{for (i=1; i<=NF; i+=1) {printf$(i)" "; if (i==NF) printf "\n"}}' FS='' " but get this error: " awk: program limit exceeded: maximum number of fields size=32767 FILENAME="-" FNR=1 NR=1". I am looking for a Python script to separate the columns. Any suggestion would be appreciated. – mary Jul 18 '18 at 14:11

2 Answers


Scan the input file one row at a time, writing one line to a temporary file for each field of each row; that line holds the row number, the field number, and the contents of the field. As written, the temporary file will be ordered by field within row. Sort the temporary file so that it is ordered by row within field, then scan the sorted file and rebuild the output in the desired order.
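The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name transpose_via_sort is made up for this example, and an in-memory list stands in for the temporary file, so for an input too large for memory the records would have to go to a real file and be sorted externally.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def transpose_via_sort(src, dst, in_delim=",", out_delim="\t"):
    """Transpose delimited text by sorting (row, field, value) records.

    Hypothetical helper: the list `records` plays the role of the
    temporary file described in the answer.
    """
    # Pass 1: one record per cell, naturally ordered by field within row.
    records = [
        (row_no, field_no, value)
        for row_no, row in enumerate(csv.reader(src, delimiter=in_delim))
        for field_no, value in enumerate(row)
    ]
    # Re-sort so the records are ordered by row within field.
    records.sort(key=itemgetter(1, 0))
    # Pass 2: every run of equal field numbers becomes one output line.
    for _, group in groupby(records, key=itemgetter(1)):
        dst.write(out_delim.join(value for _, _, value in group) + "\n")


sample = "a,b,c\n1,2,3\n"
out = io.StringIO()
transpose_via_sort(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

With real files, src and dst would be file objects opened with open(..., newline="") for the CSV input.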

user448810

GNU datamash has a transpose operation that will do this. It can also change the field delimiter from comma to TAB. Here's an example (a.csv is as shown in the question.)

$ cat a.csv
SNP_Name,Chr,Coordinate,R921B12,R921C12,R921D12
CL635944_160.1,0,0,CC,CC,CC
CR_594.1,0,0,TT,TT,TT
CR_816.1,0,0,CC,TT,TT

$ # datamash transpose (cut removes first three fields)
$ cut -d , -f 4- a.csv | datamash --field-separator=, --output-delimiter=$'\t' transpose
R921B12 CC  TT  CC
R921C12 CC  TT  TT
R921D12 CC  TT  TT
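
The output above still has each genotype as one field ("CC"), whereas the question asks for separate alleles ("C C"). A minimal Python sketch that does the transpose and the split in one go might look like this. It is not part of the datamash answer; the function name csv_to_ped_genotypes is made up for this example, and it assumes that the first three columns are to be dropped, that every genotype is a short string whose characters are the alleles, and that the whole file fits in memory.

```python
import csv
import io


def csv_to_ped_genotypes(src, dst, skip=3):
    """Transpose the genotype columns and split each genotype into alleles.

    Hypothetical helper; `skip` is the number of leading annotation
    columns (SNP_Name, Chr, Coordinate in the question) to drop.
    """
    rows = [row[skip:] for row in csv.reader(src) if row]
    # zip(*rows) transposes: each tuple is (sample ID, genotype 1, genotype 2, ...)
    for sample_id, *genotypes in zip(*rows):
        # "CC" -> "C", "C": one output field per character of each genotype
        alleles = [allele for genotype in genotypes for allele in genotype]
        dst.write("\t".join([sample_id, *alleles]) + "\n")


sample = (
    "SNP_Name,Chr,Coordinate,R921B12,R921C12,R921D12\n"
    "CL635944_160.1,0,0,CC,CC,CC\n"
    "CR_594.1,0,0,TT,TT,TT\n"
    "CR_816.1,0,0,CC,TT,TT\n"
)
out = io.StringIO()
csv_to_ped_genotypes(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Because the splitting happens in Python rather than in awk with an empty FS, the awk field limit mentioned in the comments does not come into play.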
JonDeg