Select content of each column in a precise order while respecting one column per line (CSV / AWK / BASH)

Question

I receive a CSV file which always contains the same number of rows but never the same number of columns (sometimes 3 columns, sometimes 12 or more).

The file looks like this:

cat file.csv
John-;Paul-;Lisa-;Tim-
21-;44-;25-;33-
London-;Paris-;Chicago-;Roma-
Student;Teacher;Engineer;Cook
Funny-;Clever-;Sincere-;Passionate-

I want to write the content of each column to a text file, with the rows in a specific order and one input column per output line, for example:

John-London-21-Funny-Student
Paul-Paris-44-Clever-Teacher
Lisa-Chicago-25-Sincere-Engineer
Tim-Roma-33-Passionate-Cook

I wrote this bash script:

cat file.csv | awk -F";" '{ print $1 }' > temp1
declare -a lines
readarray -t lines <temp1

echo -n "${lines[0]}" > result.txt
echo -n "${lines[2]}" >> result.txt
echo -n "${lines[1]}" >> result.txt
echo -n "${lines[4]}" >> result.txt
echo -n "${lines[3]}" >> result.txt

The result is correct because I get this:

cat result.txt
John-London-21-Student

...but I only get the first column. I don't know how to loop the awk command and increment the field number to read all the columns of the file.

Do you have any ideas?

The order of tokens in your output is not the same as the order of rows in the input. Look at John-London-... you show Student as the last token, but in your inputs, the first column of the last line has Funny, not Student. So - what is the "precise order" in which you need the output, since it is **not** the row order in the input, and you didn't give any explanation? — , May 15 '20 at 01:57
I'm sorry, I didn't specify the desired order, and I didn't correctly write out the contents of the result.txt file; it actually contains this: John-London-21-Funny-Student. So I'd like to retrieve lines 1-3-2-5-4. Therefore, for the first column of the file, my result file will have this line: John-London-21-Funny-Student, and then: Paul-Paris-44-Clever-Teacher — Alexo Dpt, May 15 '20 at 02:10

score 1 · Answer 1 · answered May 15 '20 at 02:10

Your data is easier to process if you transpose it first. I used GNU datamash, but you can do this with awk if you want (see "An efficient way to transpose a file in Bash", for example):

$ datamash -t';' transpose < file.csv | awk -F';' '{ print $1 $3 $2 $5 $4 }'
John-London-21-Funny-Student
Paul-Paris-44-Clever-Teacher
Lisa-Chicago-25-Sincere-Engineer
Tim-Roma-33-Passionate-Cook
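
For readers without datamash, the transpose step can also be done in awk, as the answer mentions. The following is only a minimal sketch of that idea, not the code from the linked post: the function name `transpose` is a hypothetical helper, and the sample file is recreated from the question so the snippet runs on its own. It holds the whole file in memory, which is assumed to be acceptable for small inputs like this one.

```shell
# Recreate the sample input from the question.
cat > file.csv <<'EOF'
John-;Paul-;Lisa-;Tim-
21-;44-;25-;33-
London-;Paris-;Chicago-;Roma-
Student;Teacher;Engineer;Cook
Funny-;Clever-;Sincere-;Passionate-
EOF

# transpose: hypothetical helper; swaps rows and columns of a ';'-separated file.
transpose() {
  awk -F';' '
    { for (i = 1; i <= NF; i++) cell[NR, i] = $i   # remember every field
      if (NF > ncols) ncols = NF }                 # widest row seen so far
    END {
      for (i = 1; i <= ncols; i++) {               # each input column becomes a row
        line = cell[1, i]
        for (r = 2; r <= NR; r++) line = line ";" cell[r, i]
        print line
      }
    }' "$1"
}

# Same reordering step as in the datamash pipeline above.
transpose file.csv | awk -F';' '{ print $1 $3 $2 $5 $4 }' > result.txt
cat result.txt
```

With the sample input, result.txt should contain the same four lines as the datamash output shown above.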

answered May 15 '20 at 02:10

Freddy

If you don't have `datamash` installed, see [here](https://www.gnu.org/software/datamash/download/#packages) – Freddy May 15 '20 at 02:19

score 1 · Answer 2 · 2020-05-15T02:48:25.593

awk -F ';' '
  { for (i = 1; i <= NF; i++) token[NR, i] = $i }   # store every field, keyed by (row, column)
  END { for (i = 1; i <= NF; i++)                   # one output line per input column
          print token[1, i] token[3, i] token[2, i] token[5, i] token[4, i] }' file.csv
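
If the row order may change, one possible variation (not part of the original answer) is to pass the order in as a variable instead of hard-coding it in the print statement. This is a sketch under stated assumptions: the function name `reorder_rows` and the awk variable `order` are hypothetical, and the sample file is recreated from the question so the snippet is self-contained.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  'John-;Paul-;Lisa-;Tim-' \
  '21-;44-;25-;33-' \
  'London-;Paris-;Chicago-;Roma-' \
  'Student;Teacher;Engineer;Cook' \
  'Funny-;Clever-;Sincere-;Passionate-' > file.csv

# reorder_rows ORDER FILE: hypothetical helper.
# ORDER is a space-separated list of input row numbers in output order.
reorder_rows() {
  awk -F';' -v order="$1" '
    BEGIN { n = split(order, row, " ") }             # row[1..n] = requested row numbers
    { for (i = 1; i <= NF; i++) token[NR, i] = $i    # store every field
      if (NF > ncols) ncols = NF }                   # widest row seen so far
    END {
      for (i = 1; i <= ncols; i++) {                 # one output line per input column
        line = ""
        for (k = 1; k <= n; k++) line = line token[row[k], i]
        print line
      }
    }' "$2"
}

reorder_rows '1 3 2 5 4' file.csv > result.txt
cat result.txt
```

Called with `'1 3 2 5 4'`, this should print the same four lines as the hard-coded version; a different order string rearranges the tokens without editing the awk program.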

edited May 15 '20 at 02:48

answered May 15 '20 at 02:41

Thank you, it also works but as I don't have any particular restrictions, I prefer to use datamash. – Alexo Dpt May 15 '20 at 03:13
@AlexoDpt - That is fine. I assume you don't need to run this often, and you don't have a lot of data. (If you did, you might care about speed of execution; I expect my solution to be faster, as it does less work.) – May 15 '20 at 03:16
After several tests, I realize that your awk-based solution is more portable, so I'm going to use it. Thank you all for helping me so quickly! – Alexo Dpt May 16 '20 at 20:03
