
I receive a CSV file that always contains the same number of rows but never the same number of columns (sometimes 3 columns, sometimes 12 or more).

The file looks like this:

cat file.csv
John-;Paul-;Lisa-;Tim-
21-;44-;25-;33-
London-;Paris-;Chicago-;Roma-
Student;Teacher;Engineer;Cook
Funny-;Clever-;Sincere-;Passionate-

I want to write the content of each column to a text file, one column per line, with the values in a specific order, for example:

John-London-21-Funny-Student
Paul-Paris-44-Clever-Teacher
Lisa-Chicago-25-Sincere-Engineer
Tim-Roma-33-Passionate-Cook

I wrote this bash script:

cat file.csv | awk -F";" '{ print $1 }' > temp1
declare -a lines
readarray -t lines <temp1

echo -n "${lines[0]}" > result.txt
echo -n "${lines[2]}" >> result.txt
echo -n "${lines[1]}" >> result.txt
echo -n "${lines[4]}" >> result.txt
echo -n "${lines[3]}" >> result.txt

The result is correct because I get this:

cat result.txt
John-London-21-Student

...but I only get the first column. I don't know how to loop the awk command and increment the field number so that it reads all the columns of the file.

Do you have any ideas?

Alexo Dpt
  • The order of tokens in your output is not the same as the order of rows in the input. Look at John-London-... you show Student as the last token, but in your inputs, the first column of the last line has Funny, not Student. So - what is the "precise order" in which you need the output, since it is **not** the row order in the input, and you didn't give any explanation? –  May 15 '20 at 01:57
  • I'm sorry, I didn't specify the desired order, and I didn't correctly write the contents of the result.txt file. It actually contains this: John-London-21-Funny-Student. So I'd like to retrieve lines 1-3-2-5-4. For the first column of the file, my result file would therefore have this line: John-London-21-Funny-Student, and then: Paul-Paris-44-Clever-Teacher – Alexo Dpt May 15 '20 at 02:10

2 Answers


Your data is easier to process if you transpose it first. I used GNU datamash, but you can also do this with awk if you want (see "An efficient way to transpose a file in Bash", for example):

$ datamash -t';' transpose < file.csv | awk -F';' '{ print $1 $3 $2 $5 $4 }'
John-London-21-Funny-Student
Paul-Paris-44-Clever-Teacher
Lisa-Chicago-25-Sincere-Engineer
Tim-Roma-33-Passionate-Cook
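
If datamash is not available, the transpose step can be sketched in awk alone. This is only a minimal illustration, not the code from the linked question: the function name `transpose_csv` and the temporary sample file are assumptions made for the example, and it holds the whole file in memory, which should be fine for small inputs like this one.

```shell
# Hypothetical helper (not from the original answer): transpose a
# ';'-separated file with awk only, so that rows become columns.
transpose_csv() {
    awk -F';' '
        {
            # Remember every field, keyed by (row, column).
            for (i = 1; i <= NF; i++) cell[NR, i] = $i
            if (NF > ncols) ncols = NF
        }
        END {
            # Emit one output line per input column.
            for (i = 1; i <= ncols; i++) {
                line = cell[1, i]
                for (r = 2; r <= NR; r++) line = line ";" cell[r, i]
                print line
            }
        }
    ' "$1"
}

# Sample input copied from the question, written to a temporary file.
sample_file=$(mktemp)
printf '%s\n' \
    'John-;Paul-;Lisa-;Tim-' \
    '21-;44-;25-;33-' \
    'London-;Paris-;Chicago-;Roma-' \
    'Student;Teacher;Engineer;Cook' \
    'Funny-;Clever-;Sincere-;Passionate-' > "$sample_file"

# Same reordering step as in the datamash pipeline above.
transpose_csv "$sample_file" | awk -F';' '{ print $1 $3 $2 $5 $4 }'
```

The second awk call is unchanged from the datamash version, so the two pipelines should be interchangeable for this input.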
Freddy
  • If you don't have `datamash` installed, see [here](https://www.gnu.org/software/datamash/download/#packages) – Freddy May 15 '20 at 02:19
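awk -F ';' '
    # Store every field keyed by (row, column); keep the column count for END.
    { for (i = 1; i <= NF; i++) token[NR, i] = $i; nf = NF }
    # Print one line per input column, taking the rows in the order 1-3-2-5-4.
    END { for (i = 1; i <= nf; i++)
        print token[1, i] token[3, i] token[2, i] token[5, i] token[4, i] }' file.csv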
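
The row order 1-3-2-5-4 is hard-coded in the command above. If the order might change, one possible variation is to pass it in as an awk variable. This is a sketch only: the function name `columns_in_order`, the `order` variable, and the temporary sample file are assumptions for illustration and not part of the original answer.

```shell
# Hypothetical helper: print each column of a ';'-separated file as one line,
# taking the rows in the order given by the first argument
# (space-separated row numbers, e.g. "1 3 2 5 4").
columns_in_order() {
    awk -F';' -v order="$1" '
        BEGIN { n = split(order, row, " ") }
        {
            # Store every field, keyed by (row, column).
            for (i = 1; i <= NF; i++) token[NR, i] = $i
            if (NF > ncols) ncols = NF
        }
        END {
            for (i = 1; i <= ncols; i++) {
                line = ""
                # Concatenate the requested rows of column i.
                for (k = 1; k <= n; k++) line = line token[row[k], i]
                print line
            }
        }
    ' "$2"
}

# Sample input copied from the question, written to a temporary file.
sample_file=$(mktemp)
printf '%s\n' \
    'John-;Paul-;Lisa-;Tim-' \
    '21-;44-;25-;33-' \
    'London-;Paris-;Chicago-;Roma-' \
    'Student;Teacher;Engineer;Cook' \
    'Funny-;Clever-;Sincere-;Passionate-' > "$sample_file"

columns_in_order "1 3 2 5 4" "$sample_file"
```

Redirecting the last command to result.txt would give the file asked for in the question. Row numbers must be written without leading zeros so that they match the stored keys.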
  • Thank you, it also works but as I don't have any particular restrictions, I prefer to use datamash. – Alexo Dpt May 15 '20 at 03:13
  • @AlexoDpt - That is fine. I assume you don't need to run this often, and you don't have a lot of data. (If you did, you might care about speed of execution; I expect my solution to be faster, as it does less work.) –  May 15 '20 at 03:16
  • After several tests, I realize that your awk-based solution is more portable, so I'm going to use it. Thank you all for helping me so quickly! – Alexo Dpt May 16 '20 at 20:03