Linux Command to get fields from CSV files

Question

In csv files on Linux server, I have thousands of rows in below csv format

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

I need to get output from all the files on below format (2nd field ie 20221208195546466 and 5th field but value after Above as: and before first | ie 2 in above example )

output :

20221208195546466 , 2

Can anyone help me with linux command ?

Edit :

my attempts

I tried but it give field 5th value. How to add field 2 as well ?

cat *.csv | cut -d, -f5|cut -d'|' -f1|cut -d':' -f2|

EDIT : sorted result

Now I am using this command (based on Dave Pritlove answer ) awk -F'[,|:]' '{print $2", "$6}' file.csv. However, I have one more query, If I have to sort the output based on $6 ( value 2 in your example ) then how can i do it ? I want result should be displayed in sorted order based on 2nd output field. for ex :

20221208195546366, 20

20221208195546436, 16

20221208195546466, 5

2022120819536466, 2

Can anyone please help ? Why close votes ? i tried with cat and cut but couldnt do it — VJS, Dec 08 '22 at 16:24
*"Why close votes?"*: please see [How much research efffort is expected of SO users?](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users). This has been asked tons of times, you must be able to find some questions about this, if you cant get it to work, [edit] your question to show your attempt(s) — 0stone0, Dec 08 '22 at 16:42
I tried cat *.csv | awk -F',' '{gsub(/Total ts:/,""); gsub(/\|.*/, ""); print($2, $5)}' — VJS, Dec 08 '22 at 16:54
please update the question with your `awk` attempt(s); also, your `awk` code references a string `Total ts:` but no such string exists in your sample input; if the string in question can vary across rows (eg, `Total ts:` vs `Above as:`) then please update the question to show a few rows of input that display this difference in strings; also, your expected output shows a space on each side of the `,` ... is this what you really want or should those spaces not exist (as is the case in the input)? — markp-fuso, Dec 08 '22 at 17:00
See [whats-the-most-robust-way-to-efficiently-parse-csv-using-awk](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk) — Ed Morton, Dec 08 '22 at 17:16

score 1 · Accepted Answer · answered Dec 08 '22 at 17:08

1

Gnu awk allows multiple field separators to be set, allowing you to delimit each record at ,, |, and : at the same time. Thus, the following will fish out the required fields from file.csv:

awk -F'[,|:]' '{print $2", "$6}' file.csv

Tested on the single record example:

echo "0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|" | awk -F'[,|:]' '{print $2", "$6}'

output:

20221208195546466, 2

answered Dec 08 '22 at 17:08

Dave Pritlove

2,601
3
15
14

2

My thoughts exactly. Use `OFS=", "`, and then `print $2, $6` – glenn jackman Dec 08 '22 at 18:48
Thanks. I have one more query, If I have to sort the output based on $6 ( value 2 in your example ) then how can i do it ? I want result should be displayed in sorted order based on 2nd output field. for ex : 20221208195546366, 20 20221208195546436, 16 20221208195546466, 5 2022120819536466, 2 – VJS Dec 09 '22 at 07:22
apply `sort` to the output file. Lots of examples available by searching, I'd probably try `sort -t, -nk2 outfile.csv` (-t sets the delimeter to comma, n forces numerical interpretation, k2 specifies 2nd field). – Dave Pritlove Dec 09 '22 at 13:35

score 1 · Answer 2 · answered Dec 08 '22 at 17:09

Assumptions:

starting string of the 5th comma-delimited field can vary from line to line (ie, not known before hand)
the item of interest in the 5th comma-delimited field occurs between the first : and the first |

Sample data:

$ cat test.csv
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|morestuff:FON:0|yetmorestuff:ION:0|

One awk approach:

awk '
BEGIN { FS=OFS="," }                    # define input/output field delimiter as ","
      { split($5,a,"[:|]")              # split 5th field on dual delimiters ":" and "|", store results in array a[]
        print $2,a[2]                   # print desired items to stdout
      }
' test.csv

This generates:

20221208195546466,2
20230124123456789,7

score 0 · Answer 3 · answered Dec 08 '22 at 16:38

0

You can use awk for this:

awk -F',' '{gsub(/Above as:/,""); gsub(/\|.*/, ""); print($2, $5)}'

Probably need to adopt regexp a bit.

answered Dec 08 '22 at 16:38

rutaka nashimo

84
3

score 0 · Answer 4 · answered Dec 09 '22 at 09:43

You might change : to , and | to , then extract 2nd and 6th field using cut following way, let file.txt content be

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

then

tr ':|' ',,' < file.txt | cut --delimiter=',' --output-delimiter=' , ' --fields=2,6

gives output

20221208195546466 , 2

Explanation: tr translates i.e. replace : using , and replace | using , then I inform cut that delimiter in input is , output delimiter is , encased in spaces (as stipulated by your desired output) and want 2th and 6th column (not 5th, as it is now Above as)

(tested using GNU coreutils 8.30)

Linux Command to get fields from CSV files

4 Answers4