-1

In csv files on Linux server, I have thousands of rows in below csv format

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

I need to get output from all the files on below format (2nd field ie 20221208195546466 and 5th field but value after Above as: and before first | ie 2 in above example )

output :

20221208195546466 , 2

Can anyone help me with linux command ?

Edit :

my attempts

I tried but it give field 5th value. How to add field 2 as well ?

cat *.csv | cut -d, -f5|cut -d'|' -f1|cut -d':' -f2|

EDIT : sorted result

Now I am using this command (based on Dave Pritlove answer ) awk -F'[,|:]' '{print $2", "$6}' file.csv. However, I have one more query, If I have to sort the output based on $6 ( value 2 in your example ) then how can i do it ? I want result should be displayed in sorted order based on 2nd output field. for ex :

20221208195546366, 20

20221208195546436, 16

20221208195546466, 5

2022120819536466, 2

VJS
  • 2,891
  • 7
  • 38
  • 70
  • Can anyone please help ? Why close votes ? i tried with cat and cut but couldnt do it – VJS Dec 08 '22 at 16:24
  • *"Why close votes?"*: please see [How much research efffort is expected of SO users?](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users). This has been asked tons of times, you must be able to find some questions about this, if you cant get it to work, [edit] your question to show your attempt(s) – 0stone0 Dec 08 '22 at 16:42
  • Please see my edit to understand my attempt – VJS Dec 08 '22 at 16:51
  • I tried cat *.csv | awk -F',' '{gsub(/Total ts:/,""); gsub(/\|.*/, ""); print($2, $5)}' – VJS Dec 08 '22 at 16:54
  • please update the question with your `awk` attempt(s); also, your `awk` code references a string `Total ts:` but no such string exists in your sample input; if the string in question can vary across rows (eg, `Total ts:` vs `Above as:`) then please update the question to show a few rows of input that display this difference in strings; also, your expected output shows a space on each side of the `,` ... is this what you really want or should those spaces not exist (as is the case in the input)? – markp-fuso Dec 08 '22 at 17:00
  • See [whats-the-most-robust-way-to-efficiently-parse-csv-using-awk](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk) – Ed Morton Dec 08 '22 at 17:16

4 Answers4

1

Gnu awk allows multiple field separators to be set, allowing you to delimit each record at ,, |, and : at the same time. Thus, the following will fish out the required fields from file.csv:

awk -F'[,|:]' '{print $2", "$6}' file.csv

Tested on the single record example:

echo "0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|" | awk -F'[,|:]' '{print $2", "$6}'

output:

20221208195546466, 2
Dave Pritlove
  • 2,601
  • 3
  • 15
  • 14
  • 2
    My thoughts exactly. Use `OFS=", "`, and then `print $2, $6` – glenn jackman Dec 08 '22 at 18:48
  • Thanks. I have one more query, If I have to sort the output based on $6 ( value 2 in your example ) then how can i do it ? I want result should be displayed in sorted order based on 2nd output field. for ex : 20221208195546366, 20 20221208195546436, 16 20221208195546466, 5 2022120819536466, 2 – VJS Dec 09 '22 at 07:22
  • apply `sort` to the output file. Lots of examples available by searching, I'd probably try `sort -t, -nk2 outfile.csv` (-t sets the delimeter to comma, n forces numerical interpretation, k2 specifies 2nd field). – Dave Pritlove Dec 09 '22 at 13:35
1

Assumptions:

  • starting string of the 5th comma-delimited field can vary from line to line (ie, not known before hand)
  • the item of interest in the 5th comma-delimited field occurs between the first : and the first |

Sample data:

$ cat test.csv
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|morestuff:FON:0|yetmorestuff:ION:0|

One awk approach:

awk '
BEGIN { FS=OFS="," }                    # define input/output field delimiter as ","
      { split($5,a,"[:|]")              # split 5th field on dual delimiters ":" and "|", store results in array a[]
        print $2,a[2]                   # print desired items to stdout
      }
' test.csv

This generates:

20221208195546466,2
20230124123456789,7
markp-fuso
  • 28,790
  • 4
  • 16
  • 36
0

You can use awk for this:

awk -F',' '{gsub(/Above as:/,""); gsub(/\|.*/, ""); print($2, $5)}'

Probably need to adopt regexp a bit.

0

You might change : to , and | to , then extract 2nd and 6th field using cut following way, let file.txt content be

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

then

tr ':|' ',,' < file.txt | cut --delimiter=',' --output-delimiter=' , ' --fields=2,6

gives output

20221208195546466 , 2

Explanation: tr translates i.e. replace : using , and replace | using , then I inform cut that delimiter in input is , output delimiter is , encased in spaces (as stipulated by your desired output) and want 2th and 6th column (not 5th, as it is now Above as)

(tested using GNU coreutils 8.30)

Daweo
  • 31,313
  • 3
  • 12
  • 25