0

I've got a CSV file that I need to split, to extract the ith column value of each record.

Here is a sample file.csv.

Column1,Column2,Column3
"value1_1_a, value1_1_b",value1_2,value1_3
"value2_1_a, value2_1_b, value2_1_c",value2_2,value2_3

In general, if I'd like to extract (for instance) Column2 values, I'd opt for a command like:

cat file.csv | awk -F, '{print $2}'

to get a result like:

Column2
value1_2
value2_2

Nevertheless, since Column1 values include strings with a variable number of , separator, I get wrong data. Hence, how can I use awk to accomplish my purpose?

auino
  • 1,644
  • 5
  • 23
  • 43
  • 1
    If only column one ever has the problem of embedded delimiters, and all records have the same number of fields, it may be easier to work backwards from `NF` than to deal with `FPAT`. E.g. `{ print $(NF - 1) }` – jas Jul 23 '19 at 13:24
  • Thanks @jas, it theoretically work, but the column containing the string may also be in the middle of the columns list. – auino Jul 23 '19 at 14:12

1 Answers1

1

Unless you have to use awk, you could use csvtool to do the job for you, which allows referring to the name or the number of the column - http://colin.maudry.com/csvtool-manual-page/

csvtool namedcol Column2 csvfile

Column2
value1_2
value2_2

csvtool col 2 csvfile

Column2
value1_2
value2_2

Or you can use an awk field pattern instead of a field-separator pattern:

cat csvfile | awk -vFPAT='[^,]*|"[^"]*"' '{print $2}

Column2
value1_2
value2_2
Paul Dawson
  • 1,332
  • 14
  • 27
  • Thanks. Any way to make it through `awk` only, or other standard tools like `sed`? – auino Jul 23 '19 at 14:10
  • Check update for GNU Awk 4 extension example. – Paul Dawson Jul 23 '19 at 14:18
  • Where is it possible to download `csvtool` outside of `apt`? Useful for instance for non-Linux systems. – auino Jul 23 '19 at 18:18
  • For which system? You could also use csvkit using pip which can be installed on windows with python - https://csvkit.readthedocs.io/en/latest/index.html – Paul Dawson Jul 23 '19 at 22:02
  • For macOS. For now, I've solved by removing the string column content, with the following command: `cat file.csv | sed 's/\".*\"/removed\_text/'`. Although this works in my specific case, it is not a good and general solution. – auino Jul 24 '19 at 07:36
  • `csvkit` is available on homebrew - https://formulae.brew.sh/formula/csvkit – Paul Dawson Jul 24 '19 at 08:10