I have a CSV file like this:

Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,"LM, MC, ST",Brazil
Luke,21,"CMD, CD",England

And I need to get this:

Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,LM,Brazil
Luke,21,CMD,England

With this expression I can extract the field, but I don't know how to update it in the dataset:

grep -o '\(".*"\)' file.csv | cut -d "," -f 1 | sed 's/"//'
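
For reference, this pipeline only prints the extracted values, it does not modify the file:

LM
CMD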
David Molina
  • On SO we do encourage users to show the efforts they have put in to solve their own problems, so please do add the same and let us know (not my down-vote BTW). – RavinderSingh13 May 26 '20 at 09:52
  • You are right, I forgot to show it, sorry. – David Molina May 26 '20 at 10:00
  • Does this answer your question? [What's the most robust way to efficiently parse CSV using awk?](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk) – kvantour May 26 '20 at 12:06

2 Answers

$ sed -E 's/"([^,]+)[^"]*"/\1/' ip.txt
John,23,GK,Spain
Jack,30,LM,Brazil
Luke,21,CMD,England
  • -E to enable ERE
  • " match double quote
  • ([^,]+) match non-comma characters and capture them for reuse in the replacement section
  • [^"]*" match any remaining characters up to and including the closing double quote
  • \1 will refer to the text that was captured with ([^,]+)

Note that this will work only for a single double-quoted field per line and won't handle other valid CSV features such as escaped double quotes, newline characters inside a field, etc.
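
If a line could contain more than one such quoted field, adding the g flag should extend the same idea (still just a sketch, not a general CSV parser):

$ sed -E 's/"([^,]+)[^"]*"/\1/g' ip.txt

For the sample input the output is the same, but this would also reduce a hypothetical line like a,"b, c",d,"e, f" to a,b,d,e.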

Sundeep

Could you please try the following; this should cover the case when you have more than one occurrence of "....." in your Input_file. Written and tested with GNU awk.

awk -v FPAT='[^"]*|"[^"]+"' '
BEGIN{
  OFS=""
}
{
  for(i=1;i<=NF;i++){
    ##Check if the current field is a double-quoted one.
    if($i~/^".*"$/){
      ##Strip the surrounding quotes and everything from the first comma or space onwards.
      gsub(/^"|"$|[, ].*/,"",$i)
    }
  }
}
1
' Input_file
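
With the sample input from the question saved as Input_file, this should print:

Name,Age,Pos,Country
John,23,GK,Spain
Jack,30,LM,Brazil
Luke,21,CMD,England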
RavinderSingh13