2

I have a text file with comma separated values. A sample line can be something like

 "Joga","Bonito",7,"Machine1","Admin"  

The " seen are part of the text and are needed when this csv gets converted back to a java object.
I want to filter out some lines from this file based on some field in the csv. The following statement doesnt work.

 awk -F "," '($2== "Bonito") {print}' filename.csv  

I am guessing that this has something to do with the " appearing in the text.
I saw an example like:

awk -F "\"*,\"*"  

I am not sure how this works. It looks like a regex, but the use of the last * flummoxed me.

Is there a better option than the last awk statement I wrote? How does it work?

bobby
  • 2,629
  • 5
  • 30
  • 56
  • Possible duplicate of [How to escape backslash and double quotes with awk](https://stackoverflow.com/questions/39119155/how-to-escape-backslash-and-double-quotes-with-awk) – Sundeep Mar 08 '18 at 10:43

2 Answers2

2

Since some parameters have double quotes and other not, you can filter with a quoted parameter:

awk -F, '$2 == "\"Bonito\""' filename.csv

To filter on parameter that do not have double quote, just do:

awk -F, '$3 == 7' filename.csv

Another way is to use the double quote in the regex (the command ? that make the double quote optional):

 awk -F '"?,"?' '$2 == "Bonito"' filename.csv

But this has a drawback of also matching the following line:

"Joga",Bonito",7,"Machine1","Admin"
oliv
  • 12,690
  • 25
  • 45
1

First a bit more through test file:

$ cat file
"Joga","Bonito",7,"Machine1","Admin"
"Joga",Bonito,7,"Machine1","Admin"

Using regex ^\"? ie. starts with or without a double quote:

$ awk -F, '$2~/^\"?Bonito\"?$/' file
"Joga","Bonito",7,"Machine1","Admin"
"Joga",Bonito,7,"Machine1","Admin"
James Brown
  • 36,089
  • 7
  • 43
  • 59