I have to handle some data that comes in pipe-delimited files, where each field is enclosed in double quotes:
"Boolean"|"dada -sdf|xcvnb"|"123"
If I take `FS="|"`, the script sees four fields in the line above, whereas there are actually three. If I take `FS="\"[|]\""` (the pipe bracketed, since a bare `|` in a multi-character FS is regex alternation), then I have two issues:
- I have to deal separately with the first and the last fields, which become `"Boolean` and `123"`.
- More importantly, since the double quotes are gone, some functions or commands may not receive the whole string when I process a field, because it may contain spaces and other special characters. E.g. the 2nd field becomes `dada -sdf|xcvnb`, i.e. without quotes, which may give erroneous results with some commands: `-sdf` may be interpreted as an option, or only the first word may be taken as the argument and the rest of the string after the space ignored.
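Both behaviours can be reproduced with a small sketch like the one below. It assumes a POSIX shell and any common awk; the variable names `line`, `plain` and `quoted` are just for illustration. The quote-pipe-quote separator is written as `"[|]"` so that the pipe is matched literally rather than as regex alternation.

```shell
line='"Boolean"|"dada -sdf|xcvnb"|"123"'

# FS="|": every pipe separates, so the pipe inside field 2 splits it as well.
plain=$(printf '%s\n' "$line" | awk -F'|' '{ print NF }')

# FS='"[|]"': only a pipe between two quotes separates, but the separator
# swallows those quotes, leaving a stray quote on the first and last fields.
quoted=$(printf '%s\n' "$line" | awk -F'"[|]"' '{ print NF ": " $1 " / " $2 " / " $NF }')

printf '%s\n%s\n' "$plain" "$quoted"
# prints:
# 4
# 3: "Boolean / dada -sdf|xcvnb / 123"
```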
My thought: I want to tell gawk to treat `|` as FS only if it is preceded by a `"` and followed by a `"`. That way I don't strip the double quotes off the fields.
How can I write the code? Is there a way?
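One possible sketch of that idea, not necessarily the best way: awk regular expressions have no look-ahead or look-behind, so instead of changing FS this rewrites each quote-pipe-quote sequence to quote-marker-quote and then splits on the marker, which leaves the quotes in place. It rests on two assumptions: the marker byte `\034` never occurs in the data, and a `"|"` sequence never occurs inside a field. The array name `f` and the variable `fields` are illustrative only.

```shell
line='"Boolean"|"dada -sdf|xcvnb"|"123"'

fields=$(printf '%s\n' "$line" | awk '{
    # Turn every quote-pipe-quote into quote-marker-quote, keeping both quotes.
    # Assumption: the byte \034 does not appear anywhere in the input.
    gsub(/"[|]"/, "\"\034\"")
    # Split the record on the marker only; pipes inside fields are untouched.
    n = split($0, f, "\034")
    for (i = 1; i <= n; i++) print i ": " f[i]
}')

printf '%s\n' "$fields"
# prints:
# 1: "Boolean"
# 2: "dada -sdf|xcvnb"
# 3: "123"
```

Each element of `f` still carries its double quotes, so it can be handled as a single quoted string afterwards.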