0

I'm working with some comma delimited text files. The file is comprised of approximately 400 rows and 94 columns all comma delimited and withing double quotes:

"H","9","YES","NO"....

My aim is to split the file up to its respective columns using the comma delimiter. Unfortunately, there are several fields within the rows that have the following format:

"4,5"  or "2,5,8"

These fields are corrupting the column structure of the file when parse the file on the comma. So what I'd like to do is use regular expression to to do some sort of find and replace so that I can successfully parse my file. For example:

 "H","9","YES","NO","4,5","Y","N"  would become this:


"H","9","YES","NO","4|5","Y","N"

so that when I parse the file I would get seven columns instead of eight.

I wrote a regular expression that handles matching "2,5" or "2,3,4", but I'm not sure how to handle the replacing part.

Is it possible to accomplish this regular expressions?

Note: I'm using perl regular expressions.

Jeff Y
  • 2,437
  • 1
  • 11
  • 18
Mutuelinvestor
  • 3,384
  • 10
  • 44
  • 75
  • Not sure that this change is necessary, If you use a csv parser to read your lines, there is no reason for quoted commas to be seen as field delimiters. – Casimir et Hippolyte Oct 10 '15 at 13:57

2 Answers2

1

Use a look around

(?<!"),(?!")

replacing it with a pipe.

which means

(?<!")    - character before is not a "
,         - match a comma
(?!")     - character after is not a "
ergonaut
  • 6,929
  • 1
  • 17
  • 47
1

Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas instead:

s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g

Note that this also handles non-quoted fields.

On this data: "H",9,"YES","NO","4,5","Y","N"

$ perl -pe 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g' commasep
"H"|9|"YES"|"NO"|"4,5"|"Y"|"N"

Which can afterwards be split on "|":

$ perl -ne 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g;print join "---",split "\\|"' commasep
"H"---9---"YES"---"NO"---"4,5"---"Y"---"N"
Jeff Y
  • 2,437
  • 1
  • 11
  • 18