
When working with .csv files, there are typically quotes surrounding cells that contain a ',', and sometimes every cell is quoted. I'm trying to isolate the cells that have quotes. Take this code for example:

import re

example_row = 'Value1,"If you study, you will get an "A".","If you do not study, you will fail.",Value4'

quote_pattern = re.compile(r'^".*",|,".*",|,".*"$', re.DOTALL)

print(quote_pattern.findall(example_row))

The output for this is:

[',"If you study, you will get an "A".","If you do not study, you will fail.",']

My desired output is this:

[',"If you study, you will get an "A".",', ',"If you do not study, you will fail.",']

How do I change the regular expression to recognize this? The intent here is not to split up .csv files using regex; rather, it is to understand how to make a regular expression match each case separately when several cases occur inside one larger match.

Gabe Morris
  • Use the csv parser module in Python. – anubhava Nov 16 '20 at 16:37
  • @anubhava This is more of a learning experience. I like to write as much of the code as possible myself, using as few modules as I can. – Gabe Morris Nov 16 '20 at 16:39
  • Have you seen https://stackoverflow.com/questions/18144431/regex-to-split-a-csv – Boris Verkhovskiy Nov 16 '20 at 18:08
  • @Boris That is not my question. The purpose isn't just to split a .csv file up. The purpose was to determine the regex that could match two different cases within a whole case. I don't think it's fair that someone decided to delete this question, because the one it was said to be related to is not what I was asking. I could've provided a completely different example not relating to a .csv file. – Gabe Morris Nov 17 '20 at 16:21
  • @GabeMorris I didn't vote to close this question. I was just trying to be helpful by pointing out something that looked like what was ultimately your problem, and I've voted to reopen it. But are you sure it would be wrong to say that the answer to that question also contains an answer to yours? – Boris Verkhovskiy Nov 17 '20 at 18:49
  • @GabeMorris: On your suggestion I voted to reopen, because the linked question doesn't address your question. – anubhava Nov 18 '20 at 09:06
  • @Boris Thanks for your response and concern! The question you sent me was not very helpful because its answers are not in Python, which makes them harder for me to interpret. Thanks for reopening it for me. anubhava answered my question by giving me the idea of grouping the regex. – Gabe Morris Nov 18 '20 at 17:22

1 Answer


For your simple case, you may use this regex in Python:

>>> import re
>>> row = 'Value1,"If you study, you will get an "A".","If you do not study, you will get an "F"",Value4'
>>> print( re.findall(r'(?:^|,)"(.*?)"(?=,|$)', row) )
['If you study, you will get an "A".', 'If you do not study, you will get an "F"']
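The desired output in the question keeps the delimiting commas, and neighbouring quoted cells share a comma, so those matches overlap; re.findall normally returns only non-overlapping matches. One possible workaround, sketched here as an assumption rather than as part of the pattern above, is to wrap the whole expression in a capturing lookahead so that each match is zero-width:

```python
import re

row = 'Value1,"If you study, you will get an "A".","If you do not study, you will get an "F"",Value4'

# The outer (?=...) consumes nothing, so the comma that ends one cell
# is still available to start the next one. The inner group captures
# the quoted cell together with its surrounding delimiters.
overlap_pattern = re.compile(r'(?=((?:^|,)".*?"(?:,|$)))')

overlapping = overlap_pattern.findall(row)
for cell in overlapping:
    print(cell)
```

Because the lookahead consumes no characters, the comma between the two cells can appear in both results. Like the pattern above, this is only meant for simple rows and is not a general CSV parser.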

RegEx Details:

  • (?:^|,): Non-capturing group that matches the start of the string or a ,
  • ": Match opening "
  • (.*?): Capture 0 or more characters, as few as possible (lazy quantifier)
  • ": Match closing "
  • (?=,|$): Lookahead to assert that a , or the end of the string follows
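To see why the lazy quantifier matters, the following sketch compares a greedy alternative from the question with the pattern above, written with re.VERBOSE so that each piece carries its own comment. The variable names are illustrative only.

```python
import re

row = 'Value1,"If you study, you will get an "A".","If you do not study, you will get an "F"",Value4'

# Greedy: .* runs to the LAST '",' in the row, so both quoted cells
# are swallowed into a single match.
greedy_matches = re.findall(r',".*",', row)

# Lazy version of the answer's pattern, spelled out piece by piece.
lazy_pattern = re.compile(r'''
    (?:^|,)   # start of the string or a comma (not captured)
    "         # opening quote
    (.*?)     # cell content, as few characters as possible
    "         # closing quote
    (?=,|$)   # must be followed by a comma or the end of the string
''', re.VERBOSE)
lazy_matches = lazy_pattern.findall(row)

print(len(greedy_matches))  # number of greedy matches
print(lazy_matches)         # one entry per quoted cell
```

The lookahead is what lets the lazy group skip over the inner quotes around A and F: a quote only counts as the closing one when a comma or the end of the string comes right after it.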

But, as I commented above, prefer using a CSV parser module.
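For comparison, here is a minimal sketch with the standard library csv module. It assumes a well-formed row in which inner quotes are escaped by doubling them, which is not how the example row above is written, so the row below is a hypothetical variant. Note that csv.reader returns only the cell values and does not report which cells were quoted.

```python
import csv

# Hypothetical well-formed variant of the row: inner quotes are doubled.
well_formed_row = 'Value1,"If you study, you will get an ""A"".","If you do not study, you will fail.",Value4'

# csv.reader accepts any iterable of lines; a one-element list is enough here.
cells = next(csv.reader([well_formed_row]))

for cell in cells:
    print(cell)
```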

anubhava
  • Thank you. I just need to learn more about REs. I'm sure the csv module will be overall more efficient in the end, but there's not as much to learn by using it. – Gabe Morris Nov 16 '20 at 16:59