
I'm reading a response from a source, which is a journal article or an essay, and I have the HTML response as a string like:

According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.

My goal is to extract all of the quotes from the given string and save each of them in a list. My approach was:

[match.start() for m in re.Matches(inputString, "\"([^\"]*)\""))]

Somehow it didn't work for me. Any help with my regex here? Thanks a lot.

user2864740
Kiddo

2 Answers


Provided there are no nested quotes:

re.findall(r'"([^"]*)"', inputString)

Demo:

>>> import re
>>> inputString = 'According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'
>>> re.findall(r'"([^"]*)"', inputString)
['profound aspects of personality']
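The attempt in the question calls `match.start()`, which suggests the positions of the quotes may also be of interest. As a minimal sketch (the variable names `input_string` and `quotes` are illustrative, not from the question), `re.finditer` yields match objects, so both the quoted text and its start offset are available:

```python
import re

input_string = ('According to some, dreams express "profound aspects of '
                'personality" (Foulkes 184), though others disagree.')

# Each match object exposes the captured text via group(1)
# and the offset of the opening quote via start().
quotes = [(m.start(), m.group(1))
          for m in re.finditer(r'"([^"]*)"', input_string)]

for offset, text in quotes:
    print(offset, text)
```

If only the text is needed, `re.findall` as shown above is the shorter option.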
Martijn Pieters
  • Thanks, that works for me. One more question: suppose the student's essay is required to have a colon before any quote, like `dreams express : "profound aspects of..."`. How can I extract only the substrings in double quotes that are preceded by a colon? – Kiddo Mar 29 '14 at 19:01
  • You mean you want to match `:"(text to extract)"` only? Then add `:\s*` before the first `"` character in the regular expression. – Martijn Pieters Mar 29 '14 at 19:02
  • re.findall(r':\s*"([^"]*)"', inputString), can you please explain why we need the * there? – Kiddo Mar 29 '14 at 19:04
  • @Kiddo: to match 0 or more spaces. Flexibility. – Martijn Pieters Mar 29 '14 at 19:06
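To illustrate the `:\s*` prefix discussed in these comments, here is a small sketch. The sample text and the names `text`, `pattern`, and `colon_quotes` are made up for the example. Because `\s*` matches zero or more whitespace characters, the pattern accepts a colon followed by a space as well as a colon directly against the quote, and it skips quotes with no colon in front:

```python
import re

# Hypothetical sample: two quotes follow a colon, one does not.
text = ('Dreams express : "profound aspects" and some say "nothing at all". '
        'He wrote:"tight".')

# ':' then optional whitespace, then the quoted text captured in group 1.
pattern = re.compile(r':\s*"([^"]*)"')
colon_quotes = pattern.findall(text)
print(colon_quotes)  # ['profound aspects', 'tight']
```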

Use this one if your input can have something like this: some "text \" and text" more

import re

s = '''According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'''
# Lazily match up to the first closing quote that is not preceded by a backslash
lst = re.findall(r'"(.*?)(?<!\\)"', s)
print(lst)

The negative lookbehind (?<!\\) checks that there is no \ immediately before the closing ", so an escaped quote does not end the match.
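The difference only shows up when the input actually contains an escaped quote. A minimal sketch comparing the two patterns on the example string from above (the names `sample`, `simple`, and `lookbehind` are illustrative):

```python
import re

# Raw string, so the backslash before the inner quote is kept literally.
sample = r'some "text \" and text" more'

# The simple pattern stops at the first quote, escaped or not.
simple = re.findall(r'"([^"]*)"', sample)

# The lookbehind pattern skips a quote that directly follows a backslash.
lookbehind = re.findall(r'"(.*?)(?<!\\)"', sample)

print(simple)      # ['text \\']
print(lookbehind)  # ['text \\" and text']
```

Note that this sketch only covers a single backslash before the quote. It assumes the input never ends a quotation with an escaped backslash (`\\"`), which the lookbehind would misread as an escaped quote.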

Sabuj Hassan