Extract a string between double quotes

Question

I'm reading a response from a source, which is a journal article or an essay, and I have the HTML response as a string like:

According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.

My goal is just to extract all of the quotes out of the given string and save each of them into a list. My approach was:

[match.start() for m in re.Matches(inputString, "\"([^\"]*)\""))]

Somehow it didn't work for me. Any help with my regex here? Thanks a lot.

That's not even valid Python (syntax error) and there is no `re.Matches()` function. — Martijn Pieters, Mar 29 '14 at 18:57

score 34 · Accepted Answer · answered Mar 29 '14 at 18:57

Provided there are no nested quotes:

re.findall(r'"([^"]*)"', inputString)

Demo:

>>> import re
>>> inputString = 'According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'
>>> re.findall(r'"([^"]*)"', inputString)
['profound aspects of personality']
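The question asks for all quotes in a string, so here is a small sketch of the same pattern on an input with more than one quoted span. The sample text and the names `text` and `quotes` are made up for illustration and are not from the question.

```python
import re

# Hypothetical input with two quoted spans (not from the original question).
text = 'She said "hello" and he answered "goodbye" before leaving.'

# With a single capture group, findall returns just the group contents,
# i.e. the text between each pair of double quotes, in order.
quotes = re.findall(r'"([^"]*)"', text)
print(quotes)  # ['hello', 'goodbye']
```

Because `[^"]*` cannot cross a quote character, each match ends at the nearest closing quote rather than running on to the last one in the string.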

Martijn Pieters


Thanks, that works for me. I have an extra question: suppose the student's essay is required to have a colon before any quote, like: dreams express : "profound aspects of..." . How can I extract only the substrings in double quotes that are preceded by a colon? – Kiddo Mar 29 '14 at 19:01

You mean you want to match `:"(text to extract)"` only? Then add `:\s*` before the first `"` character in the regular expression. – Martijn Pieters Mar 29 '14 at 19:02
re.findall(r':\s*"([^"]*)"', inputString), can you please explain why we need the * there? – Kiddo Mar 29 '14 at 19:04
@Kiddo: to match 0 or more whitespace characters between the colon and the quote. Flexibility. – Martijn Pieters Mar 29 '14 at 19:06
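A quick sketch of the colon-prefixed variant from the comments above. The sample string and variable names are assumptions made up for illustration. It mixes a quote preceded by a colon and a space, a quote with no colon, and a quote right after a colon with no space.

```python
import re

# Hypothetical input (not from the original thread).
text = 'dreams express : "profound aspects" and also "other things"; note:"no space"'

# :\s* requires a colon, then allows zero or more whitespace characters
# before the opening quote, so both ': "..."' and ':"..."' match.
colon_quotes = re.findall(r':\s*"([^"]*)"', text)
print(colon_quotes)  # ['profound aspects', 'no space']
```

The quote "other things" is skipped because nothing but whitespace may sit between the colon and the opening quote.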

score 5 · Answer 2 · answered Mar 29 '14 at 18:59

Use this one if your input can contain escaped quotes inside the quoted text, like this: some "text \" and text" more

import re

s = '''According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'''
# Lazily match up to the first " that is not preceded by a backslash
lst = re.findall(r'"(.*?)(?<!\\)"', s)
print(lst)

The negative lookbehind (?<!\\) checks that there is no \ immediately before the closing ", so an escaped \" inside the text does not end the match.
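To see the difference on the escaped-quote input mentioned in this answer, here is a small comparison sketch. The variable names `simple` and `lookbehind` are made up for illustration. The input is written as a raw string so that the backslash is a literal character.

```python
import re

# The example input from this answer, as a raw string (literal backslash).
s = r'some "text \" and text" more'

# The [^"]* pattern stops at the first quote it sees, escaped or not.
simple = re.findall(r'"([^"]*)"', s)

# The lookbehind pattern skips a quote that directly follows a backslash.
lookbehind = re.findall(r'"(.*?)(?<!\\)"', s)

print(simple)      # ['text \\']
print(lookbehind)  # ['text \\" and text']
```

One limitation to keep in mind: the lookbehind only inspects the single character before the quote, so a closing quote that follows an escaped backslash (`\\"`) would also be skipped. Whether that matters depends on the input.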
