How to write regular expression to get everything within a quote

Question

Does anyone know how to write a regular expression in Python to get everything between quotation marks?

For example, text: "some text here".... text: "more text in here!"... text:"and some numbers - 2343- here too"

The texts are of different lengths, and some contain punctuation and numbers as well. How do I write a regular expression to extract all the information?

What I would like to see in the output:

some text here more text in here and some numbers - 2343 - here too

score 7 · Answer 1 · answered Feb 26 '12 at 16:22

This should work for you:

"(.*?)"

Placing a ? after the * makes the quantifier lazy (non-greedy): it matches as little as possible, so it stops at the next quote mark instead of consuming any quote marks.

>>> r = '"(.*?)"'
>>> s =  'text: "some text here".... text: "more text in here!"... text:"and some numbers - 2343- here too"'
>>> import re
>>> re.findall(r, s)
['some text here', 'more text in here!', 'and some numbers - 2343- here too']
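To illustrate what the ? changes, here is a minimal sketch comparing the greedy and lazy forms on a shortened version of the question's text. The variable names greedy and lazy are only for illustration.

```python
import re

s = 'text: "some text here".... text: "more text in here!"'

# Greedy: .* runs on to the last quote in the string, so everything
# between the first and the last quote comes back as one match.
greedy = re.findall(r'"(.*)"', s)

# Lazy: .*? stops at the first closing quote it can reach.
lazy = re.findall(r'"(.*?)"', s)

print(greedy)
print(lazy)
```

The greedy version returns a single item that still contains the inner quote marks, while the lazy version returns one item per quoted substring.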

score 7 · Answer 2 · edited Feb 26 '12 at 17:14

Try "[^"]*", that is, a " followed by zero or more characters that aren't ", followed by a closing ". So:

import re
pat = re.compile(r'"[^"]*"')
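One thing to watch with this pattern as written: it has no capturing group, so re.findall returns each match with its surrounding quote marks. A small sketch of the difference, assuming you want the contents only (the sample string is a shortened version of the one in the question):

```python
import re

s = 'text: "some text here".... text:"and some numbers - 2343- here too"'

# No group: findall returns the whole match, quote marks included.
with_quotes = re.findall(r'"[^"]*"', s)

# With a capturing group: findall returns only what the group matched.
without_quotes = re.findall(r'"([^"]*)"', s)

print(with_quotes)
print(without_quotes)
```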

answered Feb 26 '12 at 16:22 by Pierce · edited Feb 26 '12 at 17:14 by Qtax

I like Karl's answer better than mine. Thanks, Karl – Pierce Feb 26 '12 at 16:25

score 1 · Answer 3 · edited May 23 '17 at 12:20

If the quoted sub-strings to be matched do NOT contain escaped characters, then Karl Barker's and Pierce's answers will both match correctly. However, of the two, Pierce's expression is more efficient:

reobj = re.compile(r"""
    # Match double quoted substring (no escaped chars).
    "                   # Match opening quote.
    (                   # $1: Quoted substring contents.
      [^"]*             # Zero or more non-".
    )                   # End $1: Quoted substring contents.
    "                   # Match closing quote.
    """, re.VERBOSE)

But if the quoted sub-string to be matched DOES contain escaped characters, (e.g. "She said: \"Hi\" to me.\n"), then you'll need a different expression:

reobj = re.compile(r"""
    # Match double quoted substring (allow escaped chars).
    "                   # Match opening quote.
    (                   # $1: Quoted substring contents.
      [^"\\]*           # {normal} Zero or more non-", non-\.
      (?:               # Begin {(special normal*)*} construct.
        \\.             # {special} Escaped anything.
        [^"\\]*         # more {normal} Zero or more non-", non-\.
      )*                # End {(special normal*)*} construct.
    )                   # End $1: Quoted substring contents.
    "                   # Match closing quote.
    """, re.DOTALL | re.VERBOSE)
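As a rough check of the difference, the sketch below runs a compact one-line form of the escape-aware expression next to the simple lazy pattern. The sample text and the names quoted, naive and escape_aware are made up for illustration.

```python
import re

# Compact form of the escape-aware pattern: normal* (special normal*)*
quoted = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"', re.DOTALL)

# Raw string, so the backslashes below are literal characters in the text.
text = r'He wrote "She said: \"Hi\" to me.\n" and then "plain".'

naive = re.findall(r'"(.*?)"', text)
escape_aware = quoted.findall(text)

print(naive)         # breaks the first substring apart at the escaped quotes
print(escape_aware)  # keeps each quoted substring whole
```

Note that the captured contents still contain the backslashes. Turning \" back into " is a separate step that the expression does not perform.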

There are several expressions I'm aware of that will do the trick, but the one above (taken from Mastering Regular Expressions, 3rd edition, or MRE3) is the most efficient of the bunch. See my answer to a similar question where these various, functionally identical expressions are compared.
