
How can I get what is between the quotes in the following two texts?

text_1 = r""" "Some text on \"two\" lines with a backslash escaped\\" \
     + "Another text on \"three\" lines" """

text_2 = r""" "Some text on \"two\" lines with a backslash escaped\\" + "Another text on \"three\" lines" """

The difficulty for me is that quotes must be ignored when they are escaped, but the backslash itself may also be escaped.

I would like to obtain the following groups.

[
    r'Some text on \"two\" lines with a backslash escaped\\',
    r'Another text on \"three\" lines'
]
  • Sorry, I've edited my question because Google Translate added some spurious spaces. –  Apr 21 '13 at 10:59
  • You'll need more escapes there. Why the concatenation in the middle? That just distracts from your question – Martijn Pieters Apr 21 '13 at 11:00
  • I had also forgotten the escaped quotes; that has been fixed. –  Apr 21 '13 at 11:01
  • @MartijnPieters You're right. Here is the simpler version of my question. –  Apr 21 '13 at 11:03
  • There is nothing to ignore? I don't see any escaped quotes. – jamylak Apr 21 '13 at 11:07
  • Is this a good example? `text = "Some text on \"two\" lines with a backslash escaped\\" \ + "Another text on \"three\" lines \\\"four\\\""` – jamylak Apr 21 '13 at 11:11
  • Sorry, here is a better example. –  Apr 21 '13 at 12:38
  • @projetmbc In your new example, every quote is escaped, so does that mean you ignore all of them? Anyway, I updated my answer to produce your result – jamylak Apr 21 '13 at 12:42
  • You're right. I've forgotten the escaped backslash. –  Apr 21 '13 at 12:55
  • @projetmbc I've changed it for the updated example. It's still kinda unclear what you need, though; please comment and tell me if it doesn't work – jamylak Apr 21 '13 at 13:10

4 Answers

"(?:\\.|[^"\\])*"

matches a quoted string, including any escaped characters that occur within it.

Explanation:

"       # Match a quote.
(?:     # Either match...
 \\.    # an escaped character
|       # or
 [^"\\] # any character except quote or backslash.
)*      # Repeat any number of times.
"       # Match another quote.
Tim Pietzcker
  • This gives me `unexpected end of regular expression` error. Any ideas why? – Saheel Godhane Nov 02 '15 at 22:12
  • @SaheelGodhane: Most probably because of string processing. In Python, you'd need a raw string in single quotes if you want to compile this regex: `re.compile(r'"(?:\\.|[^"\\])*"')`. – Tim Pietzcker Nov 03 '15 at 10:11
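Following that advice, here is a minimal sketch (assuming Python 3) that applies the pattern to the question's two strings. Wrapping the body of the pattern in a capturing group is an addition to the answer, so that `re.findall` returns only the text between the quotes:

```python
import re

# Tim Pietzcker's pattern as a raw string; the capturing group (...) around the
# body is added so that findall returns the contents without the outer quotes.
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')

# The question's inputs, as raw strings so the backslashes are kept literally.
text_1 = r""" "Some text on \"two\" lines with a backslash escaped\\" \
     + "Another text on \"three\" lines" """
text_2 = r""" "Some text on \"two\" lines with a backslash escaped\\" + "Another text on \"three\" lines" """

for text in (text_1, text_2):
    print(QUOTED.findall(text))
```

Both lines print the two groups listed in the question (in `repr` form, so every backslash is shown doubled). Because `\\.` consumes a backslash together with the character after it, the final `\\` of the first string is read as an escaped backslash, and the `"` that follows correctly closes the string.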
>>> import re
>>> text = "Some text on\n\"two\"lines" + "Another texton\n\"three\"\nlines"
>>> re.findall(r'"(.*)"', text)
['two', 'three']
Pit
  • Sorry, I've forgotten some escaped quotes in my question. This has been updated. –  Apr 21 '13 at 11:04
  • That doesn't matter, as far as I can tell. EDIT: Well, it does. Let me look into it. – Pit Apr 21 '13 at 11:05
  • `.*` will consume all characters, including `"`, so if it weren't for the newlines `\n`, it would return a single match, `'two"linesAnother texton"three'` – ovgolovin Apr 21 '13 at 11:05
  • @projetmbc Glad to hear that, still perreal gave you the correct answer. Be sure to accept it, if it fits your needs! – Pit Apr 21 '13 at 11:09
  • @Pit Indeed, there is a problem if something like `"..." + "..."` is used. –  Apr 21 '13 at 13:00
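The point about the greedy `.*` can be checked directly. A minimal sketch, assuming Python 3 and using Pit's strings with the newlines removed; the non-greedy `.*?` shown for contrast is a standard alternative that was not proposed in this thread, and neither pattern handles escaped quotes:

```python
import re

# Pit's strings with the newlines removed, so everything is on one line.
text = "Some text on\"two\"lines" + "Another texton\"three\"lines"

# Greedy .* runs to the LAST quote it can reach, swallowing the inner quotes.
print(re.findall(r'"(.*)"', text))   # ['two"linesAnother texton"three']

# Non-greedy .*? stops at the first closing quote instead.
print(re.findall(r'"(.*?)"', text))  # ['two', 'three']
```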

Match everything but a double quote:

import re
text = "Some text on \"two\" lines" + "Another text on \"three\" lines"
# Capture each run of non-quote characters enclosed by a pair of quotes.
print(re.findall(r'"([^"]*)"', text))

Output

['two', 'three']
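To see why this is not enough for the question's input, here is a minimal sketch (assuming Python 3) that runs the same pattern on the question's `text_2`. Because `text_2` is a raw string, its backslashes are really there, and `[^"]*` stops at the escaped quotes too, so the captured pieces no longer line up with the string literals:

```python
import re

# text_2 from the question (a raw string, so \" keeps its backslash).
text_2 = r""" "Some text on \"two\" lines with a backslash escaped\\" + "Another text on \"three\" lines" """

# [^"]* cannot tell an escaped quote from a closing one, so the quotes get
# paired up as (1st, 2nd), (3rd, 4th), ... regardless of the backslashes.
print(re.findall(r'"([^"]*)"', text_2))
# ['Some text on \\', ' lines with a backslash escaped\\\\', 'Another text on \\', ' lines']
```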
perreal
>>> import re
>>> text_1 = r""" "Some text on \"two\" lines with a backslash escaped\\" \
     + "Another text on \"three\" lines" """
>>> text_2 = r""" "Some text on \"two\" lines with a backslash escaped\\" + "Another text on \"three\" lines" """
>>> re.findall(r'\\"([^"]+)\\"', text_2)
['two', 'three']
>>> re.findall(r'\\"([^"]+)\\"', text_1)
['two', 'three']

Perhaps you want this:

re.findall(r'\\"((?:(?<!\\)[^"])+)\\"', text)
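The `text` in this last call is not specified. As a minimal check, assuming it is the question's `text_2` and Python 3, the lookbehind version returns the same two words as the simpler pattern above, not the two string bodies the question asks for:

```python
import re

text_2 = r""" "Some text on \"two\" lines with a backslash escaped\\" + "Another text on \"three\" lines" """

# The pattern from the answer: text between a \" and the next \", where no
# captured character may follow a backslash.
pattern = r'\\"((?:(?<!\\)[^"])+)\\"'
print(re.findall(pattern, text_2))  # ['two', 'three']
```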
jamylak
  • Sorry for my poor English; it is not my native language. I would "simply" like to catch Python strings in order to highlight them, and for other things. –  Apr 21 '13 at 13:16
  • @projetmbc That's fine, you could provide an example this doesn't work for? – jamylak Apr 21 '13 at 13:17
  • I've added the groups I would like to obtain. –  Apr 21 '13 at 13:25
  • @projetmbc alright well it's good that someone understands this! – jamylak Apr 21 '13 at 13:47
  • It's not so easy to make yourself understood in a language other than your native one. Sorry. –  Apr 21 '13 at 13:48
  • @projetmbc Oh what I meant was I tried his solution and it produced different results to what you asked for. Anyway let's forget about this now! – jamylak Apr 21 '13 at 13:48
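Since the goal stated in the comments is to find Python string literals in order to highlight them, another option, not proposed in this thread, is to let the standard library's `tokenize` module find them. A minimal sketch, assuming Python 3 and plain single- or double-quoted literals without prefixes (`r`, `b`, `f`) or triple quotes; `python_string_bodies` is a hypothetical helper name:

```python
import io
import tokenize

text_1 = r""" "Some text on \"two\" lines with a backslash escaped\\" \
     + "Another text on \"three\" lines" """
text_2 = r""" "Some text on \"two\" lines with a backslash escaped\\" + "Another text on \"three\" lines" """


def python_string_bodies(source):
    """Hypothetical helper: return the raw text inside each string literal.

    Assumes plain '...' or "..." literals (no prefixes, no triple quotes),
    so dropping the first and last character removes the quotes.
    """
    # Strip the outer spaces so the first line is not tokenized as indented.
    readline = io.StringIO(source.strip()).readline
    return [tok.string[1:-1]
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.STRING]


print(python_string_bodies(text_1))
print(python_string_bodies(text_2))
```

The tokenizer already applies Python's own escape rules and handles the backslash line continuation in `text_1`, so both calls yield the two groups listed in the question.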