RegEx: Grabbing values between quotation marks

Question

I have a value like this:

"Foo Bar" "Another Value" something else

What regex will return the values enclosed in the quotation marks (e.g. Foo Bar and Another Value)?

Related to http://stackoverflow.com/questions/138552/can-regex-be-used-for-this-particular-string-manipulation — Andrew Edgecombe, Oct 05 '08 at 09:56

score 508 · Answer 1 · edited Nov 19 '12 at 21:14

In general, the following regular expression fragment is what you are looking for:

"(.*?)"

This uses the non-greedy *? operator to capture everything up to but not including the next double quote. Then, you use a language-specific mechanism to extract the matched text.

In Python, you could do:

>>> import re
>>> string = '"Foo Bar" "Another Value"'
>>> print(re.findall(r'"(.*?)"', string))
['Foo Bar', 'Another Value']
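
A quick Python 3 comparison of the greedy and non-greedy forms on the question's sample string: greedy `.*` runs to the last quote on the line, while `.*?` stops at the first closing quote it can reach.

```python
import re

text = '"Foo Bar" "Another Value" something else'

# Greedy: .* grabs as much as possible, so the capture runs from the
# first quote to the last quote on the line.
greedy = re.findall(r'"(.*)"', text)

# Non-greedy: .*? stops at the first closing quote it can reach.
lazy = re.findall(r'"(.*?)"', text)

print(greedy)  # ['Foo Bar" "Another Value']
print(lazy)    # ['Foo Bar', 'Another Value']
```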

edited Nov 19 '12 at 21:14 by Rodrigo Deodoro · answered Oct 05 '08 at 04:24 by Greg Hewgill

This is great, however it does not handle strings with escaped quotes. e.g., `"hello \" world"` – robbyt Feb 05 '15 at 20:01

Using JavaScript's match, this will match the quotation marks as well. It will work with iterating over exec as described here: http://stackoverflow.com/questions/7998180/regex-how-to-extract-text-from-between-quotes-and-exclude-quotes – Kiechlus Apr 27 '16 at 12:22

@robbyt I know it's a bit late for a reply, but what about a negative lookbehind? `"(.*?(?<!\\))"` – Mateus Jul 07 '17 at 18:39

Thank you - this is simpler if you are sure there are no escaped quotes to deal with. – squarecandy Dec 02 '17 at 19:17

Simple and effective! – justdan23 Sep 24 '20 at 22:23

What does the .*? do? – Golden Lion Jul 21 '21 at 16:12

It seems to get everything between and including quotes. – ScottyBlades Jan 15 '22 at 02:58

score 488 · Accepted Answer · edited May 23 '17 at 12:10

I've been using the following with great success:

(["'])(?:(?=(\\?))\2.)*?\1

It supports nested quotes as well.

For those who want a deeper explanation of how this works, here's an explanation from user ephemient:

(["']) match a quote; (?:(?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
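
A minimal Python 3 sketch of using this pattern (the sample strings are made up for illustration). Because the pattern has capture groups, re.findall would return the groups instead of the whole match, so the helper below (quoted_values, a hypothetical name) uses re.finditer and group(0). Stripping the first and last character then leaves only the content.

```python
import re

# Adam's pattern: group 1 is the opening quote, group 2 is the optional
# backslash captured by the lookahead and then consumed by \2.
QUOTED = re.compile(r"""(["'])(?:(?=(\\?))\2.)*?\1""")

def quoted_values(text):
    """Return each quoted value, including its surrounding quotes."""
    return [m.group(0) for m in QUOTED.finditer(text)]

print(quoted_values('"Foo Bar" "Another Value" something else'))
# ['"Foo Bar"', '"Another Value"']

# An escaped quote does not end the string.
print(quoted_values(r'say "hello \" world" now'))
# ['"hello \\" world"']

# Strip the quotes to keep only the content.
print([v[1:-1] for v in quoted_values('"Foo Bar" \'it is\'')])
# ['Foo Bar', 'it is']
```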

edited May 23 '17 at 12:10 by Community · answered Oct 05 '08 at 04:40 by Adam

@steve: this would also match, incorrectly, `"foo\"`. The look ahead trick makes the `?` quantifier possessive (even if the regex flavor doesn't support the `?+` syntax or atomic grouping) – Robin Sep 11 '14 at 13:33

(["']).*?\1 another, more simplified version – Jeff Voss Mar 17 '15 at 16:56

With python this raises an error: sre_constants.error: cannot refer to open group – a1an Jun 12 '15 at 10:43

A version using named variables: `((?["'])(?(?:(?=(?\\?))(?P=escapedContent).)*?)(?P=openingQuote))` – Gajus Feb 22 '16 at 12:46

I could not understand the purpose of the positive lookahead in this example. Too many backslashes makes it difficult. Here is an example using letters that makes it more clear. https://regex101.com/r/mG8pT3/2 – Gajus Feb 22 '16 at 12:58

This returns the values including the matching quotes. Is there no chance to return only the **content between** the quotes, as it was requested? – Martin Schneider Sep 13 '16 at 11:19
If you're looking for content extraction, this won't capture the “” pair (char codes 8220 and 8221). – massanishi Mar 25 '18 at 09:56

Abusing a lookahead as a possessive quantifier is completely unnecessary and confusing. Just use an alternation: `(["'])(?:\\.|[^\\])*?\1` – Aran-Fey May 23 '18 at 16:13

how to avoid empty strings? – Vikas Bansal Jan 16 '19 at 13:00
what if I needed only images in a certain host? So for images that have in their path /_api/myhost/anotherpath/ – Nikhil Dec 17 '19 at 15:50

The `[""']` in the quote from ephemient comment is incorrect, it has one `"` more than your initial regex. – Jan 03 '20 at 19:59
Great post, but I pity the developers who will blindly copy paste this without understanding what it does. – mcalcote Jan 09 '20 at 21:55

A modified version of this that only matches the content between the quotes excluding the quotes themselves: `(?<=(["']))(?:(?=(\\?))\2.)*?(?=\1)` – shreyasm-dev Aug 06 '20 at 15:50
thank you @GalaxyCat105, that was _exactly_ what I was looking for... – jmaragon Feb 26 '21 at 12:12
This is going to select the quotation marks too and I doubt if that's the desired effect. I'm using javascript. – PhillipMwaniki Mar 29 '21 at 08:34
I was having an issue implementing this into a .sh command, I ended up trying Martin York answer (below) and it worked for me first try. – Frederick Haug Jul 30 '21 at 16:43
I am trying to find all character proceeding a look ahead and the look ahead pattern=".+?(?=\'.*\')" where the delimiter is ' characters ' – Golden Lion Dec 02 '21 at 18:48

For Python folks here: `re.compile(r'(["\'])((?:\\.|[^\\])*?)(\1)').findall(data)`. Added couple of brackets for it to work. Take the second item from each tuple. – Andrey Pokhilko Jan 11 '22 at 08:18

score 129 · Answer 3 · answered Oct 05 '08 at 04:34

I would go for:

"([^"]*)"

The [^"] is regex for any character except '"'.
I use this instead of the non-greedy *? operator because I always have to look that one up to make sure I get it right.
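
One practical difference, shown in a small Python 3 sketch with a made-up input: [^"] also matches newlines and copes with empty quotes, whereas . stops at a newline unless re.DOTALL is set.

```python
import re

text = 'a "" b "x\ny"'

# [^"] matches any character except a double quote, newlines included.
print(re.findall(r'"([^"]*)"', text))   # ['', 'x\ny']

# '.' does not match a newline unless re.DOTALL is set, so the
# non-greedy version misses the second value here.
print(re.findall(r'"(.*?)"', text))     # ['']
```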

answered Oct 05 '08 at 04:34 by Martin York

This also behaves well among different regex interpretations. – Phil Bennett Oct 05 '08 at 14:33

This has saved my sanity. In the RegEx implementation of .NET, "(.*?)" does not have the desired effect (it does not act non-greedy), but "([^"]*)" does. – Jens Neubauer Sep 18 '13 at 09:52
Thank you! For the life of me I could not get this trick to work. I knew it but in my head for some reason I needed the `.` as well as in `.*`. Silly me. This should solve the problem I was having though ironically I just noticed in this case I don't even need it. Anyway it'll probably stay in my head this time. – Pryftan Mar 16 '23 at 14:07

score 36 · Answer 4 · Casimir et Hippolyte · 2016-10-12T17:56:22.943

Let's look at two efficient ways to deal with escaped quotes. These patterns are designed to be efficient, not concise or aesthetic.

These patterns use first-character discrimination to find quotes in the string quickly without paying for an alternation: characters that are not quotes are discarded at once, without testing both branches of the alternation.

Content between quotes is described with an unrolled loop (instead of a repeated alternation) to be more efficient too: [^"\\]*(?:\\.[^"\\]*)*

To deal with strings that have unbalanced quotes, you can use possessive quantifiers instead, [^"\\]*+(?:\\.[^"\\]*)*+, or a workaround that emulates them, to prevent too much backtracking. You can also decide that a quoted part runs from an opening quote to the next non-escaped quote or to the end of the string. In that case there is no need for possessive quantifiers; you only need to make the closing quote optional.

Note: sometimes quotes are escaped not with a backslash but by doubling the quote. In this case the content subpattern looks like this: [^"]*(?:""[^"]*)*
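
A small Python 3 sketch of the doubled-quote case (CSV-style input made up for illustration): wrap that content subpattern in quotes and a capture group, then collapse "" back to " as a separate step, since the regex itself cannot unescape anything.

```python
import re

# Content subpattern from above, wrapped in quotes and captured.
DOUBLED = re.compile(r'"([^"]*(?:""[^"]*)*)"')

text = 'a "say ""hi""" b'
raw = DOUBLED.findall(text)
print(raw)                                    # ['say ""hi""']

# The regex only finds the value; turning "" back into " is a separate step.
print([s.replace('""', '"') for s in raw])    # ['say "hi"']
```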

The patterns avoid a capture group plus backreference (something like (["']).....\1) and instead use a simple alternation, with ["'] factored out at the beginning.

Perl-like:

["'](?:(?<=")[^"\\]*(?s:\\.[^"\\]*)*"|(?<=')[^'\\]*(?s:\\.[^'\\]*)*')

(Note that (?s:...) is syntactic sugar that switches on dotall/singleline mode inside the non-capturing group. If this syntax is not supported, you can easily switch the mode on for the whole pattern or replace the dot with [\s\S].)

(This pattern is written entirely "by hand" and doesn't take into account any internal engine optimizations.)

ECMAScript:

(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')
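
As a1an points out in the comments on this answer, Python accepts this ECMAScript pattern as a raw string. A minimal sketch with a made-up input containing escaped quotes of both kinds; like all of these patterns, the matches include the quotes:

```python
import re

# Casimir et Hippolyte's ECMAScript pattern, used unchanged as a Python raw string.
PATTERN = re.compile(
    r"""(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')"""
)

text = 'x = "say \\"hi\\"" + \'it\\\'s\''
print(text)                   # x = "say \"hi\"" + 'it\'s'
print(PATTERN.findall(text))  # ['"say \\"hi\\""', "'it\\'s'"]
```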

POSIX extended:

"[^"\\]*(\\(.|\n)[^"\\]*)*"|'[^'\\]*(\\(.|\n)[^'\\]*)*'

or simply:

"([^"\\]|\\.|\\\n)*"|'([^'\\]|\\.|\\\n)*'

edited Oct 12 '16 at 17:56 · answered Apr 05 '15 at 00:13 by Casimir et Hippolyte

Python accepts the ECMA script with raw string format, i.e. r""" ECMA script """ – a1an Jun 12 '15 at 11:00

This is brilliant, it was very easy to adapt your ECMA one to work with escaping new line and carriage returns inside double quotes. – Douglas Gaskell Apr 16 '16 at 02:27
@douglasg14b: Thanks. Note that if you want to use it in Javascript, you only need to use the literal notation `/pattern/` without escaping anything (instead of the object notation `new RegExp("(?=[\"'])(?:\"[^\"\\\\]*...");`) – Casimir et Hippolyte Apr 17 '16 at 17:05
@a1an: yes, but you can use the Perl version if you remove the `s` here: `(?s:` and if you put `(?s)` somewhere in the pattern. – Casimir et Hippolyte Apr 17 '16 at 17:07
@Gravis: I don't understand your comment. Perhaps you could post somewhere an example string to illustrate your problem. – Casimir et Hippolyte Apr 09 '23 at 22:32
@CasimiretHippolyte My ill-formatted comment was made in error. However, C++ regex does not accept the `\n` character and will throw an exception at runtime. `R"("([^"\\]|\\.)*"|'([^'\\]|\\.)*')"` is an acceptable string literal. – Gravis Apr 11 '23 at 04:06
While I cannot comment on the _efficiency_ of your solution, my concern is that the OP wanted just the content _inside_ the quotes, while — at least in regex101.com — the group matches the content _with_ the quotes. – Gwyneth Llewelyn Jul 04 '23 at 21:07

@GwynethLlewelyn: indeed all the patterns match the quotes, but it's really a false problem since it's easy to remove the first and last character of a string (or to capture the parts you are interested by with capture groups). The main interest here is how to deal with eventual escaped quotes and how to avoid to reach the backtracking or steps limits using *unrolled loops* (ex: `(?:[^\\"]|\\.)*` => `[^\\"]*(?:\\.[^\\"]*)*`). – Casimir et Hippolyte Jul 05 '23 at 07:07

@GwynethLlewelyn: also, note that whatever the pattern you write, you can't unescape escaped quotes in the result with it. So extracting exactly the content between brackets with one regex (without further processing) is just a sweet dream. – Casimir et Hippolyte Jul 05 '23 at 07:13
@CasimiretHippolyte — indeed, I have to agree with you. Your solution is probably the only one (so far) that is both _efficient_ and _consistent_ & _complete_ (in the sense that it matches everything it's supposed to match, but _only_ matches what is required and nothing else). Most importantly, your ECMA variant is possibly the only non-trivial solution that works with Google's RE2 engine (their own optimised regexp library), which is used, among others, in the Go programming language. RE2 forbids using a capture group and a backreference — which you neatly avoid with your solution. Thanks :) – Gwyneth Llewelyn Jul 05 '23 at 08:01

@GwynethLlewelyn: Since RE2 isn't a backtracking engine, you can simply write `"(?s:[^\\"]|\\.)*"|'(?s:[^\\']|\\.)*'`. – Casimir et Hippolyte Jul 05 '23 at 11:38

IrishDubGuy · Answer 5 · 2017-11-10T01:29:56.707

Peculiarly, none of these answers produce a regex where the returned match is the text inside the quotes, which is what was asked for. MA-Madden tries, but only gets the inside text as a captured group rather than as the whole match. One way to actually do it would be:

(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)

Examples for this can be seen in this demo https://regex101.com/r/Hbj8aP/1

The key here is the positive lookbehind at the start (the ?<=) and the positive lookahead at the end (the ?=). The lookbehind checks the character behind the current position for a quote; if one is found, matching starts from there. The lookahead then checks the character ahead for a quote and, if one is found, stops on that character. The lookbehind group (the ["']) is wrapped in parentheses to create a group for whichever quote was found at the start; this group is then used in the final lookahead (?=\1) to make sure matching only stops at the corresponding quote.

The only other complication is that because the lookahead doesn't actually consume the end quote, it will be found again by the starting lookbehind which causes text between ending and starting quotes on the same line to be matched. Putting a word boundary on the opening quote (["']\b) helps with this, though ideally I'd like to move past the lookahead but I don't think that is possible. The bit allowing escaped characters in the middle I've taken directly from Adam's answer.
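
A minimal Python 3 sketch of this pattern (Python's fixed-width lookbehind accepts it, since ["']\b is one character wide). The helper name inside_quotes is hypothetical. The second call illustrates the \b limitation raised in the comments: a value that starts with a space is skipped.

```python
import re

# IrishDubGuy's pattern: the lookbehind captures the opening quote (group 1),
# \b requires a word character right after it, and (?=\1) stops before the
# matching closing quote without consuming it.
INSIDE = re.compile(r"""(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)""")

def inside_quotes(text):
    # findall would return the groups, so take the whole match instead.
    return [m.group(0) for m in INSIDE.finditer(text)]

print(inside_quotes('"Foo Bar" "Another Value" something else'))
# ['Foo Bar', 'Another Value']

# The \b caveat: a value that starts with a space is not found.
print(inside_quotes('" leading space"'))
# []
```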

Error on a space after the quote, e.g. https://regex101.com/r/ohlchh/1 — Wagner Pereira, Mar 01 '21 at 21:47
It is the \b word boundary that is causing that issue, Wagner; it is only needed if you are trying to match more than one string per line. If you have both more than one string per line and strings that start with a space, then you will need another solution. — IrishDubGuy, Mar 08 '21 at 19:40
The `\b` will also cause problems when the first character in the string is a dot. @IrishDubGuy even when using `\b` it seems to find text between ending and starting quote, this solution doesn't seem to work properly. Couldn't you just decide to consume the end quote, so that it won't get false positives between the strings? — Martin Braun, Aug 14 '23 at 21:03

score 26 · Answer 6 · Martin Schneider · 2016-09-14T09:26:58.700

The regex in the accepted answer returns the values including their surrounding quotation marks: "Foo Bar" and "Another Value" as matches.

Here are regexes that return only the values between quotation marks (as the questioner was asking for):

Double quotes only (use value of capture group #1):

"(.*?[^\\])"

Single quotes only (use value of capture group #1):

'(.*?[^\\])'

Both (use value of capture group #2):

(["'])(.*?[^\\])\1

All support escaped and nested quotes.
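
A short Python 3 sketch of the "both" pattern, with made-up inputs. With two groups, re.findall returns (quote, content) tuples. The second call shows the empty-value problem mentioned in the comments: because [^\\] must consume a character, "" is not returned as an empty value and the match runs on into the next string.

```python
import re

BOTH = re.compile(r"""(["'])(.*?[^\\])\1""")

# With two groups, findall returns (quote, content) tuples; content is group 2.
print(BOTH.findall('"Foo Bar" \'x\''))
# [('"', 'Foo Bar'), ("'", 'x')]

# [^\\] needs at least one character, so "" is not matched as empty.
print(BOTH.findall('"" "x"'))
# [('"', '" ')]
```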

edited Sep 14 '16 at 09:26 · answered Sep 14 '16 at 09:15 by Martin Schneider

Please, why this works? I was using `src="(.*)"` but obviously it was selecting everything before the last ", your REGEX, though, selected only the src="" contents, but I didn't understand how? – Lucas Bustamante Jul 25 '18 at 23:25

I like this one a lot for its simplicity, but as I discovered, it doesn't handle an empty value or no value between the quotes very well – RedactedProfile Feb 13 '19 at 01:24
Bless you my friend. I used this to delete all values from a big ol JSON object: `: "(.*?[^\\])"` – tshoemake Jul 29 '20 at 03:45
For anyone using Javascript's `string.match()` you want the result at index 1 not 0! – alex87 Nov 13 '20 at 07:27

score 19 · Answer 7 · wp78de · 2020-05-19T18:41:05.473

I liked Eugen Mihailescu's solution to match the content between quotes whilst allowing to escape quotes. However, I discovered some problems with escaping and came up with the following regex to fix them:

(['"])(?:(?!\1|\\).|\\.)*\1

It does the trick and is still pretty simple and easy to maintain.

Demo (with some more test-cases; feel free to use it and expand on it).

PS: If you want just the content between the quotes as the full match ($0), and don't mind the performance penalty, use:

(?<=(['"])\b)(?:(?!\1|\\).|\\.)*(?=\1)

Unfortunately, without the quotes as anchors, I had to add a word boundary \b, which does not play well with spaces and non-word characters after the opening quote.

Alternatively, modify the initial version by simply adding a group, and extract the string from $2:

(['"])((?:(?!\1|\\).|\\.)*)\1

PPS: If your focus is solely on efficiency, go with Casimir et Hippolyte's solution; it's a good one.
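
A Python 3 sketch of the extra-group version above, using a made-up input with an empty value and an escaped quote; group 2 holds the content, and in Python an empty "" pair yields an empty string:

```python
import re

# wp78de's "extra group" version: group 1 is the quote, group 2 the content.
QUOTED = re.compile(r"""(['"])((?:(?!\1|\\).|\\.)*)\1""")

text = '"" "Foo Bar" "a \\" b"'   # i.e.  "" "Foo Bar" "a \" b"
contents = [content for _quote, content in QUOTED.findall(text)]
print(contents)
# ['', 'Foo Bar', 'a \\" b']
```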

edited May 19 '20 at 18:41 · answered May 13 '18 at 21:36 by wp78de

observation: the second regex misses a value with a minus sign `-`, like in longitude coordinates. – Crowcoder May 17 '20 at 17:40
I didn't change anything. If you don't observe the issue maybe it's the flavor of regex I'm using. I was using the regex101site, I think php style regex. – Crowcoder May 18 '20 at 09:16
[Here is the demo of what I'm talking about.](https://regex101.com/r/bTU9c6/1) I was expecting it to match the longitude (-96.74025) but it doesn't. – Crowcoder May 19 '20 at 11:37
@Crowcoder Thank you. Yes, this is caused by the word boundary that acts as an anchor and helps to avoid overlapping matches but doesn't play nice with your input. An additional group is actually the better option as noted in the updated answer. – wp78de May 19 '20 at 18:46
Trying to figure out how to join this solution with an existing regex [here](https://stackoverflow.com/questions/65730393/joining-two-regex-expressions-for-nested-quotes-support). Any suggestion? – vitaly-t Jan 15 '21 at 04:14
Heh. Your solution is interesting, especially for returning the value _inside_ the quotes, which is what the OP was after. However, the edge cases are curious — namely, `""` or `''` do _not_ return the empty string on `$2` (what is actually returned is probably up to the engine; possibly it returns `null` or something like that, which _may_ be acceptable in some cases). Personally, I prefer what you call the "alternative" version (the one that does not use `\b`) to the "original" (the one using `\b`) because the latter tends to be messier on edge cases... – Gwyneth Llewelyn Jul 04 '23 at 21:15

score 14 · Answer 8 · answered Oct 29 '14 at 15:18

A very late answer, but I'd like to add it:

(\"[\w\s]+\")

http://regex101.com/r/cB0kB8/1

answered Oct 29 '14 at 15:18 by Suganthan Madhavan Pillai

Works nicely in php. – Parapluie Feb 02 '18 at 17:32
The only answer so far for capturing both "HomePage" in : localize["Home page"]localize["Home page"] – jBelanger Apr 05 '20 at 16:39
its breaking when expression in the quotation has minus symbol - – Karol Be Feb 25 '22 at 16:17
Mmh. Sorry, but no :) It just works on a few scenarios (e.g. no negative values, no punctuation, no quoted/escaped quotes...); it only works with `"` (and not `'`); and, of course, the captured group will come _with_ the quotes (the OP just wanted what's _between_ the quotes) – Gwyneth Llewelyn Jul 04 '23 at 21:20

score 9 · Answer 9 · answered Dec 10 '15 at 10:08

The pattern (["'])(?:(?=(\\?))\2.)*?\1 above does the job, but I am concerned about its performance (it's not bad, but it could be better). Mine, below, is ~20% faster.

The pattern "(.*?)" is just incomplete. My advice for everyone reading this is just DON'T USE IT!!!

For instance, it cannot capture many strings (I can provide an exhaustive test case if needed), such as the one below:

$string = 'How are you? I\'m fine, thank you';

The rest of them are just as "good" as the one above.

If you really care both about performance and precision then start with the one below:

/(['"])((\\\1|.)*?)\1/gm

In my tests it covered every string I met but if you find something that doesn't work I would gladly update it for you.

Check my pattern in an online regex tester.
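
A Python 3 translation of this pattern applied to the PHP line above (a sketch: Python has no /.../gm literal, so global matching becomes re.finditer, and the m flag is dropped because it only affects ^ and $, which the pattern does not use). Group 2 holds the content:

```python
import re

# /(['"])((\\\1|.)*?)\1/gm in Python syntax (m flag omitted, see above).
PATTERN = re.compile(r"""(['"])((\\\1|.)*?)\1""")

line = r"$string = 'How are you? I\'m fine, thank you';"
values = [m.group(2) for m in PATTERN.finditer(line)]
print(values)
# ["How are you? I\\'m fine, thank you"]
```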

I like the simplicity of your pattern, however performance-wise Casimir et Hippolyte's pattern blows all extended solutions out of the water. Furthermore, it looks like your pattern has problems with extended edge-cases like an escaped quote at the end of the sentence. — wp78de, May 13 '18 at 20:53

score 8 · Answer 10 · edited Jan 30 '14 at 01:53

This version accounts for escaped quotes and controls backtracking:

/(["'])((?:(?!\1)[^\\]|(?:\\\\)*\\[^\\])*)\1/

edited Jan 30 '14 at 01:53 by HamZa · answered Oct 06 '08 at 01:42 by Axeman

This spans multiple strings and doesn't seem to handle a double backslash correctly, for example the string: **foo 'stri\\ng 1' bar 'string 2' and 'string 3'** [Debuggex Demo](https://www.debuggex.com/r/v774K8El8RyM5k7l) – miracle2k Oct 01 '13 at 19:30
You can't use a backreference in a character class. – HamZa Jan 30 '14 at 01:53

score 8 · Answer 11 · James Harrington · 2016-12-07T17:30:30.710

MORE ANSWERS! Here is the solution I used:

\"([^\"]*?icon[^\"]*?)\"

TL;DR: replace the word icon with what you're looking for inside the quotes, and voila!

This works by looking for the keyword without caring what else is between the quotes. For example:
id="fb-icon"
id="icon-close"
id="large-icon-close"
The regex looks for a quote mark ",
then any run of characters that are not ",
until it finds icon,
then any run of characters that are not ",
and finally a closing ".
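
A Python 3 sketch that wraps this technique in a helper; quoted_containing is a hypothetical name, and re.escape is an addition here so that a keyword containing characters such as . or - is matched literally:

```python
import re

def quoted_containing(keyword, text):
    """Return quoted values that contain `keyword` (hypothetical helper)."""
    # re.escape makes the keyword literal, so '.' or '-' are not regex syntax.
    pattern = r'"([^"]*?' + re.escape(keyword) + r'[^"]*?)"'
    return re.findall(pattern, text)

html = 'id="fb-icon" id="icon-close" id="large-icon-close" id="logo"'
print(quoted_containing("icon", html))
# ['fb-icon', 'icon-close', 'large-icon-close']
```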

edited Dec 07 '16 at 17:30 · answered Nov 10 '16 at 03:06 by James Harrington

Thank you very much. was able to replace every occurrence of `name="value"` with `name={"value"}` since this answer's regex returns `icon`/`value` as the second group (unlike the accepted answer). **Find**: `=\"([^\"]*?[^\"]*?)\"` **Replace**: `={"$1"}` – Palisand Sep 20 '17 at 19:24
Mind explaining the downvote? It works well for some situations. – James Harrington Jul 10 '18 at 16:36
Are you replying to me? – Palisand Jul 10 '18 at 21:29
@Palisand no someone down-voted this post the other day with no explanation. – James Harrington Jul 12 '18 at 15:28
this seems to be the only answer that finds a specific text inside quotes – Top-Master Nov 26 '18 at 09:17
Thanks, this is the one that worked for me; all the others were matching across multiple strings, so with the other solutions `a="a" b="b"` would match as `a" b="b` and your solution successfully matches `a` or `b`. – Nick Bolton Apr 30 '22 at 09:49

score 6 · Answer 12 · edited Jan 30 '14 at 01:55

I liked Axeman's more expansive version, but had some trouble with it: it didn't match, for example,

foo "string \\ string" bar

or

foo "string1"   bar   "string2"

correctly, so I tried to fix it:

# opening quote
(["'])
   (
     # repeat (non-greedy, so we don't span multiple strings)
     (?:
       # anything, except not the opening quote, and not 
       # a backslash, which are handled separately.
       (?!\1)[^\\]
       |
       # consume any double backslash (unnecessary?)
       (?:\\\\)*       
       |
       # Allow backslash to escape characters
       \\.
     )*?
   )
# same character as opening quote
\1
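
A Python 3 sketch compiling this free-spacing pattern with re.VERBOSE, tried on the two inputs above. One deliberate change: the (?:\\\\)* branch, which the author already marked as possibly unnecessary, is left out, because \\. consumes a doubled backslash on its own.

```python
import re

# Free-spacing pattern compiled with re.VERBOSE, so whitespace and "#"
# comments inside it are ignored. The "(?:\\\\)*" branch is omitted here.
QUOTED = re.compile(r"""
    (["'])            # opening quote (group 1)
    (                 # group 2: the quoted content
      (?:
        (?!\1)[^\\]   # anything except the opening quote or a backslash
        |
        \\.           # a backslash escapes the next character
      )*?             # non-greedy, so we don't span multiple strings
    )
    \1                # same character as the opening quote
""", re.VERBOSE)

def contents(text):
    return [m.group(2) for m in QUOTED.finditer(text)]

print(contents(r'foo "string \\ string" bar'))     # content keeps both backslashes
print(contents('foo "string1"   bar   "string2"')) # ['string1', 'string2']
```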

score 5 · Answer 13 · edited Feb 12 '14 at 08:11

import re
string = "\" foo bar\" \"loloo\""
print(re.findall(r'"(.*?)"', string))

Just try this out; it works like a charm!

The \ escapes the double quotes inside the normal (non-raw) Python string literal.

edited Feb 12 '14 at 08:11 by Alan Moore · answered Feb 12 '14 at 07:28 by mobman

If that first line is the actual Python code, it's going to create the string `" foo bar" "loloo"`. I suspect you meant to wrap that in a raw string like you did with the regex: `r'"\" foo bar\" \"loloo\""'`. Please make use of SO's excellent [formatting capabilities](http://stackoverflow.com/editing-help#code) whenever it's appropriate. It's not just cosmetics; we literally can't tell what you're trying to say if you don't use them. And welcome to [SO]! – Alan Moore Feb 12 '14 at 08:35
thanks for the advice alan, i am actually new to this community, next time i'll surely keep all this in mind...sincere apologies. – mobman Feb 12 '14 at 22:43

score 5 · Answer 14 · novice · 2022-06-14T12:14:32.537

My solution to this is below

(["']).*\1(?![^\s])

Demo link : https://regex101.com/r/jlhQhV/1

Explanation:

(["']) -> matches either ' or " and stores it in capture group 1, referenced later as \1.

.* -> greedily matches everything, zero or more times, up to the end of the line. The regex engine then backtracks one character at a time until the rest of the pattern (\1 and the lookahead) can match.

\1 -> matches the same character that was matched earlier by the first capture group.

(?![^\s]) -> negative lookahead ensuring that the closing quote is not followed by a non-whitespace character, i.e. it is followed by whitespace or the end of the string.
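
A Python 3 sketch of the behaviour described above, with made-up inputs: with one quoted value per line each value is its own match, but because .* is greedy, two values on the same line come back as a single match running to the last qualifying closing quote.

```python
import re

PATTERN = re.compile(r"""(["']).*\1(?![^\s])""")

def matches(text):
    return [m.group(0) for m in PATTERN.finditer(text)]

# One value per line: '.' does not cross the newline, so each line matches separately.
print(matches('"Foo Bar"\n\'Another Value\''))
# ['"Foo Bar"', "'Another Value'"]

# Two values on one line: greedy .* backtracks only to the last suitable
# closing quote, so both values come back as one match.
print(matches('"Foo Bar" "Another Value" something else'))
# ['"Foo Bar" "Another Value"']
```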

edited Jun 14 '22 at 12:14 · answered Jun 14 '22 at 09:59 by novice

Kudos for your clear explanation. This will even match things like `"this string has \"backslashed quotes\" in it"` or even `'a perfectly valid string in "PHP"'` while having no problem in rejecting mismatched quotes. However, as many others mentioned, the OP wanted the content _inside_ the quotes, i.e. with the quotes stripped _out_ of the text. Besides that minor issue, your solution clearly works; well done! – Gwyneth Llewelyn Jul 04 '23 at 20:55

score 3 · Answer 15 · lon · 2018-05-06T03:37:56.447

Unlike Adam's answer, mine is simple but works:

(["'])(?:\\\1|.)*?\1

Just add parentheses if you want to capture the content inside the quotes, like this:

(["'])((?:\\\1|.)*?)\1

Then $1 holds the quote character and $2 holds the content string.
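
In Python 3, $1 and $2 correspond to group(1) and group(2). A minimal sketch with a made-up input containing an escaped quote:

```python
import re

PATTERN = re.compile(r"""(["'])((?:\\\1|.)*?)\1""")

m = PATTERN.search(r'x = "a \" b" + 1')
print(m.group(1))  # "
print(m.group(2))  # a \" b
```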

edited May 06 '18 at 03:37 · answered May 06 '18 at 03:32 by lon

score 3 · Answer 16 · Donovan P · 2020-05-25T04:49:48.117

All the answers above are good... except that they DO NOT support all Unicode characters in ECMAScript (JavaScript)!

If you are a Node user, you might want this modified version of the accepted answer that supports all Unicode characters:

/(?<=((?<=[\s,.:;"']|^)["']))(?:(?=(\\?))\2.)*?(?=\1)/gmu

Try here.

edited May 25 '20 at 04:49 · answered May 24 '20 at 12:08 by Donovan P

What is a non-unicode character? AFAIK unicode covers **all** character. – Toto May 24 '20 at 12:16

Why do you guess it's a javascript question? Moreover, lookbehind is not supported in all browsers, regex101 throws `? The preceding token is not quantifiable` – Toto May 24 '20 at 12:20
@Toto, What I mean is "does not support all the unicode character". Thank you. While the question is about regex in general, I just want to emphasize that the usage of word boundary assertions would cause unwanted behavior in JavaScript. And of course, while JavaScript is generally for the browser, there is Node too. – Donovan P May 25 '20 at 05:03

score 2 · Answer 17 · answered Oct 05 '08 at 12:45

echo 'junk "Foo Bar" not empty one "" this "but this" and this neither' | sed 's/[^\"]*\"\([^\"]*\)\"[^\"]*/>\1</g'

This will result in: >Foo Bar<><>but this<

Here I show each result between >< for clarity. Since sed has no non-greedy quantifier, the command uses [^"]* instead: it throws out the junk before and after each pair of quotes, then replaces it with the part between the quotes, surrounded by ><.
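
For readers outside the shell, here is a rough Python 3 equivalent of the sed command (a sketch, not the author's code), using re.sub on the same input; it produces the output shown above:

```python
import re

text = 'junk "Foo Bar" not empty one "" this "but this" and this neither'

# Drop the text around each quoted part and keep only the part between
# the quotes, wrapped in >< markers (same idea as the sed substitution).
result = re.sub(r'[^"]*"([^"]*)"[^"]*', r'>\1<', text)
print(result)
# >Foo Bar<><>but this<
```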

score 2 · Answer 18 · answered Mar 02 '18 at 16:51

If you're trying to find strings that only have a certain suffix, such as dot syntax, you can try this:

\"([^\"]*?[^\"]*?)\".localized

Where .localized is the suffix.

Example:

print("this is something I need to return".localized + "so is this".localized + "but this is not")

It will capture "this is something I need to return".localized and "so is this".localized but not "but this is not".
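
A Python 3 sketch applying the idea to the Swift line above. Two small adjustments to the answer's pattern are assumptions about intent rather than the original: the dot is escaped as \. so that it matches only a literal dot (as a comment on this answer points out), and the repeated [^"]*? is collapsed into one.

```python
import re

swift = ('print("this is something I need to return".localized'
         ' + "so is this".localized + "but this is not")')

# \. matches a literal dot; an unescaped . would match any character.
found = re.findall(r'"([^"]*?)"\.localized', swift)
print(found)
# ['this is something I need to return', 'so is this']
```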

answered Mar 02 '18 at 16:51 by OffensivelyBad

More correctly, if you're matching the 'dot', then you should escape it by using `\.`; or else, you'll be able to match `"this is something I need to return"Jlocalized` as well. – Gwyneth Llewelyn Jul 04 '23 at 20:58

score 2 · Answer 19 · answered May 04 '18 at 13:35

A supplementary answer for the subset of Microsoft VBA coders only. One uses the Microsoft VBScript Regular Expressions 5.5 library, which gives the following code:

Sub TestRegularExpression()

    Dim oRE As VBScript_RegExp_55.RegExp    '* Tools->References: Microsoft VBScript Regular Expressions 5.5
    Set oRE = New VBScript_RegExp_55.RegExp

    oRE.Pattern = """([^""]*)"""
    oRE.Global = True

    Dim sTest As String
    sTest = """Foo Bar"" ""Another Value"" something else"

    Debug.Assert oRE.Test(sTest)

    Dim oMatchCol As VBScript_RegExp_55.MatchCollection
    Set oMatchCol = oRE.Execute(sTest)
    Debug.Assert oMatchCol.Count = 2

    Dim oMatch As VBScript_RegExp_55.Match
    For Each oMatch In oMatchCol
        Debug.Print oMatch.SubMatches(0)
    Next oMatch

End Sub

While posting the whole VB code for your solution is great (and helpful for VB users), your pattern is far too simple to capture all possible cases. Consider using one of the most voted regexps on this topic instead! — Gwyneth Llewelyn, Jul 05 '23 at 07:53

score 2 · Answer 20 · answered Nov 29 '11 at 15:59

Building on Greg H.'s answer, I was able to create this regex to suit my needs.

I needed to match a specific value that was qualified by being inside quotes. It must be a full match; no partial match should trigger a hit.

e.g. "test" could not match for "test2".

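import re
needle, haystack = "test", 'x = "test2" or "TEST"'  # example inputs, for illustration
reg = r"""(['"])(%s)\1"""
# re.escape keeps regex metacharacters in needle from being interpreted
if re.search(reg % re.escape(needle), haystack, re.IGNORECASE):
    print("winning...")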

Hunter
