1

Input:

blalasdl8ujd "key":"value", blblabla asdw "alo":"ebobo",blabla"www":"zzzz"

or

blalasdl8ujd key [any_chars_here] "value", blabla asdw "alo":"ebobo", bla"www":"zzzz"

I'm trying to extract the value, given only the key and knowing that the value is enclosed in double quotes (").

The regex key.*"(.*?)" returns the last quoted match ("zzzz"). I need to fix it so that it returns the first quoted string after the key.

https://regex101.com/r/CDfhBT/1
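
For reference, the behaviour described above can be reproduced with a short Python check. This is only a sketch: `s1` is the first input from the question, and the variable names are illustrative.

```python
import re

s1 = 'blalasdl8ujd "key":"value", blblabla asdw "alo":"ebobo",blabla"www":"zzzz"'

# Greedy .* first runs to the end of the line, then backtracks only as far
# as needed to leave one quoted string for the capture group: the last one.
greedy = re.search(r'key.*"(.*?)"', s1)
print(greedy.group(1))  # zzzz
```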

PATAPOsha

4 Answers

3

Code

See regex in use here

"key"\s*:\s*"([^"]*)"

To match the possibility of escaped double quotes you can use the following regex:

See regex in use here

"key"\s*:\s*"((?:(?<!\\)\\(?:\\{2})*"|[^"])*)"

This method treats a double quotation character " as escaped only when an odd number of backslashes \ precedes it, so \", \\\", \\\\\", etc. are matched as part of the value, whereas \\", \\\\", \\\\\\" are not: there the backslashes escape each other in pairs, and the unescaped double quotation character " that follows terminates the string.
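
A quick way to check the odd/even backslash behaviour is to run the pattern in Python against two made-up inputs. The strings `escaped` and `unescaped` below are hypothetical test data, not taken from the question.

```python
import re

pattern = re.compile(r'"key"\s*:\s*"((?:(?<!\\)\\(?:\\{2})*"|[^"])*)"')

# Hypothetical inputs, written as raw strings so the backslashes are literal.
escaped = r'"key":"va\"lue", "x":"y"'    # one backslash: the quote is escaped
unescaped = r'"key":"val\\", "x":"y"'    # two backslashes: the quote ends the value

print(pattern.search(escaped).group(1))    # va\"lue
print(pattern.search(unescaped).group(1))  # val\\
```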

Matching both strings

If you're looking to match your second string as well, you can use either of the following regexes:

\bkey\b(?:"\s*:\s*|.*?)"([^"]*)"
\bkey\b(?:"\s*:\s*|.*?)"((?:(?<!\\)\\(?:\\{2})*"|[^"])*)"
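
As a rough check, the first of these patterns can be run against both inputs from the question. The third string, `other`, is a made-up example showing that the word boundaries keep `key` from matching inside `rekey`.

```python
import re

pattern = re.compile(r'\bkey\b(?:"\s*:\s*|.*?)"([^"]*)"')

s1 = 'blalasdl8ujd "key":"value", blblabla asdw "alo":"ebobo",blabla"www":"zzzz"'
s2 = 'blalasdl8ujd key [any_chars_here] "value", blabla asdw "alo":"ebobo", bla"www":"zzzz"'
other = 'blah "rekey":"nope"'  # hypothetical input: key is part of a longer word

for s in (s1, s2):
    print(pattern.search(s).group(1))  # value (for both strings)

print(pattern.search(other))  # None
```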

Usage

See code in use here

import re

s = 'blahblah "key":"value","TargetCRS": "Target","TargetCRScode": "vertical Code","zzz": "aaaa" sadzxc "sss"'
r = re.compile(r'''"key"\s*:\s*"([^"]*)"''')

match = r.search(s)
if match:
    print(match.group(1))

Results

Input

blahblah "key":"value","TargetCRS": "Target","TargetCRScode": "vertical Code","zzz": "aaaa" sadzxc "sss"
blalasdl8ujd key [any_chars_here] "value", blabla asdw "alo":"ebobo", bla"www":"zzzz"

Output

String 1

  • Match: "key":"value"
  • Capture group 1: value

String 2 (when using one of the methods under Matching both strings)

  • Match: key [any_chars_here] "value"
  • Capture group 1: value

Explanation

  • "key" Match this literally
  • \s* Match any number of whitespace characters
  • : Match the colon character literally
  • \s* Match any number of whitespace characters
  • " Match the double quotation character literally
  • ([^"]*) Capture any character not present in the set (any character except the double quotation character ") any number of times into capture group 1
  • " Match the double quotation character literally

Matching both strings

  • \b Assert position as a word boundary
  • key Match this literally
  • \b Assert position as a word boundary
  • (?:"\s*:\s*|.*?) Match either of the following
    • "\s*:\s*
      • " Match this literally
      • \s* Match any number of whitespace characters
      • : Match this literally
      • \s* Match any number of whitespace characters
    • .*? Match any character any number of times, but as few as possible
  • " Match this literally
  • ([^"]*) Capture any number of any character except " into capture group 1
  • " Match this literally
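
The breakdown above can also be kept next to the pattern itself by using Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. This is just one possible layout of the same regex, with `s1` and `s2` taken from the question.

```python
import re

pattern = re.compile(r'''
    \bkey\b          # the key as a whole word
    (?:
        "\s*:\s*     # closing quote of a quoted key, then a colon
      |
        .*?          # or any characters, as few as possible
    )
    "([^"]*)"        # the quoted value, captured in group 1
''', re.VERBOSE)

s1 = 'blalasdl8ujd "key":"value", blblabla asdw "alo":"ebobo",blabla"www":"zzzz"'
s2 = 'blalasdl8ujd key [any_chars_here] "value", blabla asdw "alo":"ebobo", bla"www":"zzzz"'

results = [pattern.search(s).group(1) for s in (s1, s2)]
print(results)  # ['value', 'value']
```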
ctwheels
  • The double quotes inside a string defined with double quotes isn't working for you... use single quotes to define to regex and then you can use double quotes inside. – Joe Iddon Dec 08 '17 at 18:05
  • @JoeIddon what? This outputs what the OP is asking for no? – ctwheels Dec 08 '17 at 18:06
  • You can't define a string in Python that contains double quotes `"` with double quotes. So `s = "hello "bob"."` is invalid – Joe Iddon Dec 08 '17 at 18:17
  • @JoeIddon I know that, but my answer is in regex, the OP has to escape the double quotes or click the "Code Generator" button on regex101. – ctwheels Dec 08 '17 at 18:19
  • @JoeIddon I added usage above. – ctwheels Dec 08 '17 at 18:26
  • Actually the purpose was to make regex that will match value from both of inputs. Anyway, thanks for your explanations. – PATAPOsha Dec 11 '17 at 14:11
  • @PATAPOsha at the end of my **Code** section I've added a **Matching both strings** section, which contains two regular expressions that match both the first and second strings. I also added an explanation for one such method in the **Explanation** section. These methods existed in my answer before, I simply made it more apparent and added the explanation. P.S. the chosen answer does work, but fails against escaped double quotes `"` and also matches `key` when it could be part of another word. For example, change `key` to `rekey`. You'll see it matches, whereas mine does not. – ctwheels Dec 11 '17 at 14:45
1

You can use the non-greedy quantifier .*? between the key and the value group:

key.*?"(.*?)"

Demo here.

Update

You might wonder why it captures the colon, :. It captures that because this is the next thing between quotes. So you can add optional quotes around key like this:

("?)key\1.*?"(.*?)"

Another demo here.
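
A small Python sketch of the difference between the two patterns, using the inputs from the question (variable names are illustrative). Note that with the second pattern the value moves to group 2, because group 1 now holds the optional quote.

```python
import re

s1 = 'blalasdl8ujd "key":"value", blblabla asdw "alo":"ebobo",blabla"www":"zzzz"'
s2 = 'blalasdl8ujd key [any_chars_here] "value", blabla asdw "alo":"ebobo", bla"www":"zzzz"'

# First pattern: with a quoted key, the key's closing quote opens the capture.
first = re.search(r'key.*?"(.*?)"', s1)
print(first.group(1))  # :

# Second pattern: \1 consumes the key's closing quote when there is one.
second = re.compile(r'("?)key\1.*?"(.*?)"')
print(second.search(s1).group(2))  # value
print(second.search(s2).group(2))  # value
```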

Tamas Rev
0

Check this:

.*(\"key\":\"(\w*)\")

Using the group 2:

https://regex101.com/r/66ikH3/2
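
Tried in Python, this pattern appears to cover only the first input form. The sketch below uses the two strings from the question plus a hypothetical `s3` whose value contains a space, which `\w*` does not match.

```python
import re

pattern = re.compile(r'.*(\"key\":\"(\w*)\")')

s1 = 'blalasdl8ujd "key":"value", blblabla asdw "alo":"ebobo",blabla"www":"zzzz"'
s2 = 'blalasdl8ujd key [any_chars_here] "value", blabla asdw "alo":"ebobo", bla"www":"zzzz"'
s3 = '"key":"two words"'  # hypothetical input with a space in the value

print(pattern.search(s1).group(2))  # value
print(pattern.search(s2))           # None: the key is not written as "key":
print(pattern.search(s3))           # None: \w* stops at the space
```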

Jorge Omar Medra
0

There's probably a somewhat more pythonic way to do this, but:

s1 = 'blalasdl8ujd "key":"value", blblabla asdw "alo":"ebobo",blabla"www":"zzzz"'
s2 = 'blalasdl8ujd key [any_chars_here] "value", blabla asdw "alo":"ebobo", bla"www":"zzzz"'


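def getValue(string, keyName='key'):
    """Find next quoted value after a key that may or may not be quoted"""
    startKey = string.find(keyName)
    if startKey == -1:
        return None  # key not present in the string
    # if key is quoted, adjust value search range to exclude its closing quote
    keyIsQuoted = startKey > 0 and string[startKey - 1] == '"'
    endKey = string.find('"', startKey) if keyIsQuoted else startKey + len(keyName)
    # the value starts right after the next opening quote
    startValue = string.find('"', endKey + 1) + 1
    # search for the closing quote from startValue so an empty value "" also works
    return string[startValue:string.find('"', startValue)]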

getValue(s1) #'value'
getValue(s2) #'value'

I was inspired by the elegance of this answer, but handling the quoted and unquoted cases makes it more than a 1-liner.

You can use a comprehension such as:

next(y[1][1:-1] for y in [x.split(':') for x in s1.split(',')]
     if 'key' in y[0]) # returns 'value' w/o quotes

But that won't handle the case of s2.

C8H10N4O2