Use Case

I want to use a regex to grab a very small part of some JSON data whose location is unknown. Python has a json library, but parsing all of the JSON data is slow. The JSON data has a regular format.

Goal

For each occurrence of 1001, I want to grab the content of the innermost braces that enclose that occurrence.

Code

import re
x = r'{123:{"a":100, "asdf":"example.com","at":1001},'\
    '47289:{"a":20, "asdf":"test.org","at":20},}'
regex = r'{(.*?)1001(.*?)}'
print(re.match(regex, x).group(1))

Desired Result

{"a":100, "asdf":"example.com","at":1001}

Actual Result

123:{"a":100, "asdf":"example.com","at":

Questions

How can I do this, and how can I do it fast?

duh1234
  • @Pedro Lobito: I was thinking of compiling the regex once and then running it on many JSON strings. This answer https://stackoverflow.com/questions/21808913/using-json-or-regex-when-processing-tweets gave me the idea that regex is faster in some cases. – duh1234 Dec 21 '18 at 05:32
  • I have my doubts about https://stackoverflow.com/questions/21808913/using-json-or-regex-when-processing-tweets even though it's the accepted answer. You may want to profile both options to see which one is faster. – Pedro Lobito Dec 21 '18 at 05:35

1 Answer

Don't use .*? for the first group: it matches any character, including {. Use [^{]*? instead.

You also need to use re.search(), not re.match(), since match() only matches at the beginning of the string. See "What is the difference between re.search and re.match?"
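As a minimal sketch of that difference, the snippet below runs the same compiled pattern through both calls. The input string and the variable names are made up for illustration, and the pattern uses [^{}]* on both sides of 1001, which is a slightly stricter variant than the one in this answer.

```python
import re

# Hypothetical input: the brace block does not start at index 0.
text = 'id:{"at":1001}'
pattern = re.compile(r'\{[^{}]*1001[^{}]*\}')

# match() only attempts the pattern at index 0, where the text starts with "id:".
anchored = pattern.match(text)

# search() tries successive start positions and finds the block further in.
scanned = pattern.search(text)

print(anchored)          # None
print(scanned.group(0))  # {"at":1001}
```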

And to get the entire match, use .group(0). .group(1) returns only the part matched by the first capture group, [^{]*?.

import re
x = r'{123:{"a":100, "asdf":"example.com","at":1001},'\
    '47289:{"a":20, "asdf":"test.org","at":20},}'
regex = r'{([^{]*?)1001(.*?)}'
print(re.search(regex, x).group(0))
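re.search() returns only the first hit, while the question asks for every occurrence of 1001. One possible extension is sketched below with re.finditer() and a pattern compiled once for reuse. The helper name, the constant name, and the third record in the sample (key 555) are invented for illustration. The sketch assumes braces never appear inside string values, and it will also match 1001 embedded in a longer number such as 10010.

```python
import re

# Compiled once so it can be reused across many input strings.
# [^{}]* cannot cross a brace, so each match stays inside one innermost pair.
INNER_WITH_1001 = re.compile(r'\{[^{}]*1001[^{}]*\}')


def blocks_containing_1001(text):
    """Return every innermost {...} block whose body contains 1001."""
    return [m.group(0) for m in INNER_WITH_1001.finditer(text)]


# The first two records come from the question; the third is hypothetical.
sample = ('{123:{"a":100, "asdf":"example.com","at":1001},'
          '47289:{"a":20, "asdf":"test.org","at":20},'
          '555:{"a":1001, "asdf":"example.net","at":3},}')

for block in blocks_containing_1001(sample):
    print(block)
```

Whether this is actually faster than a real parser depends on the data, so it is worth profiling both on representative input.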

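Making the quantifier non-greedy doesn't solve the problem, because matching goes left-to-right. A non-greedy quantifier only makes the match end as early as possible; it does not move the starting point. So { will match the first {, then .*? will match everything until 1001, which includes the inner {.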

It works as expected for the second group because the non-greedy quantifier stops before the first }, since it's working left-to-right.
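A small side-by-side run can make this concrete. The sketch below uses a shortened version of the question's string (the "asdf" fields are dropped to keep the output short) and compares the original lazy pattern with the negated-class one; the variable names are illustrative only.

```python
import re

# Shortened version of the question's data.
x = '{123:{"a":100,"at":1001},47289:{"a":20,"at":20},}'

lazy = re.search(r'\{(.*?)1001(.*?)\}', x)
bounded = re.search(r'\{([^{]*?)1001(.*?)\}', x)

# The lazy pattern is pinned to the outermost "{" at index 0,
# so group 1 has to cross the inner "{" to reach 1001.
print(lazy.group(1))     # 123:{"a":100,"at":

# [^{]*? cannot cross "{", so the attempt at index 0 fails
# and the match starts at the inner brace instead.
print(bounded.start())   # 5
print(bounded.group(0))  # {"a":100,"at":1001}
```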

Barmar