
I'm trying to extract the text from the first opening curly brace to the last closing curly brace in this code:

<script data-json='{"gr":{"template":77234,"body":"compact"},"model":"sedan"}' type="text/plain"></script>

I've tried using \[.*?\]|\{.*?\} but that only matched the following:
{"gr":{"template":77234,"body":"compact"}

I need to get the following: {"gr":{"template":77234,"body":"compact"},"model":"sedan"}

Any suggestions?
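For reference, here is a minimal reproduction of the behaviour described above. It assumes the attempted pattern was the lazy form \[.*?\]|\{.*?\}. The lazy quantifier .*? stops at the first } it can reach, which is the one that closes the inner "gr" object, not the outer one.

```python
import re

# The HTML from the question, as a single-line string.
html = '<script data-json=\'{"gr":{"template":77234,"body":"compact"},"model":"sedan"}\' type="text/plain"></script>'

# Assumed original pattern: lazy .*? ends the match at the FIRST "}" after the first "{".
lazy = re.search(r'\[.*?\]|\{.*?\}', html)
print(lazy.group(0))  # {"gr":{"template":77234,"body":"compact"}
```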

badri

2 Answers


Consider using BeautifulSoup instead: select the <script> tag, then extract the data-json attribute:

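from bs4 import BeautifulSoup

text = '''
<script data-json='{"gr":{"template":77234,"body":"compact"},"model":"sedan"}' type="text/plain"></script>
'''
soup = BeautifulSoup(text, 'html.parser')
# Find every <script> element that has a data-json attribute
script = soup.select('script[data-json]')
# Indexing a tag by attribute name returns that attribute's value: the JSON string
jsonData = script[0]['data-json']
print(jsonData)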

Output:

{"gr":{"template":77234,"body":"compact"},"model":"sedan"}
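If installing BeautifulSoup is not an option, the standard library's html.parser can do the same attribute lookup. This is a swapped-in technique, not part of the answer above, and the DataJsonExtractor class name is hypothetical. Since the attribute holds JSON, the sketch also decodes it with json.loads.

```python
import json
from html.parser import HTMLParser


class DataJsonExtractor(HTMLParser):
    """Hypothetical helper: collects the data-json attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the surrounding quotes removed
        if tag == "script":
            for name, value in attrs:
                if name == "data-json":
                    self.values.append(value)


html = '<script data-json=\'{"gr":{"template":77234,"body":"compact"},"model":"sedan"}\' type="text/plain"></script>'

parser = DataJsonExtractor()
parser.feed(html)
parser.close()
json_text = parser.values[0]
print(json_text)

# The attribute value is JSON, so it can be decoded into a dict
data = json.loads(json_text)
print(data["gr"]["template"])  # 77234
```

Parsing the markup avoids the question of which braces a regex should pair up, because the attribute boundaries come from the HTML quoting.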
CertainPerformance

Try a simple greedy pattern:

'({.+})'

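and then take the first capturing group with .group(1) on the match object. Because .+ is greedy, the match runs from the first { to the last } on the line.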
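A minimal Python sketch of this approach with the standard re module. The variable names are illustrative, and the json.loads call is an extra sanity check that is not part of the answer. Note that . does not match newlines by default, so JSON spread over several lines would need the re.DOTALL flag.

```python
import json
import re

html = '<script data-json=\'{"gr":{"template":77234,"body":"compact"},"model":"sedan"}\' type="text/plain"></script>'

# Greedy .+ runs to the end of the line, then backtracks to the LAST "}".
# An unescaped "{" that is not a valid repeat count is treated as a literal by Python's re.
match = re.search(r'({.+})', html)
if match:
    json_text = match.group(1)
    print(json_text)
    # Extra check: the captured text should parse as JSON
    data = json.loads(json_text)
    print(data["model"])  # sedan
```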

MrJuicyBacon