I want to use a regex to extract the string inside a curly bracket that is nested in another curly bracket. There are millions of rows, and the position of the inner curly bracket within the 'larger' one is always different.

Here's the string:

{'eventData': {'type': 'page', 'name': 'chicken 2'}, 
'eventId': '1993', 
'deviceType': 'keroppi', 
'pageUrl': '/chicken 2', 
'version': '1.0.0.1999-10_7_2020__4_18_30', 
'sessionGUID': 'f4123f21-31ad-4e83-ba77-41231238', 
'locationid': '0601eba9-1259-4ae6-bad5-7d1231239', 
'eventDescription': 'Page Load'}

It could be like this:

{'eventId': '1993', 
'deviceType': 'keroppi', 
'pageUrl': '/chicken 2', 
'eventData': {'type': 'page', 'name': 'chicken 2'},
'version': '1.0.0.1999-10_7_2020__4_18_30', 
'sessionGUID': 'f4123f21-31ad-4e83-ba77-41231238', 
'locationid': '0601eba9-1259-4ae6-bad5-7d1231239', 
'eventDescription': 'Page Load'}

What I want is just the interior curly bracket: {'type': 'page', 'name': 'chicken 2'}

What is the regular expression to get the string after "'eventData':" and before the ","?

Wiktor Stribiżew
  • You need a JSON parser here, not regex. – Tim Biegeleisen Nov 18 '20 at 06:32
  • @Tim Except it isn't valid JSON. More likely just a `str` of a `dict`. – Mark Tolonen Nov 18 '20 at 06:43
  • @TimBiegeleisen The data is not valid JSON (single quotation marks). Looks like a Python `__repr__()`. Maybe [`ast.literal_eval()`](https://docs.python.org/3/library/ast.html#ast.literal_eval) would do the job. – Klaus D. Nov 18 '20 at 06:44
  • The best way to handle this would be to fix the format at the source. If the source of the string is your own program: don't convert it to a string, if the source is external it should be properly encoded as JSON. – Klaus D. Nov 18 '20 at 06:49

1 Answer

Use .get() on the dict:

In [2494]: s = {'eventData': {'type': 'page', 'name': 'chicken 2'},
      ...: 'eventId': '1993',
      ...: 'deviceType': 'keroppi',
      ...: 'pageUrl': '/chicken 2',
      ...: 'version': '1.0.0.1999-10_7_2020__4_18_30',
      ...: 'sessionGUID': 'f4123f21-31ad-4e83-ba77-41231238',
      ...: 'locationid': '0601eba9-1259-4ae6-bad5-7d1231239',
      ...: 'eventDescription': 'Page Load'}

In [2495]: s.get('eventData')
Out[2495]: {'type': 'page', 'name': 'chicken 2'}

In [2496]: s1 = {'eventId': '1993',
      ...: 'deviceType': 'keroppi',
      ...: 'pageUrl': '/chicken 2',
      ...: 'eventData': {'type': 'page', 'name': 'chicken 2'},
      ...: 'version': '1.0.0.1999-10_7_2020__4_18_30',
      ...: 'sessionGUID': 'f4123f21-31ad-4e83-ba77-41231238',
      ...: 'locationid': '0601eba9-1259-4ae6-bad5-7d1231239',
      ...: 'eventDescription': 'Page Load'}

In [2497]: s1.get('eventData')
Out[2497]: {'type': 'page', 'name': 'chicken 2'}

If it's a string representation of a dict (not valid JSON, since it uses single quotes), then use:

import ast
# here s must be the string form of the dict, not the dict itself
ast.literal_eval(s).get('eventData')
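To apply this to many rows, something like the sketch below might work. It assumes each row is the `str()` form of a dict, as in the question; the names `rows`, `extract_event_data`, `extract_event_data_text` and `EVENT_DATA_RE` are made up for illustration. The regex variant is only a fallback and is only safe if the inner dict never contains a `{` or `}` itself (not even inside a string value).

```python
import ast
import re

# Hypothetical sample rows, shortened from the question: the inner dict
# appears at a different position in each one.
rows = [
    "{'eventData': {'type': 'page', 'name': 'chicken 2'}, 'eventId': '1993', 'eventDescription': 'Page Load'}",
    "{'eventId': '1993', 'eventData': {'type': 'page', 'name': 'chicken 2'}, 'eventDescription': 'Page Load'}",
]


def extract_event_data(row):
    """Parse the row as a Python literal and return its 'eventData' value, or None."""
    try:
        parsed = ast.literal_eval(row)
    except (ValueError, SyntaxError):
        return None  # the row is not a valid Python literal
    if not isinstance(parsed, dict):
        return None
    return parsed.get('eventData')


# Regex fallback (assumption: the inner dict contains no braces at all).
EVENT_DATA_RE = re.compile(r"'eventData':\s*(\{[^{}]*\})")


def extract_event_data_text(row):
    """Return the inner dict as raw text, or None if the pattern does not match."""
    match = EVENT_DATA_RE.search(row)
    return match.group(1) if match else None


results = [extract_event_data(r) for r in rows]    # list of dicts
texts = [extract_event_data_text(r) for r in rows]  # list of strings
```

The `ast.literal_eval` version gives you a real dict and does not care where `eventData` sits in the row; the regex version returns the raw text only.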
Mayank Porwal