I am trying to retrieve data from a JSON element within an HTML page with Python 3.

My regex looks like this:

re.findall(r'(?:(dataLayerLocal\.products = ))\[.*\];', response.body.decode("utf-8"))

For a snippet that starts like this:

<script type="text/javascript">
    dataLayerLocal.products = dataLayerLocal.products || [];
    dataLayerLocal.products = [{'amount':'0','categories':['

Unfortunately, the non-capturing group appears to be capturing, and the part I want is not included:

>>> re.findall(r'(?:(dataLayerLocal\.products = ))\[.*\];', response.body.decode("utf-8"))
['dataLayerLocal.products = ']
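
To make the behaviour reproducible without the original page, here is a minimal sketch. The `html` string is a made-up stand-in for `response.body.decode("utf-8")`, shortened to a single product key. It suggests that the output comes from the inner `(...)`, which is still a capturing group even though it sits inside `(?:...)`.

```python
import re

# Hypothetical stand-in for response.body.decode("utf-8"); the real page is longer.
html = """<script type="text/javascript">
    dataLayerLocal.products = dataLayerLocal.products || [];
    dataLayerLocal.products = [{'amount':'0'}];
</script>"""

# The outer (?:...) is non-capturing, but the inner (...) still captures.
pattern = r'(?:(dataLayerLocal\.products = ))\[.*\];'

# With exactly one capturing group in the pattern, re.findall returns
# that group's text for each match instead of the whole match.
with_inner_group = re.findall(pattern, html)
print(with_inner_group)
```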

This one matches, but it also returns the unwanted prefix:

>>> re.findall(r"dataLayerLocal\.products = \[.*\];", response.body.decode("utf-8"))
["dataLayerLocal.products = [{'amount':'0','cate

Full demo:

https://regex101.com/r/Qpds6V/1

How can I get a result starting like this?

[{'amount':'0'
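
One direction that might work, sketched under assumptions: put the only capturing group around the array itself, so that `re.findall` returns just that part. The `html` string below is a made-up, shortened sample and not the real page, and the sketch assumes the array and its closing `];` sit on a single line, as in the snippet above.

```python
import re

# Hypothetical, shortened sample; the real response body is not reproduced here.
html = """<script type="text/javascript">
    dataLayerLocal.products = dataLayerLocal.products || [];
    dataLayerLocal.products = [{'amount':'0'}];
</script>"""

# The prefix is matched but not captured; the single capturing group
# wraps the bracketed array, so findall returns only that text.
array_pattern = r"dataLayerLocal\.products = (\[.*\]);"

arrays = re.findall(array_pattern, html)
print(arrays)
```

The first assignment line is skipped because its right-hand side does not start with `[` directly after `= `. Since `.` does not match newlines by default, an array spread over several lines would need `re.DOTALL` and probably a non-greedy quantifier.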
merlin
