I am trying to retrieve data from a JSON element within an HTML page with Python 3.
My regex looks like this:
re.findall(r'(?:(dataLayerLocal\.products = ))\[.*\];', response.body.decode("utf-8"))
For a snippet starting like this:
<script type="text/javascript">
dataLayerLocal.products = dataLayerLocal.products || [];
dataLayerLocal.products = [{'amount':'0','categories':['
Unfortunately, the group I intended to be non-capturing is captured, and the part I want is not included:
>>> re.findall(r'(?:(dataLayerLocal\.products = ))\[.*\];', response.body.decode("utf-8"))
['dataLayerLocal.products = ']
This one matches, but the result also includes the prefix I don't want:
>>> re.findall(r"dataLayerLocal\.products = \[.*\];", response.body.decode("utf-8"))
["dataLayerLocal.products = [{'amount':'0','cate
Full demo:
https://regex101.com/r/Qpds6V/1
How can I get a result starting like this?
[{'amount':'0'
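For reference, a minimal sketch of one way to get that result. It relies on the documented behaviour that re.findall returns the contents of the capturing group when the pattern contains exactly one, so the parentheses are placed around the array instead of the prefix. The html string is a hypothetical stand-in for response.body.decode("utf-8"), and the category values 'a' and 'b' are made up because the real snippet is truncated.

```python
import re

# Hypothetical stand-in for response.body.decode("utf-8").
# The category values are invented; the real snippet is truncated.
html = """<script type="text/javascript">
dataLayerLocal.products = dataLayerLocal.products || [];
dataLayerLocal.products = [{'amount':'0','categories':['a','b']}];
</script>"""

# The prefix is matched but not captured; only the bracketed array is.
# With exactly one capturing group, findall returns that group's text.
matches = re.findall(r"dataLayerLocal\.products = (\[.*\]);", html)

print(matches[0])
```

Because `.` does not match newlines by default, the greedy `.*` stays on one line. If the real array spans several lines, the pattern would need adjusting (for example with re.DOTALL and a non-greedy quantifier), which is not shown here.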