
I have this text, which I extracted from a <script> tag:

function fbq_w123456as() {
    fbq('track', 'AddToCart', {
        contents: [
            {
                'id': '123456',
                'quantity': '',
                'item_price':69.99
            }
        ],
        content_name: 'Stackoverflow',
        content_category: '',
        content_ids: ['w123456as'],
        content_type: 'product',
        value: 420.69,
        currency: 'USD'
    });
}

I'm trying to extract this information with a regex and then convert it to JSON in Python. I've tried re.search with the pattern r"'AddToCart', (.*?);" and a few other attempts, but with no luck. I am very new to regex and am struggling with it. This is the output I want:

{
    "contents":[
        {
            "id":"123456",
            "quantity":"",
            "item_price":69.99
        }
    ],
    "content_name":"Stackoverflow",
    "content_category":"",
    "content_ids":[
        "w123456as"
    ],
    "content_type":"product",
    "value":420.69,
    "currency":"USD"
}

How would I create the regex to extract the JSON data?

InSync
Charlie
  • Does this answer your question? [How to convert raw javascript object to a dictionary?](https://stackoverflow.com/questions/24027589/how-to-convert-raw-javascript-object-to-a-dictionary) – InSync Jul 17 '23 at 00:00

1 Answer

You can try:

import re
from ast import literal_eval

js_txt = """\
function fbq_w123456as() {
    fbq('track', 'AddToCart', {
        contents: [
            {
                'id': '123456',
                'quantity': '',
                'item_price':69.99
            }
        ],
        content_name: 'Stackoverflow',
        content_category: '',
        content_ids: ['w123456as'],
        content_type: 'product',
        value: 420.69,
        currency: 'USD'
    });
}"""

# Capture the object literal passed as the third argument of fbq(...)
out = re.search(r"'AddToCart', (\{.*?\})\);", js_txt, flags=re.S).group(1)
# Wrap the unquoted keys (contents, value, ...) in double quotes
out = re.sub(r"""([^"'\s]+):""", r'"\1":', out)
# The text is now a valid Python literal, so parse it into a dict
out = literal_eval(out)
print(out)

This prints a Python dict (shown here reformatted for readability):

{
    "contents": [{"id": "123456", "quantity": "", "item_price": 69.99}],
    "content_name": "Stackoverflow",
    "content_category": "",
    "content_ids": ["w123456as"],
    "content_type": "product",
    "value": 420.69,
    "currency": "USD",
}
Andrej Kesely
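
Since the question asks for JSON rather than a Python dict, here is a minimal sketch of one way to finish the job. It is a variation on the approach above, not part of the original answer. The helper name extract_event is hypothetical, and the key-quoting pattern is an assumption: it only quotes bare identifiers that follow "{" or ",", so that a colon inside a string value such as a URL is less likely to be rewritten. It can still misfire on values that themselves contain text like ", name:", so treat it as a starting point.

```python
import json
import re
from ast import literal_eval

# Shortened sample of the script text from the question
js_txt = """
function fbq_w123456as() {
    fbq('track', 'AddToCart', {
        contents: [{'id': '123456', 'quantity': '', 'item_price': 69.99}],
        content_name: 'Stackoverflow',
        content_ids: ['w123456as'],
        value: 420.69,
        currency: 'USD'
    });
}
"""


def extract_event(js, event):
    """Hypothetical helper: return the payload of fbq('track', event, {...}) as a dict."""
    pattern = r"fbq\('track',\s*'" + re.escape(event) + r"',\s*(\{.*?\})\s*\);"
    match = re.search(pattern, js, flags=re.S)
    if match is None:
        return None  # no such event in this script
    obj = match.group(1)
    # Quote bare identifier keys, but only directly after '{' or ','
    obj = re.sub(r"([{,]\s*)([A-Za-z_]\w*)\s*:", r'\1"\2":', obj)
    return literal_eval(obj)


data = extract_event(js_txt, "AddToCart")
json_text = json.dumps(data, indent=4)  # serialize the dict as JSON text
print(json_text)
```

json.dumps emits double-quoted keys and strings, which is what makes the result valid JSON; printing the dict directly would show Python's single-quoted repr instead.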