So I have been trying to scrape out a value from a html that is a javascript. There is alot of javascript in the code but I just want to be able to print out this one:
var spConfig=newProduct.Config({
"attributes": {
"531": {
"id": "531",
"options": [
{
"id": "18",
"hunter": "0",
"products": [
"128709"
]
},
{
"label": "40 1\/2",
"hunter": "0",
"products": [
"120151"
]
},
{
"id": "33",
"hunter": "0",
"products": [
"120152"
]
},
{
"id": "36",
"hunter": "0",
"products": [
"128710"
]
},
{
"id": "42",
"hunter": "0",
"products": [
"125490"
]
}
]
}
},
"Id": "120153",
});
So I started by doing a code that looks like:
test = bs4.find_all('script', {'type': 'text/javascript'})
print(test)
The output I am getting is pretty huge so I am not able to post it all here but one of them is the javascript as I mentioned at the top and I want to print out only var spConfig=newProduct.Config
.
How am I able to do that, to be able to just print out var spConfig=newProduct.Config....
which I later can use json.loads that convert it to a json where I later on can scrape it more easier?
For any question or something I haven't explained well. I will apprecaite everything in the comment where I can improve myself aswell here in stackoverflow! :)
EDIT:
More example of what bs4 prints out for javascripts
<script type="text/javascript">varoptionsPrice=newProduct.Options({
"priceFormat": {
"pattern": "%s\u00a0\u20ac",
"precision": 2,
"requiredPrecision": 2,
"decimalSymbol": ",",
"groupSymbol": "\u00a0",
"groupLength": 3,
"integerRequired": 1
},
"showBoths": false,
"idSuffix": "_clone",
"skipCalculate": 1,
"defaultTax": 20,
"currentTax": 20,
"tierPrices": [
],
"tierPricesInclTax": [
],
"swatchPrices": null
});</script>,
<script type="text/javascript">var spConfig=newProduct.Config({
"attributes": {
"531": {
"id": "531",
"options": [
{
"id": "18",
"hunter": "0",
"products": [
"128709"
]
},
{
"label": "40 1\/2",
"hunter": "0",
"products": [
"120151"
]
},
{
"id": "33",
"hunter": "0",
"products": [
"120152"
]
},
{
"id": "36",
"hunter": "0",
"products": [
"128710"
]
},
{
"id": "42",
"hunter": "0",
"products": [
"125490"
]
}
]
}
},
"Id": "120153"
});</script>,
<scripttype="text/javascript">document.observe('dom:loaded',
function(){
varswatchesConfig=newProduct.ConfigurableSwatches(spConfig);
});</script>
EDIT update 2:
try:
product_li_tags = bs4.find_all('script', {'type': 'text/javascript'})
except Exception:
product_li_tags = []
for product_li_tag in product_li_tags:
try:
pat = "product.Config\((.+)\);"
json_str = re.search(pat, product_li_tag, flags=re.DOTALL).group(1)
print(json_str)
except:
pass
#json.loads(json_str)
print("Nothing")
sys.exit()