I'm trying to parse a website with bs4. The page contains this script tag:

<script>

var _load_pages = [{"n":1,"w":"760","h":"1990","u":"url1"},{"n":2,"w":"760","h":"1990","u":"url2"},{"n":3,"w":"760","h":"1990","u":"url3"},{"n":4,"w":"760","h":"1990","u":"url4"},{"n":5,"w":"760","h":"1990","u":"url5"}];

</script>

I need to get the URLs (the "u" values) out of this script, but I don't know what to use to extract them.

Mark Rotteveel

1 Answer

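Try this. It looks through the script tags for the _load_pages assignment, pulls the array out with a regular expression, and parses it as JSON: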

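import json
import re

from bs4 import BeautifulSoup

# html holds the page source as a string
soup = BeautifulSoup(html, 'lxml')

urls = []
for s in soup.find_all('script'):
    # s.string is None for empty or external scripts, so fall back to ''
    match = re.search(r'var _load_pages = (.*?);', s.string or '')
    if match:
        # The captured text is a JSON array of page objects
        urls = [page['u'] for page in json.loads(match.group(1))]
        break
print(urls)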
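
For reference, here is a minimal standalone sketch of the same extraction that runs without any third-party packages. It works on plain text, so it can be given either the whole page source or the text of a single script tag (for example `s.string` from BeautifulSoup). The names `extract_page_urls`, `SAMPLE_HTML` and `ASSIGNMENT` are made up for this illustration, and the sample input is the snippet from the question. Instead of cutting the match at the first `;`, it lets `json.JSONDecoder.raw_decode` read exactly one JSON value, which should keep working if a URL itself contains a semicolon.

```python
import json
import re

# Sample input copied from the question; "url1".."url5" are its placeholders.
SAMPLE_HTML = """
<script>

var _load_pages = [{"n":1,"w":"760","h":"1990","u":"url1"},{"n":2,"w":"760","h":"1990","u":"url2"},{"n":3,"w":"760","h":"1990","u":"url3"},{"n":4,"w":"760","h":"1990","u":"url4"},{"n":5,"w":"760","h":"1990","u":"url5"}];

</script>
"""

# Matches the assignment up to (but not including) the start of the array.
ASSIGNMENT = re.compile(r"var\s+_load_pages\s*=\s*")


def extract_page_urls(text):
    """Return the "u" values from the _load_pages array, or [] if it is absent."""
    match = ASSIGNMENT.search(text)
    if match is None:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the trailing ";").
    pages, _end = json.JSONDecoder().raw_decode(text, match.end())
    return [page["u"] for page in pages]


print(extract_page_urls(SAMPLE_HTML))
```

This assumes the array is valid JSON, as it is in the question. If the page used JavaScript-only syntax (single quotes, unquoted keys), `raw_decode` would raise `json.JSONDecodeError` and a different parser would be needed.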
Nanthakumar J J