How to extract an onClick URL using BeautifulSoup

Question

Below is the HTML I need to extract the value from:

<div class="one_block" style="display:block;" onClick="location.href=\'/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020\';" style="cursor:pointer;">
<!-- \xe5\xb0\x8d\xe6\x88\xb0\xe7\x90\x83\xe9\x9a\x8a\xe5\x8f\x8a\xe5\xa0\xb4\xe5\x9c\xb0 start -->
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="schedule_team">
<tr>

How do I get the location.href value?

Tried:

soup.findAll("div", {"onClick": "location.href"})

This returns an empty result.

Desired Output:

/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020

PS: there are plenty of location.href values in the page.

Hi, did you try a regular expression? `import re`, `from bs4 import BeautifulSoup`, then `soup.find_all('a', {'onClick': re.compile(r'location.href')})` – jossefaz, Apr 27 '20 at 18:27
It's just a suggestion... maybe you can try it and use a profiler to track performance issues – jossefaz, Apr 27 '20 at 18:33
You can get some context here: https://stackoverflow.com/questions/38840221/beautifulsoup-how-does-findall-work – Captain Chaos, Apr 27 '20 at 18:40
Sorry, there was a mistake there. Try it with an escape: `soup.find_all('a', {'onClick': re.compile(r'location\.href')})` – jossefaz, Apr 27 '20 at 18:40
What didn't work? Do you still get None, or did you get an error? – jossefaz, Apr 27 '20 at 19:01
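
For what it's worth, here is a minimal sketch of the regex idea from the comments above. It rests on two assumptions worth checking against the real page: the target element is a `div` rather than an `a`, and the attribute has to be looked up as lowercase `onclick`, because Beautiful Soup's HTML parsers convert attribute names to lowercase. The sample markup is a hypothetical cut-down version of the HTML in the question, and `html.parser` is used only to avoid an extra dependency.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question's HTML.
html = """
<div class="one_block" onClick="location.href='/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020';"></div>
<div class="one_block" onClick="alert('not a link');"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string filter has to equal the whole attribute value,
# so filtering on just "location.href" matches nothing.
exact = soup.find_all("div", {"onclick": "location.href"})

# A compiled pattern is applied with search(), so a substring match is enough.
matches = soup.find_all("div", {"onclick": re.compile(r"location\.href")})

print(len(exact))    # 0
print(len(matches))  # 1
print(matches[0]["onclick"])
```

This would also explain why the original `findAll` attempt came back empty: the string filter never equals the full attribute value.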

0m3r · Accepted Answer · 2020-04-29T00:30:14.063

How about using the .select() method, which relies on the SoupSieve package to run a CSS selector?

from bs4 import BeautifulSoup

html = '<div class="one_block" style="display:block;" onClick="location.href=\'/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020\';" style="cursor:pointer;">' \
        '<!-- \xe5\xb0\x8d\xe6\x88\xb0\xe7\x90\x83\xe9\x9a\x8a\xe5\x8f\x8a\xe5\xa0\xb4\xe5\x9c\xb0 start -->' \
        '<table width="100%" border="0" cellspacing="0" cellpadding="0" class="schedule_team"><tr>'

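# Parse with the lxml parser (requires the lxml package to be installed)
soup = BeautifulSoup(html, features="lxml")
# select() returns a list of matches; take the first div with class "one_block"
element = soup.select('div.one_block')[0]
# The parser lowercases attribute names, so look up "onclick", not "onClick"
print(element.get('onclick'))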

Use split to get just the URL:

print(element.get('onclick').split("'")[1])

/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020
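
Since the question mentions that the page has plenty of `location.href` handlers, the same idea could be extended to collect every URL at once. The following is only a sketch under stated assumptions: the helper name `extract_onclick_urls`, the pattern `HREF_RE`, and the second sample block (`game_id=14`) are made up for illustration, and `html.parser` stands in for `lxml` so that no extra package is needed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample with several blocks; only the first one comes from the question.
html = """
<div class="one_block" onClick="location.href='/games/box.html?&game_type=01&game_id=13&game_date=2020-04-19&pbyear=2020';"></div>
<div class="one_block" onClick="location.href='/games/box.html?&game_type=01&game_id=14&game_date=2020-04-19&pbyear=2020';"></div>
<div class="one_block"></div>
"""

# Captures whatever sits between the matching quotes after "location.href=".
HREF_RE = re.compile(r"""location\.href\s*=\s*(['"])(.*?)\1""")


def extract_onclick_urls(markup):
    """Return the location.href targets of all matching divs, in document order."""
    soup = BeautifulSoup(markup, "html.parser")
    urls = []
    # The attribute selector keeps only divs whose onclick mentions location.href.
    for div in soup.select('div.one_block[onclick*="location.href"]'):
        match = HREF_RE.search(div["onclick"])
        if match:
            urls.append(match.group(2))
    return urls


urls = extract_onclick_urls(html)
for url in urls:
    print(url)
```

Compared with `split("'")[1]`, the regular expression is a little more forgiving if a handler uses double quotes or has extra whitespace around the `=`, though either approach works for the markup shown in the question.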

edited Apr 29 '20 at 00:30

answered Apr 27 '20 at 19:55

0m3r


The output is ` location.href='/games/box.html?&game_type=01&game_id=25&game_date=2020-04-29&pbyear=2020';`, which needs further string manipulation to get the URL. Can it be done via code? – johnrao07 Apr 28 '20 at 11:47
What output do you need? – 0m3r Apr 28 '20 at 17:05
Never mind, I see it in your original post. I will try to run a test later – 0m3r Apr 28 '20 at 17:07
Maybe `print(element.get('onclick').split("'")[1])` – YasserKhalil Apr 29 '20 at 00:20
Thanks a lot. I am trying to learn from your posts. They are a great resource. – YasserKhalil Apr 29 '20 at 00:25
