Data Scraping in Python

Question

I am trying to scrape data from https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1

I have used this code to do so:

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request
headers = {'User-Agent': 
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

page = 'https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1'
pageTree = requests.get(page, headers=headers)
pageTree_text = pageTree.text

# Parse the HTML returned by the server
pageSoup = BeautifulSoup(pageTree_text, 'html.parser')

Next, I want to find all the links attached to each team name, so I use this code:

linkLocation = pageSoup.find_all("a", {"class": "vereinprofil_tooltip tooltipstered"})
linkLocation[0].text

output:

IndexError                                Traceback (most recent call last)
      1 linkLocation = pageSoup.find_all("a", {"class": "vereinprofil_tooltip tooltipstered"})
----> 2 linkLocation[0].text

IndexError: list index out of range

Why doesn't the list contain any of the links?

Thanks in advance!

score 0 · Accepted Answer · answered Mar 01 '20 at 18:10

The "tooltipstered" class is added by JavaScript, so it is not present in the plain HTML document returned by the server. You can check this by opening the page source rather than the browser inspector, which shows the DOM after scripts have run.

As the script tag below shows, "tooltipster" is a jQuery plugin. To scrape the page as the browser renders it, you will need a tool that executes JavaScript (e.g. Selenium).

<script type="text/javascript" src="https://tmssl.akamaized.net//assets/e17e6900/js/jquery.tooltipster.js?lm=1574952016"></script>
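
To illustrate why the original lookup comes back empty, here is a minimal sketch run against a made-up HTML snippet rather than the live page. The snippet, the team names and the hrefs are hypothetical, and it assumes the "vereinprofil_tooltip" class is present in the raw HTML, which you should confirm in the page source.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw server HTML: the JavaScript-added
# "tooltipstered" class is absent, as described above.
raw_html = """
<table>
  <tr><td><a class="vereinprofil_tooltip" href="/team-a/startseite/verein/1">Team A</a></td></tr>
  <tr><td><a class="vereinprofil_tooltip" href="/team-b/startseite/verein/2">Team B</a></td></tr>
</table>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# A class string containing a space is compared against the exact attribute
# value, so this finds nothing in the raw HTML.
both_classes = soup.find_all("a", {"class": "vereinprofil_tooltip tooltipstered"})

# Matching on the single server-side class finds both anchors.
one_class = soup.find_all("a", class_="vereinprofil_tooltip")

team_links = [(a.get_text(strip=True), a["href"]) for a in one_class]
print(len(both_classes), team_links)
```

If the single class does turn out to be in the raw HTML, the same class_ lookup on pageSoup may be enough without a browser; otherwise the page has to be rendered first.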

answered Mar 01 '20 at 18:10

vonschlager


Hi, so I cannot scrape the data from this page using Python and BeautifulSoup? – Haroon Mar 01 '20 at 18:32
You can with Python, but not with BeautifulSoup alone. Try the approach in this SO answer: https://stackoverflow.com/questions/49939123/scrape-dynamic-contents-created-by-javascript-using-python – vonschlager Mar 01 '20 at 18:39
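
A rough sketch of the Selenium route mentioned above: it assumes Chrome and a matching driver are installed, and that the rendered anchors carry both classes as seen in the browser inspector. The function names fetch_rendered_html and team_links, the wait selector and the timeout are illustrative choices, not something taken from the linked answer.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css="a.tooltipstered", timeout=10):
    """Load url in headless Chrome and return the HTML after scripts have run."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one anchor with the JavaScript-added class exists.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


def team_links(html):
    """Return (text, href) pairs for anchors carrying both classes, in any order."""
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.select("a.vereinprofil_tooltip.tooltipstered")
    return [(a.get_text(strip=True), a.get("href")) for a in anchors]
```

With a working browser setup, team_links(fetch_rendered_html(page)) should return the links the original find_all call was looking for. The CSS selector matches the two classes regardless of their order in the attribute, which the exact-string lookup does not.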
