How can I extract data from a web page and turn it into a proper pandas DataFrame?

Question

For example, here is an address: https://pesdb.net/pes2021/?id=44379
There seems to be no API call (I am pretty new to this, but I checked XHR in the network monitor and there are no relevant JSON calls).

What have you tried? Show your code and ask specific question about a problem you are stuck with. Also what is expected output? — buran, Sep 18 '21 at 10:08
I assume he wants to know about web scraping, but doesn't know what it is, or where to start. — nihilok, Sep 18 '21 at 10:09

nihilok · Accepted Answer · 2021-09-18T10:28:38.697

There's an example here of how to parse an HTML table using just the pandas and requests libraries.

According to the latest pandas docs, pd.read_html accepts a URL directly, so you can skip the requests call in that answer, but you will need to install the parser dependencies:

pip install lxml html5lib beautifulsoup4

Then you can do something like this:

import pandas as pd

# read_html parses every <table> on the page into a list of DataFrames
df_list = pd.read_html('https://pesdb.net/pes2021/?id=44379')
df = df_list[0]   # the first table on the page
print(df)         # this is your DataFrame!
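If the site rejects the default request, or you want to test the parsing without hitting the network, you can fetch the HTML yourself and hand it to read_html as a file-like object. The following is only a sketch: the helper names (fetch_html, read_tables), the User-Agent string, and the sample table with its player names are made up for illustration and are not taken from pesdb.net.

```python
from io import StringIO

import pandas as pd
import requests


def fetch_html(url, timeout=10.0):
    """Download a page as text. The User-Agent value is an arbitrary assumption."""
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def read_tables(html):
    """Parse every <table> in an HTML string into a list of DataFrames."""
    # Wrapping the string in StringIO avoids the deprecation warning that
    # newer pandas versions emit for literal HTML strings.
    return pd.read_html(StringIO(html))


# Hypothetical sample data so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Name</th><th>Rating</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>90</td></tr>
    <tr><td>Player B</td><td>85</td></tr>
  </tbody>
</table>
"""

df = read_tables(SAMPLE_HTML)[0]
print(df)

# Against the real page you would do something like:
# df = read_tables(fetch_html("https://pesdb.net/pes2021/?id=44379"))[0]
```

Splitting the download from the parsing also makes it easier to see which step fails when something goes wrong.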

More generally, Beautiful Soup 4 is one of the most popular Python libraries for web scraping.

You can read some examples here.
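As a rough idea of what the Beautiful Soup route looks like, the sketch below pulls the cells out of the first table in an HTML string and builds a DataFrame from them. The function names and the sample table are hypothetical, and it assumes the first row of the table holds the column headers, which may not be true for every page.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_rows(html):
    """Return the first table in the HTML as a list of rows of cell text."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def rows_to_frame(rows):
    """Build a DataFrame, assuming the first row contains the headers."""
    if not rows:
        return pd.DataFrame()
    return pd.DataFrame(rows[1:], columns=rows[0])


# Hypothetical sample data so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Rating</th></tr>
  <tr><td>Player A</td><td>90</td></tr>
  <tr><td>Player B</td><td>85</td></tr>
</table>
"""

rows = table_rows(SAMPLE_HTML)
df = rows_to_frame(rows)
print(df)
```

Unlike pd.read_html, this keeps every value as a string, so numeric columns would still need converting, for example with pd.to_numeric.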

Alternatively, you could perform a GET request to the site and parse the response manually, but that is the most difficult option and rarely worth the effort.
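To show why manual parsing is more work, here is a minimal sketch that uses only the standard library's html.parser to collect table rows from an HTML string. The class name and sample table are made up for illustration, and the parser ignores nested tables, colspan/rowspan, and other things the libraries above handle for you.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample data so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Rating</th></tr>
  <tr><td>Player A</td><td>90</td></tr>
  <tr><td>Player B</td><td>85</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
print(rows)
```

From there, pd.DataFrame(rows[1:], columns=rows[0]) would give a DataFrame, again assuming the first row is the header.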
