
I am trying to learn how to parse HTML data, so I've chosen this website (http://ets.aeso.ca/ets_web/ip/Market/Reports/CSMPriceReportServlet), which has real-time data for electricity prices.

    from bs4 import BeautifulSoup
    from urllib.request import urlopen
    import pandas as pd

    url = "http://ets.aeso.ca/ets_web/ip/Market/Reports/CSMPriceReportServlet"
    page = urlopen(url)
    html = page.read().decode("utf-8")
    soup = BeautifulSoup(html, "html.parser")


    print(soup.get_text())

My question is how do I further parse / filter a BS4 object once it's created?

I just want the data that updates so I can throw it into a pandas dataframe.

Like so:

    Date (HE)     | Time  | Price ($) | Volume (MW)
    01/15/2021 14 | 13:10 | 40.16     | 80
    01/15/2021 14 | 13:05 | 40.18     | 100
    01/15/2021 14 | 13:00 | 40.16     | 80

    Depending on the format of the page, you can try pandas.read_html(), or if that doesn't work well on that site, you can use BeautifulSoup's find/find_all. – goalie1998 Jan 15 '21 at 20:52

1 Answer


As goalie1998 mentioned, the easiest way is to use pandas: just two lines of code.

Example

Notice that pandas' read_html() stores all tables on the page in a list of data frames. Because the table you want is the third one in the source, you can select it directly by indexing the list with [2].

    import pandas as pd
    pd.read_html('http://ets.aeso.ca/ets_web/ip/Market/Reports/CSMPriceReportServlet')[2]

Output

           Date (HE)   Time  Price ($)  Volume (MW)
    0  01/15/2021 14  13:10      40.16           80
    1  01/15/2021 14  13:05      40.18          100
    2  01/15/2021 14  13:00      40.16           80
    3  01/15/2021 13  12:40      40.16           80
    4  01/15/2021 13  12:00      40.01          100
    5  01/15/2021 12  11:54      40.01          100
    6  01/15/2021 12  11:00      40.18          100
    7  01/15/2021 11  10:54      40.18          100
    8  01/15/2021 11  10:24      40.16           80
    9  01/15/2021 11  10:00      40.18          100
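As a quick, self-contained illustration of that list behavior (toy inline HTML here, not the AESO page): read_html returns one data frame per `<table>` it finds, and you pick the one you want by index.

    from io import StringIO
    import pandas as pd

    # Toy HTML with two tables (hypothetical data, not the AESO page)
    html = """
    <table><tr><th>A</th></tr><tr><td>1</td></tr></table>
    <table>
      <tr><th>Price ($)</th><th>Volume (MW)</th></tr>
      <tr><td>40.16</td><td>80</td></tr>
    </table>
    """

    tables = pd.read_html(StringIO(html))  # one DataFrame per <table>
    print(len(tables))                     # 2 tables found
    print(tables[1].columns.tolist())      # ['Price ($)', 'Volume (MW)']

On the real page you would inspect `len(tables)` and each frame's columns the same way to find the right index.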

Just in case you really have to go with BeautifulSoup:

  1. Select the third table, i.e. the table without a class, on the site
  2. Select all the tr elements in the table, loop over them, and append the texts from the th/td elements to the data list
  3. Import the data list into pandas

Example

    from bs4 import BeautifulSoup
    from urllib.request import urlopen
    import pandas as pd

    url = "http://ets.aeso.ca/ets_web/ip/Market/Reports/CSMPriceReportServlet"
    page = urlopen(url)
    html = page.read().decode("utf-8")
    soup = BeautifulSoup(html, "html.parser")

    data = []
    # the data table is the only <table> on the page without a class attribute
    rows = soup.select('table:not([class]) tr')
    for i, row in enumerate(rows):
        if i == 0:
            cols = row.find_all('th')  # first row holds the headers
        else:
            cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])  # drop empty cells

    pd.DataFrame(data[1:], columns=data[0])
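One caveat with the BeautifulSoup route: every cell comes back from .text.strip() as a string, whereas read_html infers numeric dtypes for you. A minimal sketch (hypothetical rows matching the table's layout) of converting the numeric columns afterwards with pd.to_numeric:

    import pandas as pd

    # rows as scraped via .text.strip(): everything is still a str
    data = [["01/15/2021 14", "13:10", "40.16", "80"],
            ["01/15/2021 14", "13:05", "40.18", "100"]]
    df = pd.DataFrame(data, columns=["Date (HE)", "Time", "Price ($)", "Volume (MW)"])

    # convert the numeric columns so aggregation/arithmetic works
    for col in ["Price ($)", "Volume (MW)"]:
        df[col] = pd.to_numeric(df[col])

    print(df.dtypes["Price ($)"])   # float64
    print(df["Volume (MW)"].sum())  # 180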
HedgeHog