This is a follow-up question from here.

I have the code below, which I use to scrape tables from a website:

import bs4
import pandas

# `driver` is assumed to be an existing browser driver with the page already loaded
soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
for thead in soup.select(".data-point-container table thead"):
    tbody = thead.find_next_sibling("tbody")

    # Rebuild a minimal table from the header and body so pandas can parse it
    table = "<table>%s</table>" % (str(thead) + str(tbody))

    df = pandas.read_html(str(table), header=0, index_col=0)[0]
    df = df.drop(['Unnamed: 6'], axis=1)

    # Renaming Columns to just have FY-YEAR
    for each_column in df.columns:
        if each_column[:3] == "LTM":
            df.rename(columns={each_column: "Last 12 Months"}, inplace=True)
        else:
            df.rename(columns={each_column: each_column[:6]}, inplace=True)

    # Transpose so the periods become the row index
    df = df.T

    print(df)
    print("-------------------------------------")

Upon execution, it produces this result.

[Screenshot: capture 1]

The screenshot shows only 2 dataframes; there are 6 dataframes in total when the code runs.

What I want to do is merge them together, so that the row axis shows just FY2012, FY2013, FY2014, FY2015 and Last 12 Months, and the column axis shows the combination of all the rows from the 6 dataframes scraped from the website.

I think I can do this by storing each dataframe in its own variable and then using some form of df.join() to combine them. But I'm having trouble separating this df into different variables in the first place.
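To illustrate the kind of result I am after, here is a minimal sketch that collects the dataframes in a list instead of separate variables and combines them with pandas.concat along the column axis. The two small frames and their values are made up for the example; in the real code each transposed df from the loop would be appended to the list.

```python
import pandas as pd

# Hypothetical stand-ins for two of the six scraped tables (made-up values),
# already transposed so the periods form the row index.
periods = ["FY2012", "FY2013", "Last 12 Months"]
frames = [
    pd.DataFrame({"Revenue": [10, 12, 15]}, index=periods),
    pd.DataFrame({"Net Income": [1, 2, 3]}, index=periods),
]

# axis=1 aligns the frames on their shared row index and
# places their columns side by side in one dataframe.
merged = pd.concat(frames, axis=1)
print(merged)
```

In the scraping loop this would amount to creating an empty list before the loop, calling frames.append(df) after df = df.T, and concatenating once after the loop finishes. This assumes every table uses the same period labels as its index.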

What do you think?

Update

Initially I thought I could just do print(df[0]), but this gives me KeyError: 0.
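As far as I can tell, the error comes from df[0] looking for a column labelled 0 rather than the first dataframe or the first row. The sketch below reproduces it on a small made-up frame (the column name and values are hypothetical) and shows position-based access with iloc for comparison.

```python
import pandas as pd

# Hypothetical frame with string column labels, like the scraped tables.
df = pd.DataFrame({"Revenue": [10, 12]}, index=["FY2012", "FY2013"])

# df[0] asks for a column whose label is 0; no such column exists here.
try:
    df[0]
except KeyError as exc:
    error = exc

# Position-based access goes through iloc instead.
first_row = df.iloc[0]        # first row, as a Series
first_col = df.iloc[:, 0]     # first column, as a Series
```

Either way, df is a single dataframe that gets overwritten on each loop iteration, so indexing into it cannot return the earlier tables.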

jake wong