The answer given by @rusu_ro1 is correct. However, I think that Pandas is the right tool for the job here.
You can use pandas.read_html to get all the tables on the page, then use pandas.DataFrame.to_excel to write only the last four tables to the Excel workbook.
The following script scrapes the data and writes each table to a different sheet.
    import pandas as pd

    all_tables = pd.read_html(
        "https://www.proff.no/regnskap/yara-international-asa/oslo/hovedkontortjenester/IGB6AV410NZ/"
    )

    with pd.ExcelWriter('output.xlsx') as writer:
        # The last 4 tables have the 'konsernregnskap' data
        for idx, df in enumerate(all_tables[4:8]):
            # Drop the last column (it is empty)
            df = df.drop(df.columns[-1], axis=1)
            df.to_excel(writer, "Table {}".format(idx))
Notes, from the pandas.read_html documentation:
flavor : str or None, container of strings
The parsing engine to use. ‘bs4’ and ‘html5lib’ are synonymous with
each other, they are both there for backwards compatibility. The
default of None tries to use lxml to parse and if that fails it falls
back on bs4 + html5lib.
From the HTML Table Parsing Gotchas section of the pandas docs:
html5lib generates valid HTML5 markup from invalid markup
automatically. This is extremely important for parsing HTML tables,
since it guarantees a valid document. However, that does NOT mean that
it is “correct”, since the process of fixing markup does not have a
single definition.
In your specific case it drops the 5th table (it returns only 7), perhaps because the 1st and 5th tables contain the same data.
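If you want to check for yourself whether two of the parsed tables really are identical, you can compare the frames pairwise with DataFrame.equals. A minimal sketch (the small DataFrames below are made-up stand-ins for the list that pd.read_html returns):

    import pandas as pd

    def drop_duplicate_tables(tables):
        """Return the tables with exact content duplicates removed, keeping the first occurrence."""
        unique = []
        for df in tables:
            if not any(df.equals(seen) for seen in unique):
                unique.append(df)
        return unique

    # Stand-in for a pd.read_html result where two tables carry the same data
    tables = [
        pd.DataFrame({"a": [1, 2]}),
        pd.DataFrame({"b": [3]}),
        pd.DataFrame({"a": [1, 2]}),  # duplicate of the first table
    ]
    print(len(drop_duplicate_tables(tables)))  # 2 unique tables remain

Running something like this against the full list of tables from the page would tell you which index was silently collapsed, so you can adjust the [4:8] slice if needed.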