I am extracting an HTML table from the web with pandas. From the result (a list of DataFrame objects) I want to return all rows where a cell value is an element of a given array.
So far I am struggling to access a single column value rather than the whole object.
Structure of the table (the header row is not extracted correctly, so this is the actual output):
0 | 1 | 2 | 3 |
---|---|---|---|
Date | Name | Number | Text |
09.09.2022 | Smith Jason | 3290 | Free Car Wash |
12.03.2022 | Betty Paulsen | 231 | 10l Gasoline |
import pandas as pd
import numpy as np

url = f'https://some_website.com'
df = pd.read_html(url)
arr_Nr = ['3290', '9273']

def correct_number():
    for el in df[0][1]:
        if (el in arr_Nr):
            return True

def get_winner():
    for el in df:
        if (el in arr_Nr):
            return el

print(get_winner())
With the function correct_number() I can tell that there is a winner, but I cannot get the details when I call get_winner().
EDIT
I think I am now one step closer: the function read_html() returns a list of DataFrame objects. In my example there is only one table, so by accessing it via df = dfs[0] I should get the correct DataFrame object.
But when I try the following, the code doesn't work as expected: no filter is applied and the table is returned in full:
df2 = df[df.Number == '3290']
print(df2)
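For reference, here is a minimal sketch of the filter I am trying to achieve, not a confirmed fix. It assumes the first row of the extracted table holds the real column names and that the numbers may arrive either as strings or as integers. The DataFrame `raw` is a hand-built stand-in for `dfs[0]` using the rows shown above, and the names `raw`, `table` and `winners` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for dfs[0]: columns are 0..3 and the
# real header ("Date", "Name", ...) is stored as the first data row.
raw = pd.DataFrame([
    ["Date", "Name", "Number", "Text"],
    ["09.09.2022", "Smith Jason", "3290", "Free Car Wash"],
    ["12.03.2022", "Betty Paulsen", "231", "10l Gasoline"],
])

# Promote the first row to column names and drop it from the data.
table = raw.iloc[1:].reset_index(drop=True)
table.columns = raw.iloc[0].tolist()

arr_Nr = ["3290", "9273"]

# Cast to str so the comparison works whether Number was parsed
# as text or as integers, then keep rows whose Number is in arr_Nr.
winners = table[table["Number"].astype(str).isin(arr_Nr)]
print(winners)
```

With this, `winners` holds the complete matching rows (date, name, number and text) instead of only a True/False answer.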