I have a value that I have stored in a string. I would like to append that value only to rows that meet certain criteria, and not to any others.
The following image shows the tables I need to parse. I can easily parse the file with BeautifulSoup and turn it into a Pandas DataFrame, but for both of the tables below I'm struggling to capture and append the Package prices to the entire DataFrame. Ideally the Price values would go alongside every Fish-Weight pair, giving a single column that repeats the same Price value.
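To make the goal concrete, here is a minimal sketch of the shape being asked for. The small frame, the `package_price` value, and the `Cod Price` column are hypothetical stand-ins, not the real parsed data. It relies on two pandas behaviors: assigning a scalar to a new column repeats it on every row, and assigning through a boolean mask with `.loc` fills only the matching rows.

```python
import pandas as pd

# Hypothetical stand-ins for the parsed table and the captured price.
fish_frame = pd.DataFrame({
    "Species": ["GBW Cod", "Witch", "Pollock"],
    "Weight": ["8,059", "1,414", "7,900"],
})
package_price = "$21,151.67"

# Assigning a scalar to a new column repeats it on every row.
fish_frame["Package Price"] = package_price

# To fill only rows that meet a condition, assign through a boolean mask.
# Rows that do not match are left as NaN in the new column.
is_cod = fish_frame["Species"].str.contains("Cod")
fish_frame.loc[is_cod, "Cod Price"] = package_price

print(fish_frame)
```

The mask can be any boolean Series aligned with the frame, so the "certain criteria" can be as simple or as involved as needed.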
Here is the code I use to parse the tables:
import email

import pandas as pd
from bs4 import BeautifulSoup

with open(file_path) as in_f:
    msg = email.message_from_file(in_f)  # type: <class 'email.message.Message'>
html_msg = msg.get_payload(1)  # type: <class 'email.message.Message'>
body = html_msg.get_payload(decode=True)  # type: <class 'bytes'> or type: 'int'
html = body.decode()  # type: <class 'str'>
tablez = BeautifulSoup(html, 'html.parser').find_all("table")  # type: <class 'bs4.element.ResultSet'>
data = []
for table in tablez:
    for row in table.find_all("tr"):
        # One list of stripped cell texts per table row
        data.append([cell.text.strip() for cell in row.find_all("td")])
fish_frame = pd.DataFrame(data)
This is what data is:
data: [['Species', 'Price', 'Weight'], ['GBW Cod', '.55', '8,059'], ['GBE Haddock', '.03', '14,628'], ['GBW Haddock', '.02', '87,451'], ['GB YT', '1.50', '1,818'], ['Witch', '1.25', '1,414'], ['GB Winter', '.40', '23,757'], ['Redfish', '.02', '123'], ['White Hake', '.40', '934'], ['Pollock', '.02', '7,900'], ['Package Price:', '', '$21,151.67'], ['Species', 'Weight'], ['GBE Cod', '820'], ['GBW Cod', '15,279'], ['GBE Haddock', '32,250'], ['GBW Haddock', '192,793'], ['GB YT', '6,239'], ['SNE YT', '2,018'], ['GOM YT', '1,511'], ['Plaice', '2,944'], ['Witch', '1,100'], ['GB Winter', '158,608'], ['White Hake', '31'], ['Pollock', '1,983'], ['SNE Winter', '7,257'], ['Price', '$58,500.00'], ['Species', 'Weight'], ['GBE Cod', '792'], ['GBW Cod', '14,767'], ['GBE Haddock', '29,199'], ['GBW Haddock', '174,556'], ['GB YT', '5,268'], ['SNE YT', '544'], ['GOM YT', '1,957'], ['Plaice', '2,452'], ['Witch', '896'], ['GB Winter', '163,980'], ['White Hake', '8'], ['Pollock', '1,743'], ['SNE Winter', '3,709'], ['Price', '$57,750.00']]
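Note that each table's price row is already present in `data`, as the last row of each table. The following is only a rough sketch of how those rows could be paired with the rows above them. It assumes that every table starts with a row whose first cell is `Species` and ends with a row whose first cell is `Package Price:` or `Price`, and it uses a shortened, hypothetical excerpt of `data`. The `records` and `pending` names are illustrative.

```python
import pandas as pd

# Shortened, hypothetical excerpt of `data` (two tables, fewer rows).
data = [
    ['Species', 'Price', 'Weight'],
    ['GBW Cod', '.55', '8,059'],
    ['Witch', '1.25', '1,414'],
    ['Package Price:', '', '$21,151.67'],
    ['Species', 'Weight'],
    ['GBE Cod', '820'],
    ['Plaice', '2,944'],
    ['Price', '$58,500.00'],
]

records = []   # finished rows, each tagged with its table's price
pending = []   # rows of the table currently being read
for row in data:
    label = row[0]
    if label == 'Species':
        pending = []                       # header row: start a new table
    elif label in ('Package Price:', 'Price'):
        price = row[-1]                    # last cell holds the total
        for species, weight in pending:
            records.append({'Species': species, 'Weight': weight,
                            'Package Price': price})
    else:
        pending.append((row[0], row[-1]))  # species is first, weight is last

fish_frame = pd.DataFrame(records, columns=['Species', 'Weight', 'Package Price'])
print(fish_frame)
```

Taking the first and last cell of each row sidesteps the fact that the first table has an extra per-species `Price` column that the other tables lack.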
And then I use this bit of code to capture the Package price:
stew = BeautifulSoup(html, 'html.parser')
chunks = stew.find_all('p', {'class': "MsoNormal"})
for line in chunks:
    if 'Package' in line.text:
        package_price = line.text
        print("package_price:", package_price)
But I'm now struggling to add that Price value to its own column in the DataFrame. Running a command such as fish_frame = pd.DataFrame(package_price) results in:
Traceback (most recent call last):
  File "Z:/Code/NEFS_stock_then_weight_attempt3.py", line 236, in <module>
    fish_frame = pd.DataFrame(package_price)
  File "C:\Users\stephen.mahala\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pandas\core\frame.py", line 345, in __init__
    raise PandasError('DataFrame constructor not properly called!')
pandas.core.common.PandasError: DataFrame constructor not properly called!
for reasons that are unknown to me. Turning it into a list, however, results in the string being broken up so that each character becomes its own element, and therefore each character gets its own cell in the DataFrame.
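The per-character behavior can be reproduced in isolation. This is a small sketch with a hypothetical `package_price` string: `list(some_string)` iterates over the string and yields one element per character, whereas wrapping the string in brackets gives a one-element list that holds the whole string.

```python
import pandas as pd

package_price = "Package Price: $21,151.67"  # hypothetical example string

chars = list(package_price)   # iterates the string: one element per character
wrapped = [package_price]     # one-element list holding the whole string

per_char = pd.DataFrame(chars)                             # one row per character
single = pd.DataFrame(wrapped, columns=["Package Price"])  # one row, one cell

print(per_char.shape)
print(single.shape)
```

The DataFrame constructor expects a collection such as a list, dict, or array, which is likely why passing the bare string is rejected outright.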
Is there a method with Pandas or with BeautifulSoup that I'm unaware of that will simplify the process of adding this single value to my DataFrame?