I've been trying to solve this for a while, and have come to the conclusion that I just need to ask. Basically, I have a table that I extracted from a PDF, using Tabula, and I put that in a DataFrame. The ultimate goal is to convert the extracted table in a JSON format so I end up converting the DataFrame into a dictionary. But, I have trouble with the dictionary, as the table spans two pages. When I'm trying to iterate over each row of the DataFrame, it takes the very first row as column headers, rather than data, and then treats the very first "Account" value I have as a key for every single row. I pasted the output below and as you can see the very first value under the Account column I have is being used as a key in a dictionary for every single dictionary as it iterates through each row.
Instead, what I need is this structure { Account : Allocation } so that if the first row of my dataframe shows " XYZ" under the Account column and "$12.0) under the Allocation column, then it should show up as this instead : {"XYZ" : "$12.0" )
json_data_list=[]
for df in dfs_list:
for index, row in df.iterrows:
try:
print(f"Row{index}:{row}")
account_name=re.match(r'^[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*',row['Account']).group(0).strip()
allocation_value=re.match(r'\$\d+(?:\.d+)?',row['Allocation ($mm)']).group(0)
print(f"Allocation value: {allocation_value}")
except:
print:(row['Account'],"error"
if pd.notnull(account_name) and pd.notnull(allocation_value):
json_dict={account_name:allocation_value}
json_data_list.append(json_dict)
print(f"Adding dictionary:{json_dict}")
This is what I get as an output:
[
{
"0":{
"1":2,
"XYZ": "ABC",
"$12.0": "$3.0"
},
"1":{
"1":3,
"XYZ":"BCD",
"$12.0": "$5.5"
},