I'm working on putting together a budgeting program and the end result will be a bar chart with multiple bars that shows the total spent along 6 categories by date. So far, I've been able to clean up the data to show the date, category, and spend (pulling from a Google sheets workbook):
def process_data(worksheet):
data = worksheet.get_all_records()
all_cost = []
for i in range(len(data)):
date = data[i].get('Timestamp').split()
cost = data[i].get("Cost")
category = data[i].get("Category")
newdict = {"Date": date[0], "Spend": cost, "Category":category}
all_cost.append(newdict)
df = pd.DataFrame(all_cost)
The output dataframe looks like this:
Date Spend Category
0 11/25/2020 54.32 Grocery
1 11/25/2020 49.77 Projects
2 11/25/2020 34.22 Gas
3 11/25/2020 46.00 Grocery
4 11/27/2020 37.00 Restaurant/Eating Out
5 11/27/2020 72.00 Grocery
6 11/27/2020 129.31 Projects
7 11/27/2020 32.00 Bills
8 11/27/2020 69.42 Grocery
9 11/28/2020 12.69 Gas
10 11/28/2020 69.69 Restaurant/Eating Out
11 11/28/2020 6.66 Other
12 11/28/2020 444.21 Projects
13 11/29/2020 73.00 Grocery
Then I added:
new_df = df.groupby(["Date", "Category"])["Spend"].sum().reset_index()
in order to sum the total categorical spend per day:
Date Category Spend
0 11/25/2020 Gas 34.22
1 11/25/2020 Grocery 100.32
2 11/25/2020 Projects 49.77
3 11/27/2020 Bills 32.00
4 11/27/2020 Grocery 141.42
5 11/27/2020 Projects 129.31
6 11/27/2020 Restaurant/Eating Out 37.00
7 11/28/2020 Gas 12.69
8 11/28/2020 Other 6.66
9 11/28/2020 Projects 444.21
10 11/28/2020 Restaurant/Eating Out 69.69
I want to add a bar chart with matplotlib to show the breakdown of dollars spent per category per day, however I'm not sure how to do this since the quantity of recorded categories is different depending on the day.
Would the best way to go about doing this be to append a row with "0" for each category with no spend? Or is there an easier way to plot this? Thanks!
Edit:
I realized that the way I had structured my dataframe was not ideal. I think it will be easier to achieve what I want to do by structuring the data as follows in the sample:
df_test = pd.DataFrame({"11/25/2020": {"Gas":34.22, "grocery": 69.68, "Bills":15, "Gas": 10}, "11/27/2020": {"Gas": 16, "Projects": 144, "Bills":20}})
print(df_test)
to achieve the outcome:
11/25/2020 11/27/2020
Gas 10.00 16.0
grocery 69.68 NaN
Bills 15.00 20.0
Projects NaN 144.0
However, I'm not quite sure how to iterate lines of all of my data to achieve the desired output. Additionally, I know that I can't use a dictionary for this because it will only be using the last set of key/value, while I want to sum the amounts for each category.
Should I use tuples? How would I iterate through each line of data and sum the tuple values for each category? Thanks!