0

I have been working on a Python project analyzing a CSV file and cannot get the output to show me sums with my strings, just lists of the numbers that should be summed.

Code I'm working with:

import pandas as pd

data = pd.read_csv('XML_projectB.csv')
#inserted column headers since the raw data doesn't have any
data.columns = ['name','email','category','amount','date']

data['date'] = pd.to_datetime(data['date'])

#Calculate the total budget by cateogry
category_wise = data.groupby('category').agg({'amount':['sum']})
category_wise.reset_index(inplace=True)
category_wise.columns = ['category','total_budget']

#Determine which budget category people spent the most money in
max_budget = category_wise[category_wise['total_budget']==max(category_wise['total_budget'])]['category'].to_list()

#Tally the total amounts for each year-month (e.g., 2017-05)
months_wise = data.groupby([data.date.dt.year, data.date.dt.month])['amount'].sum()
months_wise = pd.DataFrame(months_wise)
months_wise.index.names = ['year','month']
months_wise.reset_index(inplace=True)

#Determine which person(s) spent the most money on a single item.
person = data[data['amount'] == max(data['amount'])]['name'].to_list()

#Tells user in Shell that text file is ready
print("Check your folder!")

#Get all this info into a text file
tfile = open('output.txt','a')

tfile.write(category_wise.to_string())
tfile.write("\n\n")
tfile.write("The type with most budget is " + str(max_budget) + " and the value for the same is " + str(max(category_wise['total_budget'])))
tfile.write("\n\n")
tfile.write(months_wise.to_string())
tfile.write("\n\n")
tfile.write("The person who spent most on a single item is " + str(person) + " and he/she spent " + str(max(data['amount'])))
tfile.close()

The CSV raw data looks like this (there are almost 1000 lines of it):

Walker Gore,wgore8i@irs.gov,Music,$77.98,2017-08-25
Catriona Driussi,cdriussi8j@github.com,Garden,$50.35,2016-12-23
Barbara-anne Cawsey,bcawsey8k@tripod.com,Health,$75.38,2016-10-16
Henryetta Hillett,hhillett8l@pagesperso-orange.fr,Electronics,$59.52,2017-03-20
Boyce Andreou,bandreou8m@walmart.com,Jewelery,$60.77,2016-10-19

My output in the txt file looks like this:

      category               total_budget                                                                                                                                                                                                                                                                                                                               
0    Automotive              $53.04$91.99$42.66$1.32$35.07$97.91$92.40$21.28$36.41
1          Baby              $93.14$46.59$31.50$34.86$30.99$70.55$86.74$56.63$84.65
2        Beauty              $28.67$97.95$4.64$5.25$96.53$50.25$85.42$24.77$64.74
3         Books              $4.03$17.68$14.21$43.43$98.17$23.96$6.81$58.33$30.80
4      Clothing              $64.07$19.29$27.23$19.78$70.50$8.81$39.36$52.80$80.90

    year  month               amount                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
0   2016      9               $97.95$67.81$80.64                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
1   2016     10               $93.14$6.08$77.51$58.15$28.31$2.24$12.83$52.22$48.72                                                                                                               
2   2016     11               $55.22$95.00$34.86$40.14$70.13$24.82$63.81$56.83                                                                                                                                                                                                                                              
3   2016     12               $13.32$10.93$5.95$12.41$45.65$86.69$31.26$81.53                       

I want the total_budget column to be the sum of the list for each category, not the individual values you see here. It's the same problem for months_wise, it gives me the individual values, not the sums. I tried the {} .format in the write lines, .apply(str), .format on its own, and just about every other Python permutation of the conversion to string from a list I could think of, but I'm stumped.

What am I missing here?

SassyG
  • 41
  • 5
  • Since the amounts begin with `$` they're treated as strings, not numbers. So summing them concatenates. – Barmar Oct 26 '21 at 01:27
  • ```@Barmar```, how can I get those concats to print out in the output strings then? – SassyG Oct 26 '21 at 03:38

1 Answers1

0

As @Barmar said, the source has $XX so it is not treated as numbers. You could try following this approach to parse the values as integers/floats instead of strings with $ in them.

fam-woodpecker
  • 500
  • 2
  • 10
  • I followed the link, and replaced the first code line past the import pandas as: ```data = pd.read_csv('XML_projectB.csv', converters={'amount': lambda s: float(s.replace('$', ''))})```, but it's still not working. – SassyG Oct 26 '21 at 03:27
  • That is because your raw data does not have column names, so when it reads the data, `amount` does not exist. Instead of giving the column use the index, in your case 3. Also, since the csv has no header (column names) then pass the argument `header = None`. Full line should be: `pd.read_csv('XML_projectB.csv', header = None, converters = {3 : lambda s: float(s.replace('$', ''))})` – fam-woodpecker Oct 26 '21 at 03:48
  • ```@Stuart Mills```, that works! Thanks. I knew it had to be something with how the raw data was being interpreted. – SassyG Oct 26 '21 at 03:55