2

I've got a dataset of transactions that i'm trying to summarize by year but when i run the output is the individual amounts by transaction. I've imported a csv into into python using pd.read_csv and cleaned up a few of the columns in the dataframe as well as adding new ones.

I'm using the following code that's generating the output below

df['Year'] = df['Date'].dt.year
df.groupby(df['Year'])['revenue'].agg(['sum'])

OUTPUT
    Year                                       Sum
    2015      1203.21 1732.12 2551.01 1733.12 1323.44
    2016      3203.21 1532.12 4431.01 1433.12 7323.44
    2017      2203.21 122.12 131.01 1293.12 4223.44
    2018      6203.21 232.12 1131.01 1533.12 4323.44

I have the following line of code to convert 'revenue' from string to float

pd.to_numeric(df['revenue'])

Year, Date and revenue are all columns in my dataframe. I want to be able to have one total for each year above as opposed to seeing the value of the individual transactions. Thanks.

wolfblitza
  • 477
  • 1
  • 5
  • 16
  • 1
    What is the output of `df[['Year', 'revenue']].head()`? It looks like you are summing strings. – Alexander Nov 26 '19 at 05:55
  • The output is the values of the first four transactions for the first year of data. I updated the original question to include the code i have converting the string to float – wolfblitza Nov 26 '19 at 06:00
  • To see if it might be a string for the revenues, try: `df.groupby('Year')['revenue'].apply(lambda x: x.str.strip().astype(float).sum())`. If that works, that you should first convert the `revenue` column to floats before applying the `groupby`. – Alexander Nov 26 '19 at 06:05
  • I think you are close, only is necessary assign - `df['revenue'] = pd.to_numeric(df['revenue'])` and then `df['Year'] = df['Date'].dt.year df.groupby(df['Year'])['revenue'].agg(['sum']) ` – jezrael Nov 26 '19 at 06:14
  • Thanks Jezrael. That works. Alexander, is the code you are proposing converting the revenue to a float if it were an string? – wolfblitza Nov 26 '19 at 06:21
  • @wolfblitza Yes, I was proposing you first transform the revenue via `df['revenue'] = df['revenue'].astype(float)`, but it appears that you have now figured that out. – Alexander Nov 26 '19 at 06:51

2 Answers2

0

The code proposed by Jezrael above works as intended. I was using pd.to_numeric(df['revenue']) but have now changed it to df['revenue'] = pd.to_numeric(df['revenue'])

wolfblitza
  • 477
  • 1
  • 5
  • 16
0

You can assign back output of to_numeric function, because not working inplace:

df['revenue'] = pd.to_numeric(df['revenue'])

Alternative solution:

df['revenue'] = df['revenue'].astype(float)

df['Year'] = df['Date'].dt.year 
df1 = df.groupby('Year', as_index=False)['revenue'].sum()

If want one line solution:

df1=pd.to_numeric(df['revenue']).groupby(df['Date'].dt.year.rename('Year')).sum().reset_index()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252