0

I'm quite new to programming, and I'm using Python it for data manipulation and analysis.

I have a dataframe that looks like:

Brand   Date    Unit
A       1/1/19  10
B       3/1/19  11
A       11/1/19 15
B       11/1/19 5
A       1/1/20  10
A       9/2/19  18
B       12/2/19 11
B       19/2/19 8
B       1/1/20  5

And I would like to group by month, year and Brand. If it helps, I also have separate columns for Month and Year. The expected result should look like this:

Brand   Date    Unit
A       Jan 2019  25
B       Jan 2019  16
A       Feb 2019  18
B       Feb 2019  19
A       Jan 2020  8
B       Feb 2020  5

I tried adapting an answer from someone else's question:

per = df.Date.dt.to_period("M")
g = df.groupby(per,'Brand')
g.sum()

but I get prompted:

ValueError: No axis named Brand for object type <class 'pandas.core.frame.DataFrame'>

and I don't have any idea how to solve this.

I used to do this with dictionaries by selecting each month/year individually, group by sum and then create the dictionary, but it seems kind of brute force, really rough and it won't help if the df gets updated with new data.

Even more, maybe I'm having a bad approach to the situation. In the end I'd like to have a df looking like:

Brand    Jan 19   Feb 19   Jan 20 
A        25       18       8
B        16       19       5
Zephyr
  • 11,891
  • 53
  • 45
  • 80
kozoCMT
  • 3
  • 1
  • Check out: https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe – Erfan Jul 08 '20 at 22:41
  • thank you! really useful information, I didn't know about that and indeed it puts me on the right track. however, it still looks like it doesn't solve the grouping by date, as the dates are unique by nature as it also include the day (the date is 'too' unique?). Maybe if I create another column without day (1/1/2019 -> 1/2019)? – kozoCMT Jul 08 '20 at 22:58
  • Groupby wants a list `g = df.groupby([per,'Brand'])` – Paul H Jul 08 '20 at 23:38

2 Answers2

0

Use pandas.to_datetime and pandas.DataFrame.pivot_table:

df["Date"] = pd.to_datetime(df["Date"], dayfirst=True).dt.strftime("%b %Y")
new_df = df.pivot_table(index="Brand", columns="Date", aggfunc=sum)
print(new_df)

Output:

          Unit                  
Date  Feb 2019 Jan 2019 Jan 2020
Brand                           
A           18       25       10
B           19       16        5
Chris
  • 29,127
  • 3
  • 28
  • 51
  • Doing this with strings is less than ideal for the sorting as you see – Paul H Jul 08 '20 at 23:39
  • I have adapted a littlebit your answer and it worked perfectly. `df["Date"] = pd.to_datetime(df["Date"], dayfirst=True).dt.strftime("%b %Y") df = df.groupby(by=['Brand','Date'],as_index=False).sum() df = df.pivot_table(index="Brand", columns="Date", aggfunc=sum)` As without the grouping it was acting a little bit unexpecting. Thank you very much. I will continue learning about pivoting. – kozoCMT Jul 09 '20 at 07:37
0

You were close, DataFrame.groupby wants a list of groupers, not bare arguments.

Here's how I did it:

import pandas
from io import StringIO

csv = StringIO("""\
Brand   Date    Unit
A       1/1/19  10
B       3/1/19  11
A       11/1/19 15
B       11/1/19 5
A       1/1/20  10
A       9/2/19  18
B       12/2/19 11
B       19/2/19 8
B       1/1/20  5
""")

(
    pandas.read_csv(csv, parse_dates=['Date'], sep='\s+', dayfirst=True)
        .groupby(['Brand', pandas.Grouper(key='Date', freq='1M')])
        .sum()
        .reset_index()
)

And that gives me:

  Brand       Date  Unit
0     A 2019-01-31    25
1     A 2019-02-28    18
2     A 2020-01-31    10
3     B 2019-01-31    16
4     B 2019-02-28    19
5     B 2020-01-31     5
Paul H
  • 65,268
  • 20
  • 159
  • 136
  • It gives me this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index' I couldn't use this one :( However, `df["Date"] = pd.to_datetime(df["Date"], dayfirst=True).dt.strftime("%b %Y")` worked just fine. Maybe I should've mention that the date was in datetime64 format and not string. – kozoCMT Jul 09 '20 at 07:34
  • yeah, if you have dates as strings, make them proper pandas Timestamps and life gets easier – Paul H Jul 09 '20 at 14:15