0

I have a dataframe such as this one:

     Date         Category1 Cat2   Cat3   Cat4   Value
0    2021-02-02   4310      0      1      0      1082.00
1    2021-02-03   5121      2      0      0      -210.82
2    2021-02-03   4310      0      0      0      238.41
3    2021-02-12   5121      2      2      0      -1489.11
4    2021-02-25   6412      1      0      0      -30.97
5    2021-03-03   5121      1      1      0      -189.91
6    2021-03-09   6412      0      0      0      238.41
7    2021-03-13   5121      0      0      0      -743.08

Date column has been converted into datetime format, Value is a float, other columns are strings.

I am trying to group the dataframe by month and by each level of category, such as:

  • Level 1 = filter over category 1 and sum values for each category for each month:

         Date         Category1 Value
    0    2021-02      4310      1320.41
    1    2021-02      5121      -1699.93
    2    2021-02      6412      -30.97
    3    2021-03      5121      -1489.11
    4    2021-03      6412      -932.99
    
  • Level 2 = filter over category 2 alone (one output dataframe) and over the concatenation of category 1 + 2 (another output dataframe):

         Date         Cat2   Value
    0    2021-02      0      1320.41
    1    2021-02      1      -1699.93
    2    2021-02      2      -30.97
    3    2021-03      0      -504.67
    4    2021-03      1      -189.91
    

    Second output :

             Date         Cat1+2    Value
        0    2021-02      43100     1320.41
        1    2021-02      51212     -1699.93
        2    2021-02      64121     -30.97
        3    2021-03      51210     -743.08
        4    2021-03      51211     -189.91
        5    2021-03      64120     238.41
    
  • Level 3 : filter over category 3 alone and over category 1+2+3

  • etc.

I am able to do one grouping at a time (by date or by category) but I can't combine them. Grouping by date:

df.groupby(df["Date"].dt.year)

Grouping by category:

df.groupby('Category1')['Value'].sum()
Blue Owl
  • 119
  • 12

1 Answers1

1

You can try this.

  1. To group by month, you can use this example

    df.groupby(df['Date'].dt.strftime('%B'))['Value'].sum()

    How can I Group By Month from a Date field using Python/Pandas

  2. For group by multiple columns

    df.groupby(['col5', 'col2'])

    You could create a Month year column and they group by using the new column.

    Pandas DataFrame Groupby two columns and get counts ,

    Extracting just Month and Year separately from Pandas Datetime column

Manjunath K Mayya
  • 1,078
  • 1
  • 11
  • 20
  • 1
    Thank you for your help. What I don't get is how to combine a date groupment (such as the one you proposed in step 1) with another groupment (as step 2 only works with column names in a single list) – Blue Owl Feb 22 '22 at 03:36
  • Yes. I overlooked it. My mistake – Manjunath K Mayya Feb 22 '22 at 03:40
  • But you could create a new column with only year and month values. Then you can group by this new month column and any other column in a list. Would that help? – Manjunath K Mayya Feb 22 '22 at 03:41
  • Updated the same in answer. Provided one more like which could be used to extract only month and year. – Manjunath K Mayya Feb 22 '22 at 03:52
  • 1
    Thanks for the tips! I can indeed create an additional with year and month and use the second formula to group over this new column and another category column. Simple and seems to work pretty well! – Blue Owl Feb 22 '22 at 04:08