Group by 2 columns and print data for each combination in Pandas

Question

I would like to group df below by Date and ItemId:

            Id          Timestamp               Data    ItemId      Date
2012-04-21  19389576    2012-04-21 00:04:03.533 39.0    1              2012-04-21
2012-04-21  19389577    2012-04-21 00:04:04.870 38.5    1              2012-04-21
2012-04-21  19389608    2012-04-21 00:07:03.450 38.0    1              2012-04-21
                        ...

2012-04-22  19389609    2012-04-21 00:03:04.817 37.5    2              2012-04-21
2012-04-22  19389620    2012-04-21 00:10:04.400 37.0    2              2012-04-21
                        ...

to get all combinations of Date and ItemId, then select from original dataframe df using each combination of Date and ItemId, for instance, Date== 2012-04-21 and ItemId==1, Date== 2012-04-21 and ItemId==2 ...

How do I select data using the 2 columns simultaneously in the for-loop?

Does this answer your question? [Pandas DataFrame Groupby two columns and get counts](https://stackoverflow.com/questions/17679089/pandas-dataframe-groupby-two-columns-and-get-counts) — Roy2012, Jun 18 '20 at 04:31
@nilsinelabore After group by, what do you want to do with grouped dataframe? — MrNobody33, Jun 18 '20 at 04:42
@MrNobody33 I want to select and print data segment using each grouped pair, from the original dataframe `df`, such as data for `Date== 2012-04-21 and ItemId==1`, `Date== 2012-04-21 and ItemId==2`, `Date== 2012-04-21 and ItemId==2`, `Date== 2012-04-22 and ItemId==1`, `Date== 2012-04-22 and ItemId==2` ... — nilsinelabore, Jun 18 '20 at 04:45
@Roy2012 Thanks for the link but I was hoping to select the data rather than getting the count. — nilsinelabore, Jun 18 '20 at 05:07
So ... you'd like to get all the data, but grouped by these two columns? No aggregation whatsoever? — Roy2012, Jun 18 '20 at 05:24
@Roy2012 Yes, and print each data segment for a combination of `Date and ItemId` separately for all data. — nilsinelabore, Jun 18 '20 at 05:26

MrNobody33 · Accepted Answer · 2020-06-18T06:05:43.770

Since when using group by each row index will be a tuple (2012-04-21,1),(2012-04-21,2),(2012-04-22,1):

from datetime import datetime
import pandas as pd 
import io
s_e="""    Id    Timestamp    Data    ProductId    Date
    2012-04-21  19389576    2012-04-21 00:04:03.533    39.0    1    2012-04-21
    2012-04-21  19389577    2012-04-21 00:04:04.870    38.5    1    2012-04-21
    2012-04-21  19389608    2012-04-21 00:07:03.450    38.0    1    2012-04-22
    2012-04-22  19389609    2012-04-21 00:03:04.817    37.5    2    2012-04-21
    2012-04-22  19389620    2012-04-21 00:10:04.400    37.0    2    2012-04-22

    """
pd.set_option('display.max_columns', None )
df = pd.read_csv(io.StringIO(s_e), sep='    ', parse_dates=[1,4], engine='python')
df=df.groupby(['Date','ProductId']).agg(list)
print('df:\n',df)
print('df.index.values:\n',df.index.values)

Ouput:

>>>df:
                                                               Timestamp          Data  
Date       ProductId                                                                          
2012-04-21 1          [2012-04-21 00:04:03.533000, 2012-04-21 00:04:04.870000]  [39.0, 38.5]  
           2                                      [2012-04-21 00:03:04.817000]        [37.5]  
2012-04-22 1                                      [2012-04-21 00:07:03.450000]        [38.0]  
           2                                      [2012-04-21 00:10:04.400000]        [37.0] 


>>>df.index.values:
 [(Timestamp('2012-04-21 00:00:00'), 1)
 (Timestamp('2012-04-21 00:00:00'), 2)
 (Timestamp('2012-04-22 00:00:00'), 1)
 (Timestamp('2012-04-22 00:00:00'), 2)]

You could try something like this to select specific combination, for example Date== 2012-04-21 and ItemId==1 combination:

datetoselect=(datetime.strptime('2012-04-21','%Y-%m-%d'),2)   #Date== 2012-04-21 and ItemId==1
print(df[[i==datetoselect for i in df.index.values]])

Output:

                                          Id                     Timestamp    Data
Date       ProductId                                                              
2012-04-21 2          [2012-04-22  19389609]  [2012-04-21 00:03:04.817000]  [37.5]

Could you please explain `.agg(list).reset_index()['Data']`? — nilsinelabore, Jun 18 '20 at 05:02
So it returns the list of `Data` values for each group? Can we get the individual dataframes? — nilsinelabore, Jun 18 '20 at 05:39
Yeah, it returns a list of Data of all combinations. To get the individual dataframe of an specific combination, see mny answer, I just edited. — MrNobody33, Jun 18 '20 at 06:08

score 2 · Answer 2 · answered Jun 18 '20 at 05:32

IIUC, If you want to simply print the data for each group use:

for key, group in df.groupby(['ItemId', 'Date']): 
    print(key)
    print(group)

This prints:

(1, '2012-04-21')
                  Id                Timestamp  Data  ItemId        Date
2012-04-21  19389576  2012-04-21 00:04:03.533  39.0       1  2012-04-21
2012-04-21  19389577  2012-04-21 00:04:04.870  38.5       1  2012-04-21
2012-04-21  19389608  2012-04-21 00:07:03.450  38.0       1  2012-04-21

(2, '2012-04-21')
                  Id                Timestamp  Data  ItemId        Date
2012-04-22  19389609  2012-04-21 00:03:04.817  37.5       2  2012-04-21
2012-04-22  19389620  2012-04-21 00:10:04.400  37.0       2  2012-04-21

Does it print the `dataset` itself? or just the pair of `key` and `group` values? — nilsinelabore, Jun 18 '20 at 05:36
It prints all the values in dataset grouped by dates and itemid — Shubham Sharma, Jun 18 '20 at 05:40

score 1 · Answer 3 · answered Jun 18 '20 at 04:34

1

Try to do a dual selector by adding each one into a set of parenthesis and in between add a ampersand &:

df[(df[“Date”] == “2020-04-21”)& (df[“ItemId”] == 2)]

answered Jun 18 '20 at 04:34

Colton Neary

80
1
9

Thank you Colton, but I will have to group the data by columns `['Date']` and `['ItemId']`, then select using the code you suggested. I am wondering how I could do that after `groupby` ? – nilsinelabore Jun 18 '20 at 04:41
Can you elaborate a little further? What it sounds like you want is a double indexed dataframe. Is this what you are thinking [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html)? – Colton Neary Jun 18 '20 at 04:45
I think it is double indexing. What I am not sure is how to select data using different combination of `Date` and `ItemId ` automatically in a loop, when the columns I will use become the indexes. – nilsinelabore Jun 18 '20 at 04:50
1

I have only used groupby function to aggregate data. For example. I want to find all the values that were on 04-21-20 and item ID of 10 and aggregate them by taking the mean. I do not see the aggregation part in your question. If you want to select it the multi index df you should be able to call with df[“4-21-20” , 10] to select it – Colton Neary Jun 18 '20 at 04:50
Yes I am not doing any aggregation, but just want to select the data using the unique combination of `Date` and `ItemId` after `groupby`. Do you know if I can call the `Date` and `ItemId` iteratively using `df[ 'Date' , 'ItemId']`? – nilsinelabore Jun 18 '20 at 05:05
1

Yes. I would create the multi under data frame. Then you can use for i, j in df.index: print (i,j) for testing to display each combination of the two columns – Colton Neary Jun 18 '20 at 05:40

Group by 2 columns and print data for each combination in Pandas

3 Answers3