
The issue I am having is that I want to group the DataFrame and then use functions to manipulate the data after it's been grouped. For example, I want to group the data by Date and then iterate through each row in each date group, passing it to a function.

The problem is that groupby seems to create a tuple of the key and then a massive string consisting of all of the rows in the data, which makes iterating through each row impossible.

cs95
Tolki

1 Answer

7

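When you apply groupby to a DataFrame, you don't get rows; you get groups, each of which is itself a DataFrame. Iterating over the groupby object yields (key, sub-DataFrame) pairs, so the second element of each tuple is a DataFrame, not a string. For example, consider: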

df
    ID        Date  Days  Volume/Day
0  111  2016-01-01    20          50
1  111  2016-02-01    25          40
2  111  2016-03-01    31          35
3  111  2016-04-01    30          30
4  111  2016-05-01    31          25
5  112  2016-01-01    31          55
6  112  2016-01-02    26          45
7  112  2016-01-03    31          40
8  112  2016-01-04    30          35
9  112  2016-01-05    31          30

for i, g in df.groupby('ID'):
    print(g, '\n')


    ID        Date  Days  Volume/Day
0  111  2016-01-01    20          50
1  111  2016-02-01    25          40
2  111  2016-03-01    31          35
3  111  2016-04-01    30          30
4  111  2016-05-01    31          25 

    ID        Date  Days  Volume/Day
5  112  2016-01-01    31          55
6  112  2016-01-02    26          45
7  112  2016-01-03    31          40
8  112  2016-01-04    30          35
9  112  2016-01-05    31          30 
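Since each group is an ordinary DataFrame, you can loop over its rows just as you would with any other DataFrame. Here is a minimal sketch that rebuilds the example data and passes every row of every group to a function. Note that `process_row` and the `results` dict are hypothetical names made up for illustration, not something from your code:

```python
import pandas as pd

# Rebuild the example frame shown above.
df = pd.DataFrame({
    "ID": [111] * 5 + [112] * 5,
    "Date": ["2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
             "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-01-05"],
    "Days": [20, 25, 31, 30, 31, 31, 26, 31, 30, 31],
    "Volume/Day": [50, 40, 35, 30, 25, 55, 45, 40, 35, 30],
})


def process_row(row):
    # Hypothetical per-row function: total volume for that row.
    return int(row["Days"] * row["Volume/Day"])


results = {}
for key, group in df.groupby("ID"):
    # `group` is a DataFrame holding only the rows for this key.
    results[int(key)] = [process_row(row) for _, row in group.iterrows()]
```

Looping with `iterrows` is fine for small data, but it is slow on large frames, which is why the vectorised groupby methods are usually the better fit.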

For your case, you should probably look into DataFrameGroupBy.apply if you want to apply some function to each group, DataFrameGroupBy.transform if you want a result indexed like the original DataFrame (see the docs for an explanation), or DataFrameGroupBy.agg if you want aggregated results.
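To make the difference between the three concrete, here is a small sketch on a cut-down version of the data. The `total_volume` helper and the variable names are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 112, 112],
    "Days": [20, 25, 31, 26],
    "Volume/Day": [50, 40, 55, 45],
})

grouped = df.groupby("ID")

# agg: one value per group, indexed by the group key.
total_days = grouped["Days"].agg("sum")

# transform: one value per original row, indexed like df.
mean_volume = grouped["Volume/Day"].transform("mean")


# apply: run an arbitrary function on each group's sub-DataFrame.
def total_volume(group):
    return (group["Days"] * group["Volume/Day"]).sum()


# Selecting the columns first keeps the grouping column out of the function's input.
volume_per_id = grouped[["Days", "Volume/Day"]].apply(total_volume)
```

Here `total_days` and `volume_per_id` each have one entry per ID, while `mean_volume` has one entry per row of `df`.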

You'd do something like:

r = df.groupby('Date').apply(your_function) 

You'd define your function as:

def your_function(group):
    ...  # operate on group, the sub-DataFrame for a single Date
    return result
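As a rough, filled-in version of that skeleton, assuming you want a row count and a total volume per Date (that particular summary is my guess at a plausible use, not something taken from your question):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111] * 5 + [112] * 5,
    "Date": ["2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
             "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-01-05"],
    "Days": [20, 25, 31, 30, 31, 31, 26, 31, 30, 31],
    "Volume/Day": [50, 40, 35, 30, 25, 55, 45, 40, 35, 30],
})


def your_function(group):
    # `group` contains every row that shares one Date.
    volume = group["Days"] * group["Volume/Day"]
    # Returning a Series gives one output column per label.
    return pd.Series({"rows": len(group), "total_volume": volume.sum()})


r = df.groupby("Date")[["Days", "Volume/Day"]].apply(your_function)
```

`r` ends up as a DataFrame indexed by Date, with one row per distinct date.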

If you have problems with the implementation, please open a new question, post your data and your code, and any associated errors/tracebacks. Happy coding.

cs95
But what if the function I am applying to the group requires one of the columns of the original DataFrame? E.g. in your example you break the data down by ID, but what if the function relied on the Days value within that group as a parameter? – Tolki Sep 15 '17 at 03:53
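One way to approach this: the function passed to apply receives the whole sub-DataFrame, so the group's own columns (Days included) can be read by name inside it, and any outside value can be forwarded as an extra keyword argument to apply. A minimal sketch, where `long_month_volume` and `min_days` are hypothetical names invented for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 111, 112, 112, 112],
    "Days": [20, 25, 31, 31, 26, 31],
    "Volume/Day": [50, 40, 35, 55, 45, 40],
})


def long_month_volume(group, min_days):
    # The group's Days column is available directly on the sub-DataFrame.
    long_months = group[group["Days"] >= min_days]
    return int((long_months["Days"] * long_months["Volume/Day"]).sum())


# Extra keyword arguments given to apply are passed through to the function.
r = df.groupby("ID")[["Days", "Volume/Day"]].apply(long_month_volume, min_days=30)
```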