I have the following DataFrame:

                    (polygon object)     ASSAULT     BURGLARY   bank     cafe    crossing
INCIDENTDATE                                                                            
2009-01-01 02:00:00                A           1           0       0        1           0
2009-01-01 02:00:00                A           1           0       0        1           0
2009-01-01 02:00:00                A           1           0       1        0           0
2009-01-01 02:00:00                A           1           0       0        0           1
2009-01-01 02:00:00                A           1           0       0        1           0
2009-01-04 11:00:00                B           0           1       1        0           0
2009-01-04 11:00:00                B           0           1       1        0           0
2009-01-04 11:00:00                B           0           1       0        0           0
2009-01-04 11:00:00                B           0           1       1        0           0
2009-01-04 11:00:00                B           0           1       0        1           0

I want to aggregate this DataFrame so that each 'INCIDENTDATE' appears only once.

While doing so, each column (except polygon) should be 1 if it was 1 in at least one of the rows sharing that 'INCIDENTDATE', and 0 otherwise.

The final DataFrame should look like this:

                    (polygon object)    ASSAULT     BURGLARY    bank     cafe    crossing
INCIDENTDATE                                                                            
2009-01-01 02:00:00                A           1           0       1        1           1
2009-01-04 11:00:00                B           0           1       1        1           0

How would I achieve that in pandas? Googling my question pointed me to the groupby() function, but I really don't understand how I would use it here.

2 Answers

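I think you can just reset the index, group by the resulting 'INCIDENTDATE' column, and take the max of each group. Since the columns only hold 0 and 1, the max is 1 whenever at least one row in the group is 1: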

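# Move INCIDENTDATE from the index into a regular column
df.reset_index(inplace=True)
# Column-wise maximum within each INCIDENTDATE group (returns a new DataFrame)
df.groupby('INCIDENTDATE').max()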
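
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same idea. The column name `polygon` and the string values "A"/"B" are assumptions standing in for the real polygon objects.

```python
import pandas as pd

# Rebuild the question's sample data; "A"/"B" are placeholders for the polygon objects.
dates = pd.to_datetime(["2009-01-01 02:00:00"] * 5 + ["2009-01-04 11:00:00"] * 5)
df = pd.DataFrame(
    {
        "polygon": ["A"] * 5 + ["B"] * 5,
        "ASSAULT": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        "BURGLARY": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
        "bank": [0, 0, 1, 0, 0, 1, 1, 0, 1, 0],
        "cafe": [1, 1, 0, 0, 1, 0, 0, 0, 0, 1],
        "crossing": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    },
    index=pd.Index(dates, name="INCIDENTDATE"),
)

# Turn the index into a column, then take the per-group maximum of every column.
result = df.reset_index().groupby("INCIDENTDATE").max()
print(result)
```

With string placeholders the polygon column also goes through `max`, which is harmless here because every row in a group carries the same value.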
it's-yer-boy-chet

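The max function should do this. For 0/1 columns, the maximum within a group is 1 exactly when at least one row in that group is 1, and groupby accepts the name of the index directly: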

df.groupby("INCIDENTDATE").agg("max")
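
If the polygon column holds real geometry objects that do not support comparison, taking `max` of it may fail. One possible workaround is to aggregate per column: keep the first polygon of each group and take `max` of the indicator columns only. This is a sketch on a reduced version of the data; the column name `polygon` and the helper name `agg_spec` are assumptions.

```python
import pandas as pd

# Reduced sample; "A"/"B" stand in for the polygon objects.
dates = pd.to_datetime(["2009-01-01 02:00:00"] * 3 + ["2009-01-04 11:00:00"] * 2)
df = pd.DataFrame(
    {
        "polygon": ["A", "A", "A", "B", "B"],
        "bank": [0, 1, 0, 1, 0],
        "cafe": [1, 0, 0, 0, 0],
    },
    index=pd.Index(dates, name="INCIDENTDATE"),
)

# "max" for every indicator column, "first" for the polygon column.
agg_spec = {col: "max" for col in df.columns if col != "polygon"}
agg_spec["polygon"] = "first"

# Group on the named index level and restore the original column order.
result = df.groupby(level="INCIDENTDATE").agg(agg_spec)[df.columns]
print(result)
```

This assumes all rows with the same INCIDENTDATE share one polygon, as in the question's example.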
Tim