4

Want to select:

select
    user_id,
    max(case when value > 0 then timestamp else 0 end) as max_timestamp_when_value_is_positive
from df
group by user_id

What is the right way to aggregate?

groupped = raw_data.groupby('user_id')
res = groupped.agg({<how-to-do-described-aggregation?>})

UPDATE Explanation and example.

In [2]: df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
                           'timestamp': [100, 200, 300, 10, 110, 10, 110, 210, 250],
                           'value': [0, 1, 0, 0, 0, 0, 10, 0, 1]})

In [3]: groupped = df.groupby('user_id')

In [4]: res = groupped.agg({'timestamp': [min, max],
                            'value': lambda x: sum(x > 0),
                            <described-magic>})

In [5]: res
Out[5]: 
        timestamp         value   <...magic...>
              min  max <lambda>
user_id                        
1             100  300        1    200
2              10  110        0    0
3              10  250        2    210

Magic is what I want.

Aleksandro M Granda
  • 665
  • 1
  • 8
  • 13

1 Answers1

4

Create a new column positive_value_timestamp as

df['positive_value_timestamp'] = df.timestamp * df.value.apply(lambda x: 1 if x > 0 else 0)

When grouping, take the max of this column

res = df.groupby('user_id').agg(
    {
        'timestamp': [min, max],
        'value': sum,
        'positive_value_timestamp': max
    })
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85