I'm new and I have this dataframe:

import pandas as pd

df0 = pd.DataFrame({
    'timestamp': ['2023-01-02','2023-01-01','2023-01-02','2023-01-01'],
    'id': ['2','1','1','2'],
    'close': [150,10,11,100]
})
df0

    timestamp   id  close
0   2023-01-02  2   150
1   2023-01-01  1   10
2   2023-01-02  1   11
3   2023-01-01  2   100

I'm trying to create a new column: group by id, sort by timestamp within each group, then compute the daily return from the close column.

The dataframe should look like this afterwards:

    timestamp   id  close  past_day_return
0   2023-01-01  1   10     NaN
1   2023-01-02  1   11     0.1
2   2023-01-01  2   100    NaN
3   2023-01-02  2   150    0.5

So far I have tried this but in vain:

df1 = (df0
       .groupby(['id'])
       .sort_values(['timestamp'])
       .assign(past_day_return = lambda x: (x['close'] - x['close'].shift(1)) / x['close'].shift(1))
      )

It gives me this error:

AttributeError                            Traceback (most recent call last)
Cell In[84], line 3
      1 df1 = (df0
      2        .groupby(['id'])
----> 3        .sort_values(['timestamp'])
      4        .assign(past_day_return = lambda x: (x['close'] - x['close'].shift(1)) / x['close'].shift(1))
      5       )

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\groupby\groupby.py:987, in GroupBy.__getattr__(self, attr)
    984 if attr in self.obj:
    985     return self[attr]
--> 987 raise AttributeError(
    988     f"'{type(self).__name__}' object has no attribute '{attr}'"
    989 )

AttributeError: 'DataFrameGroupBy' object has no attribute 'sort_values'

The assign part works fine on its own, but then there is no sorting or grouping.

Can you help me please?

EDIT:

I managed to find the answer on my own:

df0['past_day_return'] = (df0['close'] / df0.sort_values('timestamp').groupby('id')['close'].shift(1)) - 1 
df0.sort_values(by = ['timestamp','id'])

Which gives me:

    timestamp   id  close  past_day_return
1   2023-01-01  1   10     NaN
3   2023-01-01  2   100    NaN
2   2023-01-02  1   11     0.1
0   2023-01-02  2   150    0.5
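
For comparison, a chained version closer to my first attempt might look like the sketch below. It rests on a few assumptions: the price column is named close as in df0, the ISO-formatted timestamp strings sort chronologically as plain strings, and pct_change is swapped in for the manual shift arithmetic. The new column is called daily_return here purely for illustration. Sorting comes before grouping because a DataFrameGroupBy has no sort_values.

```python
import pandas as pd

df0 = pd.DataFrame({
    'timestamp': ['2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'id': ['2', '1', '1', '2'],
    'close': [150, 10, 11, 100],
})

df1 = (
    df0
    # Sort the whole frame first; rows keep this order inside each group.
    .sort_values(['id', 'timestamp'])
    # Group inside assign so the result is a Series aligned with the frame's index.
    .assign(daily_return=lambda x: x.groupby('id')['close'].pct_change())
    # Optional: renumber the rows 0..n-1 after sorting.
    .reset_index(drop=True)
)
print(df1)
```

This roughly mirrors the R pipeline of arrange, group_by, mutate: the grouping is applied only to the column being computed, so the result stays an ordinary DataFrame.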
  • Does this answer your question? [How to sort by timestamps in pandas?](https://stackoverflow.com/questions/42462935/how-to-sort-by-timestamps-in-pandas) – B Remmelzwaal Feb 28 '23 at 01:29
  • @BRemmelzwaal no, because OP has already used `sort_values`. The problem is that they used `groupby` first, and the result of that doesn't have a `sort_values` attribute – Pranav Hosangadi Feb 28 '23 at 01:41
  • I've looked at several similar questions, but I'm still confused about how to sort then group (or the opposite) and then make a new column with assign. In R, I would use group_by, then arrange, then mutate, and it worked. – CommonPepper Feb 28 '23 at 01:51