I'm new to pandas and I have this DataFrame:
import pandas as pd

df0 = pd.DataFrame({
    'timestamp': ['2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'id': ['2', '1', '1', '2'],
    'close': [150, 10, 11, 100]
})
df0
    timestamp id  close
0  2023-01-02  2    150
1  2023-01-01  1     10
2  2023-01-02  1     11
3  2023-01-01  2    100
I'm trying to create a new column: group by id, sort by timestamp within each group, then compute the daily return from the close price.
The dataframe should look like this afterwards:
    timestamp id  close  past_day_return
0  2023-01-01  1     10              NaN
1  2023-01-02  1     11              0.1
2  2023-01-01  2    100              NaN
3  2023-01-02  2    150              0.5
So far I have tried this, without success:
df1 = (df0
    .groupby(['id'])
    .sort_values(['timestamp'])
    .assign(past_day_return = lambda x: (x['close'] - x['close'].shift(1)) / x['close'].shift(1))
)
It gives me this error:
AttributeError Traceback (most recent call last)
Cell In[84], line 3
1 df1 = (df0
2 .groupby(['id'])
----> 3 .sort_values(['timestamp'])
4 .assign(past_day_return = lambda x: (x['close'] - x['close'].shift(1)) / x['close'].shift(1))
5 )
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\groupby\groupby.py:987, in GroupBy.__getattr__(self, attr)
984 if attr in self.obj:
985 return self[attr]
--> 987 raise AttributeError(
988 f"'{type(self).__name__}' object has no attribute '{attr}'"
989 )
AttributeError: 'DataFrameGroupBy' object has no attribute 'sort_values'
I tried the assign part on its own and it runs fine, but then there is no sorting or grouping.
Can you help me please?
EDIT:
I managed to find the answer on my own:
df0['past_day_return'] = (df0['close'] / df0.sort_values('timestamp').groupby('id')['close'].shift(1)) - 1
df0.sort_values(by = ['id','timestamp']).reset_index(drop=True)
Which gives me:
    timestamp id  close  past_day_return
0  2023-01-01  1     10              NaN
1  2023-01-02  1     11              0.1
2  2023-01-01  2    100              NaN
3  2023-01-02  2    150              0.5
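
For anyone landing here, a possible alternative is to sort first and then let groupby compute the return with pct_change. This is only a minimal sketch, assuming the timestamps are ISO-formatted strings (so sorting them as text gives chronological order) and that there is one row per id per day. The name df1 is just illustrative.

```python
import pandas as pd

df0 = pd.DataFrame({
    'timestamp': ['2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'id': ['2', '1', '1', '2'],
    'close': [150, 10, 11, 100],
})

# Sort the DataFrame itself before grouping; a DataFrameGroupBy object
# has no sort_values method, which is what raised the AttributeError.
df1 = df0.sort_values(['id', 'timestamp']).reset_index(drop=True)

# Within each id, pct_change computes (close - previous close) / previous close,
# so the first row of every group is NaN.
df1['past_day_return'] = df1.groupby('id')['close'].pct_change()

print(df1)
```

Because the grouped result keeps the index of the sorted frame, it can be assigned straight back as a new column.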