How can I make a new column that does calculations separately for each value of my id column?

Question

I would like to do calculations on my x column and store the result in a new column. For example, let's try to determine the rolling standard deviation. I know how to calculate that for the full column:

df['std'] = df.x.rolling(2).std()

Example of original dataframe:

But I want it to do this calculation separately for each id, i.e. first select the rows by the id column.

Right now I first have to split the data frame into separate data frames and then concatenate them afterwards, and I think there should be a better way. But I do not know how to phrase this as a question so I can Google it.

So I would like to do the following:

df['std'] = df.x[id = 1].rolling(2).std() 
          = df.x[id = 2].rolling(2).std()
          = df.x[id = 3].rolling(2).std()

I know this is not correct code, but I am trying to show what I want to achieve.
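A literal, runnable version of the pseudocode above is to loop over the distinct id values, select each group's rows with a boolean mask, and write the rolling result back with `.loc`. This is a minimal sketch: the column names `id` and `x` come from the question, but the data values are hypothetical. It avoids splitting into separate DataFrames, although it still loops in Python over the ids.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: three ids with a few x values each.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3],
    "x": [10.0, 20.0, 5.0, 0.0, 20.0, 50.0, 1.0, 6.0],
})

df["std"] = np.nan  # float column to fill in group by group
for group_id in df["id"].unique():
    mask = df["id"] == group_id  # rows belonging to this id
    # rolling(2) only sees this id's rows, so no window spans two ids;
    # .loc assignment aligns the result back to the same row labels.
    df.loc[mask, "std"] = df.loc[mask, "x"].rolling(2).std()

print(df)
```

Note that the first row of every id gets NaN, whereas a rolling window over the full column would mix the last value of one id with the first value of the next.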

Does this answer your question? [Python - rolling functions for GroupBy object](https://stackoverflow.com/questions/13996302/python-rolling-functions-for-groupby-object) — SomeDude, Jun 18 '22 at 17:16
So I tried `df.groupby('id')['x'].rolling(2).mean()`, but I get the error: `TypeError: incompatible index of inserted column with frame index` — SimonDL, Jun 18 '22 at 17:22

score 1 · Accepted Answer · answered Jun 18 '22 at 17:22


As you are filtering for each "id", you can use GroupBy:

df.groupby("id")["x"].rolling(2).std()
#Out[7]: 
#id    
#1   0           NaN
#    1      7.071068
#    2     10.606602
#    3      0.000000
#2   4           NaN
#    5     14.142136
#    6     21.213203
#    7      7.071068
#3   8           NaN
#    9      3.535534
#    10     0.707107
#    11     0.000000
#    12     0.000000
#    13     1.414214
#Name: x, dtype: float64

To append it as another column, you need to first drop the "id" groups from the index:

df["std"] = df.groupby("id")["x"].rolling(2).std().reset_index(0, drop=True)
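The assignment above can be run end to end as follows. This is a sketch on hypothetical data that uses a DatetimeIndex (the asker mentions one in the comments), showing that the group-wise rolling result carries an extra `id` level that has to be dropped before it aligns with the DataFrame's own index.

```python
import pandas as pd

# Hypothetical data with a DatetimeIndex; only "id" and "x" come from the question.
idx = pd.date_range("2022-06-18", periods=6, freq="D")
df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 2],
                   "x": [10.0, 20.0, 5.0, 0.0, 20.0, 50.0]}, index=idx)

rolled = df.groupby("id")["x"].rolling(2).std()
# rolled has a two-level MultiIndex: (id, original timestamp)
print(rolled.index.nlevels)  # 2

# Drop level 0 ("id") so the index is the original DatetimeIndex again,
# then assignment aligns each value with its original row.
df["std"] = rolled.reset_index(level=0, drop=True)
print(df)
```

The DatetimeIndex survives because only the group level is dropped; this assumes the original index labels are unique, since alignment is done by label.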

Rawson

Yes, I just tried this and I get the error `TypeError: incompatible index of inserted column with frame index`, but I did not try the last line of code you presented. – SimonDL Jun 18 '22 at 17:24

That occurs because the resulting pandas Series has a MultiIndex of `(id, original index)` tuples, `(1, 0), (1, 1), ..., (3, last row)`. To assign it as a column it needs the same index as the DataFrame, i.e. without the groups. – Rawson Jun 18 '22 at 17:26
Yes, I actually have a datetime index, and it is important that I do not lose it. This worked: `df_new = df.groupby("id")["x"].rolling(2).std()`. But if I do `df['std'] = df.groupby("id")["x"].rolling(2).std()`, it gives the error. – SimonDL Jun 18 '22 at 17:29
And even with a normal integer index (1, 2, 3, 4, etc.) I still get the same error. – SimonDL Jun 18 '22 at 17:32
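A different technique that avoids the index mismatch altogether is `groupby(...).transform`, which returns a result aligned to the original index, whether that is a DatetimeIndex or a plain integer index. This is a minimal sketch on hypothetical data (ids deliberately interleaved); it is an alternative to, not the same as, the `reset_index` approach in the accepted answer.

```python
import pandas as pd

# Hypothetical data; ids are interleaved to show that row order is preserved.
idx = pd.date_range("2022-06-18", periods=6, freq="D")
df = pd.DataFrame({"id": [1, 2, 1, 2, 1, 2],
                   "x": [10.0, 0.0, 20.0, 20.0, 5.0, 50.0]}, index=idx)

# transform calls the lambda once per id group and places the results back
# at the original row labels, so no index level has to be dropped.
df["std"] = df.groupby("id")["x"].transform(lambda s: s.rolling(2).std())
print(df)
```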
But what if I want to apply a function other than rolling, like this: `df.groupby("sensor_id")["xyz_axis_std"].gaussian_filter1d(df.xyz_axis_std, 200).reset_index(0, drop=True)`? Python treats the function as an attribute of the object and I get the error `'SeriesGroupBy' object has no attribute 'gaussian_filter1d'`. I imported the SciPy function correctly. – SimonDL Jun 18 '22 at 17:44

Ah, that is a different kind of function, so it has to be applied differently. You could use `groupby.transform` here: `df["gf1d"] = df.groupby("id")["x"].transform(lambda x: gaussian_filter1d(x, 200))` – Rawson Jun 19 '22 at 15:31
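The `transform` suggestion can be sketched end to end as follows. The data are hypothetical and `sigma=1` is used instead of 200 only because the toy groups are short. One detail worth knowing: `scipy.ndimage` filters return an array of the input's dtype by default, so integer data would be truncated; converting to float first avoids that.

```python
import numpy as np
import pandas as pd
from scipy.ndimage import gaussian_filter1d

# Hypothetical integer data; "id" and "x" follow the question's column names.
df = pd.DataFrame({"id": [1, 1, 1, 1, 2, 2, 2, 2],
                   "x": [5, 5, 5, 5, 0, 10, 0, 10]})

# Each id group is smoothed on its own; transform puts the values back in the
# original row order. to_numpy(dtype=float) keeps the output from being cast
# back to int.
df["gf1d"] = df.groupby("id")["x"].transform(
    lambda s: gaussian_filter1d(s.to_numpy(dtype=float), sigma=1)
)
print(df)
```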

score 1 · Answer 2 · answered Jun 18 '22 at 17:23


Just loop over the dataframe rows, perform your calculation, and store the calculated value in the dataframe:

for index, row in df.iterrows():
    print(row['id'], row['x'])
    row_id = row['id']  # avoid shadowing the built-in id()
    x = row['x']
    # perform your calculation and join the result with the previous df
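To make the row-loop idea concrete for the rolling standard deviation in the question, one option is to keep a small window of recent `x` values per id and compute the sample standard deviation as each row is visited. This is a hypothetical sketch (the `recent` dictionary of `deque`s and the `window` variable are illustrative, not from the answer) on made-up data; it reproduces `rolling(2).std()` per id for data without missing values, but it will be much slower than the groupby approaches on large frames.

```python
import statistics
from collections import deque

import numpy as np
import pandas as pd

# Hypothetical example data using the question's column names.
df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3, 3],
                   "x": [10.0, 20.0, 5.0, 0.0, 20.0, 1.0, 6.0]})

window = 2
recent = {}  # id -> deque holding the last `window` x values seen for that id
stds = []
for _, row in df.iterrows():
    buf = recent.setdefault(row["id"], deque(maxlen=window))
    buf.append(float(row["x"]))
    # Like rolling(window).std(): NaN until the window is full, then the
    # sample standard deviation (ddof=1), which statistics.stdev also uses.
    stds.append(statistics.stdev(list(buf)) if len(buf) == window else np.nan)

df["std"] = stds
print(df)
```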

Deepak Kumar Sachan

As it’s currently written, your answer is unclear. Please edit it to add details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jun 19 '22 at 07:53
