Pandas "Formulas" not working as expected

Question

I am trying to work with data from an accelerometer, trying to get the velocity from acceleration, on a df that looks like this:

{'T': {0: 0.007719999999999999,
  1: 0.016677999999999797,
  2: 0.024630999999996697,
  3: 0.0325849999999983,
  4: 0.040530999999995196},
 'Ax': {0: 0.16, 1: 0.28, 2: 0.28, 3: 0.44, 4: 0.57},
 'Ay': {0: 8.0, 1: 7.9, 2: 7.87, 3: 7.87, 4: 7.9},
 'Az': {0: 3.83, 1: 3.83, 2: 3.79, 3: 3.76, 4: 3.76},
 'delta T': {0: 0.00772,
  1: 0.008957999999999798,
  2: 0.0079529999999969,
  3: 0.007954000000001606,
  4: 0.007945999999996893}}

First, I set the Velocity of X, Y and Z to 0:

df_yt["Vx"] = 0
df_yt["Vy"] = 0
df_yt["Vz"] = 0

And then I entered the first value of each of these columns manually:

df_yt.loc[0,"Vx"] = 0.16*0.007720
df_yt.loc[0,"Vy"] = 8.00*0.007720
df_yt.loc[0,"Vz"] = 3.83*0.007720

I wanted a formula that, for each row, returns the previous element of Vx plus Ax*delta T for that row. To write the "formulas" for these 3 columns, I assumed it would be something like:

df_yt.loc[1:,"Vx"] = df_yt["Vx"].shift(1) + df_yt["Ax"]*df_yt["delta T"]
df_yt.loc[1:,"Vy"] = df_yt["Vy"].shift(1) + df_yt["Ay"]*df_yt["delta T"]
df_yt.loc[1:,"Vz"] = df_yt["Vz"].shift(1) + df_yt["Az"]*df_yt["delta T"]

This code doesn't raise any error, but the numbers in the df don't match what they should be. For example, the Vx value at index 2 should be 0.005970:

0.003743 + 0.28*0.007953 = 0.005970

I hope someone can help me with this because I don't know what is causing this mistake and I can't even understand where the wrong numbers are coming from.

Could you please provide the sample from the `df` as text, not as a picture. E.g. use `df.to_dict()` or `df.head().to_dict()` and post in a block between triple backticks (```). This makes it much easier for other users to reproduce your issue and to try to provide a solution. — ouroboros1, Oct 14 '22 at 06:48

ouroboros1 · Accepted Answer · 2022-10-14T07:39:31.560

Try as follows:

Use df.mul to multiply each column in ['Ax','Ay','Az'] with delta T along axis 0, and apply df.cumsum.

df_yt[['Vx','Vy','Vz']] = df_yt[['Ax','Ay','Az']].mul(df_yt['delta T'], 
                                                      axis=0).cumsum()

print(df_yt)

          T    Ax    Ay    Az   delta T        Vx        Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.061760  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.003743  0.132528  0.063877
2  0.024631  0.28  7.87  3.79  0.007953  0.005970  0.195118  0.094019
3  0.032585  0.44  7.87  3.76  0.007954  0.009470  0.257716  0.123926
4  0.040531  0.57  7.90  3.76  0.007946  0.013999  0.320490  0.153803
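
To reproduce the result end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the dict posted in the question; the helper names acc_cols and vel_cols are only illustrative and not part of the original code.

```python
import pandas as pd

# Sample data copied from the dict in the question.
df_yt = pd.DataFrame({
    "T": [0.007719999999999999, 0.016677999999999797, 0.024630999999996697,
          0.0325849999999983, 0.040530999999995196],
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "Ay": [8.0, 7.9, 7.87, 7.87, 7.9],
    "Az": [3.83, 3.83, 3.79, 3.76, 3.76],
    "delta T": [0.00772, 0.008957999999999798, 0.0079529999999969,
                0.007954000000001606, 0.007945999999996893],
})

acc_cols = ["Ax", "Ay", "Az"]  # illustrative names
vel_cols = ["Vx", "Vy", "Vz"]

# Per-row velocity increment (acceleration * time step), then a running total.
df_yt[vel_cols] = df_yt[acc_cols].mul(df_yt["delta T"], axis=0).cumsum()

print(df_yt[vel_cols].round(6))
```

The first row needs no special handling here: the cumulative sum of a single increment is that increment, which matches the values entered manually in the question.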

Incidentally, the problem with your own attempt becomes apparent when you print the values for any of the .shift(1) statements. E.g. you do:

df_yt["Vx"] = 0
df_yt.loc[0,"Vx"] = 0.16*0.007720

print(df_yt["Vx"].shift(1))

0         NaN
1    0.001235
2    0.000000
3    0.000000
4    0.000000
Name: Vx, dtype: float64

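So, in a line such as df_yt.loc[1:,"Vx"] = df_yt["Vx"].shift(1) + df_yt["Ax"]*df_yt["delta T"], per row you are adding: nothing (NaN), 0.001235, and then just zeros after that. The whole right-hand side is evaluated before anything is assigned, so .shift(1) never sees the newly computed values. I.e. this produces correct values only for the second row (index 1).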
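
A small sketch of what the original attempt actually computes, restricted to the Vx column. Two assumptions here: the delta T values are rounded to six decimals, and the column is initialised with 0.0 rather than 0 so that it is float from the start.

```python
import pandas as pd

df = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "delta T": [0.00772, 0.008958, 0.007953, 0.007954, 0.007946],  # rounded
})

df["Vx"] = 0.0
df.loc[0, "Vx"] = df.loc[0, "Ax"] * df.loc[0, "delta T"]

# shift(1) is taken from the column as it is right now:
# NaN, 0.001235, 0, 0, 0. It is not updated while rows are assigned.
shifted = df["Vx"].shift(1)
step = df["Ax"] * df["delta T"]

df.loc[1:, "Vx"] = shifted + step

# Index 1 is correct (about 0.003743); index 2 is only 0 + 0.28*0.007953,
# about 0.002227, instead of the expected 0.005970.
print(df["Vx"].round(6))
```

From index 2 onward each value is just that row's own increment, which is where the unexpected numbers come from.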

You're right, and your code worked perfectly, I don't know how it worked without referencing the previous value of Vx, Vy and Vz, but it's returning the correct values — Francisco Barroca, Oct 14 '22 at 07:53
"I don't know how it worked without referencing the previous value of Vx, Vy and Vz". This is what `.cumsum()` is doing: it "[r]eturn[s] cumulative sum over a DataFrame or Series axis". E.g., for `Ax` we are first getting the multiplied values: `[0.001235, 0.002508, 0.002227, 0.0035, 0.004529]` (rounded here) and `cumsum` then turns this into `[0.001235, 0.003743, 0.00597, 0.00947, 0.013999]`, i.e. `[0.001235, 0.002508 + 0.001235, 0.002227 + 0.002508 + 0.001235, ...]` — ouroboros1, Oct 14 '22 at 08:02

score 0 · Answer 2 · answered Oct 14 '22 at 08:13

Your calculation is vectorized, not iterative: the whole column is computed in one step, so a row cannot use the value that was just calculated for the previous row.

For the input:

          T    Ax    Ay    Az   delta T        Vx       Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.06176  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.000000  0.00000  0.000000
2  0.024631  0.28  7.87  3.79  0.007953  0.000000  0.00000  0.000000
3  0.032585  0.44  7.87  3.76  0.007954  0.000000  0.00000  0.000000
4  0.040531  0.57  7.90  3.76  0.007946  0.000000  0.00000  0.000000

If you run df_yt["Vx"].shift(1), you will get:

0         NaN
1    0.001235
2    0.000000
3    0.000000
4    0.000000

Therefore your calculation for Vx actually adds the previous value only at index 1; from index 2 onward it adds 0 to Ax*delta T.

Based on this post: Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?

I would suggest:

for i in range(1, len(df_yt)):
    df_yt.loc[i, 'Vx'] = df_yt.loc[i-1, 'Vx'] + df_yt.loc[i, 'Ax']*df_yt.loc[i, 'delta T']
    df_yt.loc[i, 'Vy'] = df_yt.loc[i-1, 'Vy'] + df_yt.loc[i, 'Ay']*df_yt.loc[i, 'delta T']
    df_yt.loc[i, 'Vz'] = df_yt.loc[i-1, 'Vz'] + df_yt.loc[i, 'Az']*df_yt.loc[i, 'delta T']

Output:

          T    Ax    Ay    Az   delta T        Vx        Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.061760  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.003743  0.132528  0.063877
2  0.024631  0.28  7.87  3.79  0.007953  0.005970  0.195118  0.094019
3  0.032585  0.44  7.87  3.76  0.007954  0.009470  0.257716  0.123926
4  0.040531  0.57  7.90  3.76  0.007946  0.013999  0.320490  0.153803

I know it's not vectorized.
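
As a rough check that the row-by-row recurrence and a cumulative sum describe the same calculation, the sketch below runs both on the Vx data. It assumes delta T rounded to six decimals, and it runs the loop over a plain NumPy array rather than through .loc, which is a swapped-in variant and not the code above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "delta T": [0.00772, 0.008958, 0.007953, 0.007954, 0.007946],  # rounded
})

# Per-row increment Ax[i] * dt[i] as a NumPy array.
step = (df["Ax"] * df["delta T"]).to_numpy()

# Recurrence: V[i] = V[i-1] + step[i], seeded with V[0] = step[0].
v_loop = np.empty_like(step)
v_loop[0] = step[0]
for i in range(1, len(step)):
    v_loop[i] = v_loop[i - 1] + step[i]

# Vectorized equivalent: running total of the increments.
v_cumsum = step.cumsum()

print(np.allclose(v_loop, v_cumsum))  # prints True
```

Since the two agree, the cumulative sum is likely the more practical choice for very large frames.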

Hope it helps

Hello, I tried that and got a MemoryError after waiting 30 minutes for it to run ahahah, because this df has more than a million rows https://stackoverflow.com/questions/74020604/adding-values-in-a-column-by-formula-in-a-pandas-dataframe/74020773?noredirect=1#comment130696859_74020773 — Francisco Barroca, Oct 14 '22 at 17:08
