I'm trying to calculate what I am calling "delta values", meaning the amount that has changed between two consecutive rows.

For example:

A  | delta_A
1  | 0
2  | 1
5  | 3
9  | 4
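In plain Python, the definition above amounts to "current value minus previous value, with the first row fixed at 0". A minimal sketch using the example column (the helper name `delta_values` is hypothetical):

```python
def delta_values(values):
    """Return the change between consecutive values; the first entry is 0 (no change)."""
    if not values:
        return []
    # Pair each value with its predecessor and subtract.
    return [0] + [cur - prev for prev, cur in zip(values, values[1:])]


print(delta_values([1, 2, 5, 9]))  # [0, 1, 3, 4]
```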

I managed to do that with this code (basically copied from a MATLAB program I had):

df = df.assign(delta_A=np.zeros(len(df.A)))
df['delta_A'][0] = 0  # start at 'no-change'
df['delta_A'][1:] = df.A[1:].values - df.A[:-1].values

This generates the DataFrame correctly and seems to have no other negative effects.

However, I think there is something wrong with that approach, because I get this warning:

.../__main__.py:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I didn't really understand what that link was trying to say, but I found this post:

Adding new column to existing DataFrame in Python pandas

The latest edit to the answer there says to use this code, but I have already used that syntax:

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)

So my question is: is `loc` the way to go, or what is the more correct way to create that column?

Asked by OneCricketeer
  • This is interesting. I came across this error too, and I have yet to see a standard way to create a new column that references existing ones using the new `.loc`. Most people only suggest suppressing the warning or point out false positives that can be ignored. This still needs some clarity. – benSooraj Mar 13 '17 at 08:38

1 Answer


It seems you need `diff` and then to replace the `NaN` with 0:

df['delta_A'] = df.A.diff().fillna(0).astype(int)

   A  delta_A
0  0        0
1  4        4
2  7        3
3  8        1
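A self-contained version of the snippet above, run on the data from the question instead of the frame shown (the column name `A` comes from the question), should give the deltas the question asks for:

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({'A': [1, 2, 5, 9]})

# diff() computes A[i] - A[i-1]; the first row has no predecessor, so it is NaN.
# fillna(0) turns that NaN into "no change", and astype(int) restores an
# integer dtype (the NaN had forced the intermediate result to float).
df['delta_A'] = df['A'].diff().fillna(0).astype(int)

print(df)
```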

Alternative solution with `assign`:

df = df.assign(delta_A=df.A.diff().fillna(0).astype(int))

   A  delta_A
0  0        0
1  4        4
2  7        3
3  8        1
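Because `assign` returns a new DataFrame, it also accepts a callable, which is handy when the frame is built inside a chain and has no name yet. A minimal sketch, reusing the values from the output above:

```python
import pandas as pd

# assign() returns a new DataFrame, so it can sit in the middle of a chain.
# The lambda receives the intermediate frame, so no temporary variable is needed.
result = (
    pd.DataFrame({'A': [0, 4, 7, 8]})
    .assign(delta_A=lambda d: d['A'].diff().fillna(0).astype(int))
)

print(result)
```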

Another solution, if you need to replace only the first `NaN` value:

df['delta_A'] = df.A.diff()
df.loc[df.index[0], 'delta_A'] = 0

print(df)
   A  delta_A
0  0      0.0
1  4      4.0
2  7      3.0
3  8      1.0
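The two approaches only differ when `A` itself contains missing values. A sketch with made-up data (the `NaN` and the index labels are assumptions for illustration) that compares them:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a missing value in A and a non-default index.
df = pd.DataFrame({'A': [0, 4, np.nan, 8]}, index=[10, 20, 30, 40])

# Variant 1: fillna(0) replaces every NaN, including those caused by the missing value.
df['delta_fill'] = df['A'].diff().fillna(0)

# Variant 2: only the first row is set to 0, in a single .loc call.
# df.index[0] is the first label (10 here), so this works for any index.
df['delta_first'] = df['A'].diff()
df.loc[df.index[0], 'delta_first'] = 0

print(df)
```

With the missing value, `delta_fill` reports "no change" for rows 30 and 40, while `delta_first` keeps them as `NaN`.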

Your solution can be modified to use `iloc`, but I think it's better to use the `diff` function:

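df['delta_A'] = 0  # initialize all values to 0
# a single .iloc call (rows, column position) avoids chained assignment such as
# df['delta_A'][1:] = ..., which may trigger SettingWithCopyWarning
df.iloc[1:, df.columns.get_loc('delta_A')] = df.A[1:].values - df.A[:-1].values
print(df)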
   A  delta_A
0  0        0
1  4        4
2  7        3
3  8        1
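To stay close to the original MATLAB-style array arithmetic, NumPy's `np.diff` can build the whole column in one step, with no element-wise assignment at all. A sketch, assuming NumPy 1.16 or later (where the `prepend` argument was added):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 7, 8]})

a = df['A'].to_numpy()
# Prepending a[0] makes the first difference a[0] - a[0] == 0,
# so the result already has the same length as the column.
df['delta_A'] = np.diff(a, prepend=a[0])

print(df)
```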
Answered by jezrael
  • Cool, a follow-up: is there really any difference between the `df['col'] =` and `df = df.assign(col=` syntax? – OneCricketeer Mar 13 '17 at 08:49
  • They have the same output; `assign` just has some limitations. But it is the solution for `method chaining`: you can check this perfect [tutorial](http://tomaugspurger.github.io/method-chaining.html) (the second in [modern pandas](http://pandas.pydata.org/pandas-docs/stable/tutorials.html#modern-pandas)). – jezrael Mar 13 '17 at 08:52
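One concrete difference is that `assign` returns a new DataFrame, while `df['col'] = ...` modifies `df` in place. A minimal sketch of that behaviour:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 7, 8]})

# assign() builds and returns a new DataFrame; df itself is not modified
# unless the result is assigned back to the same name.
new_df = df.assign(delta_A=df['A'].diff())
added_by_assign = 'delta_A' in df.columns
print(added_by_assign, 'delta_A' in new_df.columns)  # False True

# Item assignment adds the column to df in place.
df['delta_A'] = df['A'].diff()
print('delta_A' in df.columns)  # True
```

One practical limitation of `assign`: columns are passed as keyword arguments, so a name that is not a valid Python identifier has to go through dict unpacking, e.g. `df.assign(**{'delta A': df['A'].diff()})`.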