Is there an equivalent of an excel formula in pandas

Question

I have a time-series dataset and I need to find the difference in my field for every tick.

In excel, I can accomplish this easily by writing a formula B2=A2-A1 and copying this formula down the entire B column

example:

Is there a straightforward way to do this with pandas?

I have considered apply() but looks like it can only work on 1 element of a series

My other option will be to create an Numpy array out of my column A and run a loop through it. However, that seems like a roundabout way and gets even more complicated if I need to refer elements from different

The actual Excel formula that I need to translate is =IF((A3-A2)>0, (A3-A2), A3+(4294967296-A2))

Apply will work. See this answer: https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns — Cole Howard, Apr 26 '18 at 04:00
This answer explains a way to look at multiple columns one element at a time. I want to look at the same single column but two elements at a time. Am I missing something? — user2251346, Apr 26 '18 at 06:44

score 2 · Accepted Answer · answered Apr 26 '18 at 17:15

2

This may help. I haven't tested: for this, please provide some data (in text form) with desired output.

df[1] = np.where(df[0].diff() > 0, df[0].diff(), df[0].shift() + 2**32 - df[0])

answered Apr 26 '18 at 17:15

jpp

159,742
34
281
339

1

I tested your line. Works exactly as I desired. – user2251346 Apr 27 '18 at 15:49

WillMonge · Answer 2 · 2018-04-26T18:00:37.020

1

To obtain the lagged difference of a series use

df['my_column'].diff()

You can specify the lag as well, if you wanted to do something different than 1.

See the documentation for more info

EDIT Here is a possible way to address your formula:

df = pd.DataFrame({'A': [20, 22, 25, 43, 23, 45, 67, 50, 70]})
df['result'] = df['A'].diff()
df['result'][ df['result'] < 0 ] += 2**32  # you can disregard the pandas warning

Based on your logic you can simply take the diff and if the result is less than zero then add the constant 2**32. And it should be a little more efficient than taking the diff a few times (although probably this is not a big concern).

edited Apr 26 '18 at 18:00

answered Apr 26 '18 at 04:00

WillMonge

1,005
8
19

That's great! However my real need is a little more complicated. I am trying to count bytes sent over a tcp connection every minute by calculating the difference in the seq number. The sequence number could roll over. So my excel formula really looked like =IF((A3-A2)>0, (A3-A2), A3+(4294967296-A2)) – user2251346 Apr 26 '18 at 06:03
1

@user2251346 see my edit for addressing your following formula. I believe it should be fairly more readable and efficient than the `np.where` approach – WillMonge Apr 26 '18 at 18:02
Can you explain why you think it will be more efficient? – user2251346 Apr 27 '18 at 15:51
1

@user2251346: Sure, jpp's answer is a good one liner, however it performs 2 diff's and one shift, since it will basically create both arrays for the `np.where` (the when true's and the when false's). If the diff and shift operations were costly, due to for example the Series being too long, then the one-liner would be less efficient. I hope my two lines of code are also more readable, but that is more subjective. Hope that was helpful! – WillMonge Apr 27 '18 at 18:40
Got it. Thanks! – user2251346 Apr 27 '18 at 21:45

user2251346 · Answer 3 · 2018-04-26T17:32:31.827

0

This is how I solved my problem for now using numpy

(translating Excel formula =IF((A3-A2)>0, (A3-A2), A3+(2**32-A2)))

s1=np.array(df[0])
s=s1[1:]-s1[0:-1]
s[s<0]=s[s<0]+(2**32)

I do feel there can be a more elegant solution for this natively in pandas. May be if df.diff() could take a lambda parameter?

edited Apr 26 '18 at 17:32

answered Apr 26 '18 at 16:50

user2251346

620
5
14

Is there an equivalent of an excel formula in pandas

3 Answers3