83

In python, how can I reference previous row and calculate something against it? Specifically, I am working with dataframes in pandas - I have a data frame full of stock price information that looks like this:

           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

Here is how I created this dataframe:

import pandas

url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
data = data = pandas.read_csv(url)

## now I sorted the data frame ascending by date 
data = data.sort(columns='Date')

Starting with row number 2, or in this case, I guess it's 250 (PS - is that the index?), I want to calculate the difference between 2011-01-03 and 2011-01-04, for every entry in this dataframe. I believe the appropriate way is to write a function that takes the current row, then figures out the previous row, and calculates the difference between them, the use the pandas apply function to update the dataframe with the value.

Is that the right approach? If so, should I be using the index to determine the difference? (note - I'm still in python beginner mode, so index may not be the right term, nor even the correct way to implement this)

mikebmassey
  • 8,354
  • 26
  • 70
  • 95
  • wouldn't be enough to just find the rows for the two dates and calculate the difference between them..? – redShadow Oct 29 '12 at 00:32
  • @redShadow interesting thought. I hadn't thought about it like that. The only issue I see is how would you handle weekends? There are 252 trading days, not 365. If you try and go up 1 day, and it's a weekend, the function would fail. Although, I guess you could put some error handling in the function... – mikebmassey Oct 29 '12 at 00:34
  • No, wait.. what exactly are you trying to accomplish? Calculate, for each day, the difference with the previous one, for all the records in the dataset..? – redShadow Oct 29 '12 at 00:37
  • @redShadow exactly. Line by line, the difference from the previous day. first row will be 0, so I would have to figure out some error handling, but... – mikebmassey Oct 29 '12 at 00:39

3 Answers3

129

I think you want to do something like this:

In [26]: data
Out[26]: 
           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

In [27]: data.set_index('Date').diff()
Out[27]: 
            Close  Adj Close
Date                        
2011-01-03    NaN        NaN
2011-01-04   0.16       0.16
2011-01-05  -0.59      -0.58
2011-01-06   1.61       1.57
2011-01-07  -0.73      -0.71
Chang She
  • 16,692
  • 8
  • 40
  • 25
  • 4
    Great but how would you add this is as a new column in data? – greenafrican Nov 07 '13 at 12:41
  • @Chang She, what if I didnt want the difference but the actual value. i.e. instead of diff(1) is there something like value(1) or value(1:3).mean(). The second made up example would get the mean of the next, second and third value. That would be useful – IcemanBerlin Jan 03 '14 at 16:35
  • 1
    .shift() in pandas gives you the values. – Sean W. Feb 07 '14 at 19:37
  • Thanks for the answer. For those who want the n-day difference, it is worth mentioning that you can set the input parameter `periods` to n. Check the `pandas.DataFrame.diff` documentation for more details: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.diff.html . – JejeBelfort May 05 '17 at 10:07
  • 1
    Answering the followup question by greenafrica. You can combine the original data with the new difference data using pd.concat, as for example: `pd.concat([data,data[['Close', 'Adj Close']].diff().rename({'Close':'Close Diff', 'Adj Close':'Adj Close Diff'}, axis=1)], axis=1)` – Uwe Mayer Sep 16 '20 at 21:24
  • 1
    If you want to save the difference in another column, you can: `df['close_diff'] = df['Close'].diff()` – Save Jan 03 '22 at 21:16
28

To calculate difference of one column. Here is what you can do.

df=
      A      B
0     10     56
1     45     48
2     26     48
3     32     65

We want to compute row difference in A only and want to consider the rows which are less than 15.

df['A_dif'] = df['A'].diff()
df=
          A      B      A_dif
    0     10     56      Nan
    1     45     48      35
    2     26     48      19
    3     32     65      6
df = df[df['A_dif']<15]

df=
          A      B      A_dif
    0     10     56      Nan
    3     32     65      6
Msquare
  • 775
  • 7
  • 17
1

I don't know pandas, and I'm pretty sure it has something specific for this; however, I'll give you the pure-Python solution, that might be of some help even if you need to use pandas:

import csv
import urllib

# This basically retrieves the CSV files and loads it in a list, converting
# All numeric values to floats
url='http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
reader = csv.reader(urllib.urlopen(url), delimiter=',')
# We sort the output list so the records are ordered by date
cleaned = sorted([[r[0]] + map(float, r[1:]) for r in list(reader)[1:]])

for i, row in enumerate(cleaned):  # enumerate() yields two-tuples: (<id>, <item>)
    # The try..except here is to skip the IndexError for line 0
    try:
        # This will calculate difference of each numeric field with the same field
        # in the row before this one
        print row[0], [(row[j] - cleaned[i-1][j]) for j in range(1, 7)]
    except IndexError:
        pass
redShadow
  • 6,687
  • 2
  • 31
  • 34
  • 5
    Downvoting this because, in addition to reproducing what pandas do, it is done with nested for loops, which is slow in Python. Pandas will do it at the C level. – jonathanrocher Apr 09 '15 at 21:03
  • 7
    downvoting as original question has tag ```pandas``` which directly points to way author want to solve his problem – ruX Nov 14 '16 at 09:55
  • 2
    Sometimes you won't know you want something until it is presented to you :) – ilyas Aug 02 '18 at 19:25
  • The header says in `Python / Pandas`. That was perhaps misleading as well, at least a reason to start the answer at all. In `Python pandas` or `pandas (Python)` would have been better. – questionto42 Mar 04 '22 at 22:39