I have a DataFrame with a column 'col1' containing integers. The DataFrame may have anywhere from 100 to 1 million rows. How can I compute the difference between each consecutive pair of values in col1, such as:

row2 - row1, row3 - row2, row4 - row3, etc.

and return the max difference?

I know how to use loc and iloc, but I do not know how to step through the column one pair of values at a time.

2Obe

1 Answer

(df[col_name].shift(-1) - df[col_name]).max()

The function shift moves the column's values by a number of rows: shift(-1) gives each row the value of the next row (or of the second next row if you use shift(-2)). So df[col_name].shift(-1) holds, for every row, the value from the row below it. Subtracting the current row's value from that shifted value gives the difference between consecutive rows, so you end up with a Series of row-to-row differences. The last entry is NaN because the last row has no row below it. Take the max of that Series and you get the max difference.

Example below, where col_1 is the original column and col_2 is df[col_1].shift(-1):


> col_1 | col_2
> 123   | 456
> 456   | 999
> 999   | NaN

Now you just subtract col_1 from col_2 and take the max to get the max difference.
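As a minimal sketch of the steps above, here is the same three-row example written out, assuming the column is named 'col1' as in the question (the variable names are only illustrative). It also shows Series.diff(), which computes the current row minus the previous row and so produces the same set of differences; which of the two reads better is a matter of taste.

```python
import pandas as pd

# Small sample frame; 'col1' is the column name used in the question
df = pd.DataFrame({"col1": [123, 456, 999]})

# shift(-1) moves every value up one row, so the last row becomes NaN
next_minus_current = df["col1"].shift(-1) - df["col1"]

# Series.max() skips NaN by default
max_diff = next_minus_current.max()

# diff() gives current row minus previous row: the same differences,
# with the NaN in the first row instead of the last
max_diff_alt = df["col1"].diff().max()

print(max_diff, max_diff_alt)  # 543.0 543.0
```

Both versions are vectorized, so they should remain practical for a column with around a million rows.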

5nv
  • Please explain how your answer solves the problem. – Haris Sep 15 '17 at 10:15
  • The function shift takes the value of the next row (or the second next row if you use shift(-2)). By doing df[col_name].shift(-1), you take, for a given row, the value in the row below it. Subtracting the current row's value from the value in df[col_name].shift(-1) gives you the difference between rows, per row. So you'll end up with a Series of differences between rows. Take the max and you get the max difference. – 5nv Sep 15 '17 at 12:59
  • That's good. I think it's better if you edit your answer and add this explanation there. – Haris Sep 15 '17 at 13:00
  • I'll remember it, thanks for the advice. – 5nv Sep 15 '17 at 13:04