
I'm facing a problem with applying a function to a DataFrame (to model a solar collector based on annual hourly weather data).

Suppose I have the following (simplified) DataFrame:

df2:
    A   B  C
0  11  13  5
1   6   7  4
2   8   3  6
3   4   8  7
4   0   1  7

Now I have defined a function that is applied to every row to create a new column called D, but I also want the function to take the previously calculated value of D as input (except, of course, for the first row, where no previous value of D exists).

def Funct(x):
    D = x['A']+x['B']+x['C']+(x-1)['D']

I know that the function above is not working, but it gives an idea of what I want.

So to summarise:

Create a function that adds a new column to the DataFrame and takes the value of that new column from the row above as input.

Can somebody help me?

Thanks in advance.

2 Answers


It sounds like you are calculating a cumulative sum. In that case, use cumsum:

In [45]: df['D'] = (df['A']+df['B']+df['C']).cumsum()

In [46]: df
Out[46]: 
    A   B  C   D
0  11  13  5  29
1   6   7  4  46
2   8   3  6  63
3   4   8  7  82
4   0   1  7  90

[5 rows x 4 columns]
unutbu
  • Thanks for the quick response, but in reality the function is far more complex and has to be written as a function definition. Is there a more general way to do this for any defined function? – user3497877 Apr 04 '14 at 12:19
  • If the values are defined by an *arbitrary* [recurrence relation](http://en.wikipedia.org/wiki/Recurrence_relation), then you would either need to find a closed-form solution for the recurrence relation, or compute the values iteratively. There is no pandas-specific way to do that (except I think for `cumsum`). You would just need to use a `for` loop and compute them iteratively. – unutbu Apr 04 '14 at 12:24
  • If after writing the `for` loop you find (through [profiling](http://www.huyng.com/posts/python-performance-analysis/)) that this is a performance bottleneck, one relatively simple way to boost speed would be to [use Cython](http://pandas.pydata.org/pandas-docs/dev/enhancingperf.html). – unutbu Apr 04 '14 at 12:49
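
The iterative approach described in the comments above could look roughly like the sketch below. It rebuilds the question's example DataFrame; the helper name `step` and the seed value `0` for the first row are assumptions made for illustration, and the body of `step` is only a stand-in for the real (more complex) model.

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({"A": [11, 6, 8, 4, 0],
                   "B": [13, 7, 3, 8, 1],
                   "C": [5, 4, 6, 7, 7]})


def step(row, prev_d):
    """Hypothetical per-row model: current inputs plus the previous D."""
    return row.A + row.B + row.C + prev_d


d_values = []
prev_d = 0  # assumed seed: the first row has no previous D
for row in df.itertuples(index=False):
    prev_d = step(row, prev_d)  # the new D becomes the next row's input
    d_values.append(prev_d)

df["D"] = d_values
print(df)
```

With this particular `step` the result matches the `cumsum` answer, but any function of the current row and the previous value can be substituted without changing the loop.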

Are you looking for this?
You can use shift to align the previous row with the current row, and then perform your operation.

In [7]: df
Out[7]:
   a  b
1  1  1
2  2  2
3  3  3
4  4  4

[4 rows x 2 columns]

In [8]: df['c'] = df['b'].shift(1)  # First row will be NaN

In [9]: df
Out[9]:
   a  b   c
1  1  1 NaN
2  2  2   1
3  3  3   2
4  4  4   3

[4 rows x 3 columns]
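
As a sketch of how `shift` might be applied to the DataFrame from the question: it works when the previous-row value comes from data that already exists, so it does not by itself handle a column that depends on its own previous value. The column name `E` and the choice to fill the first row's missing value with `0` are assumptions for illustration.

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({"A": [11, 6, 8, 4, 0],
                   "B": [13, 7, 3, 8, 1],
                   "C": [5, 4, 6, 7, 7]})

# Per-row sum built only from columns that already exist.
row_sum = df["A"] + df["B"] + df["C"]

# The previous row's sum; the first row has none, so fill it with 0 (assumed).
prev_sum = row_sum.shift(1).fillna(0)

# Non-recursive: each value uses the previous row's inputs, not the previous E.
df["E"] = row_sum + prev_sum
print(df)
```

Because `shift` introduces a `NaN` in the first row, the resulting column is floating point even though the inputs are integers.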
Shravan