Assume I have a DataFrame of the following form where the first column is a random number, and the other columns will be based on the value in the previous column.
For ease of use, let's say I want each number to be the previous one squared, so a row starting with x would read x, x², x⁴, x⁸ across the four columns.
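Because the example rule is plain squaring, applying it k times to a starting value x gives x ** (2 ** k). For this particular rule, every column can therefore be computed directly from the first column, with no reference to the previous column at all. This is a minimal sketch that assumes the starting values are held in a Series (`start` is an illustrative name). It only works for rules that have such a closed form, and int64 overflows quickly: 5 ** 32, which a sixth column would need, already exceeds it.

```python
import pandas as pd

start = pd.Series([1, 2, 3, 4, 5])
n_cols = 4

# Squaring k times is the same as raising to the power 2**k,
# so column k can be computed directly from the starting column.
result = pd.DataFrame({k: start ** (2 ** k) for k in range(n_cols)})
print(result)
```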
I know I can write a pretty simple loop to do this, but I also know looping is usually not the most efficient approach in Python/pandas. How could this be done with apply() or rolling_apply()? Or could it otherwise be done more efficiently?
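For comparison, the "simple loop" does not have to visit every cell. It can loop over the columns and compute each one from its predecessor as a whole Series. That keeps the Python-level loop to one iteration per column, and the arithmetic inside each step stays vectorized. This is a minimal sketch. `fill_from_previous` is a hypothetical helper name, and `func` can be any function that maps a Series to a Series.

```python
import pandas as pd


def fill_from_previous(df, func):
    """Return a copy of df where each column after the first is func(previous column)."""
    out = df.copy()
    cols = list(out.columns)
    for prev, cur in zip(cols[:-1], cols[1:]):
        # One vectorized operation per column; the loop runs len(cols) - 1 times,
        # not once per cell.
        out[cur] = func(out[prev])
    return out


a = pd.DataFrame({0: [1, 2, 3, 4, 5], 1: 0, 2: 0, 3: 0})
result = fill_from_previous(a, lambda s: s ** 2)
print(result)
```

Each column depends on the one just computed, so some sequential pass over the columns is hard to avoid unless the rule has a closed form. Frames like this usually have far fewer columns than rows, so this loop tends to be cheap.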
My (failed) attempts are below:
In [12]: a = pandas.DataFrame({0:[1,2,3,4,5],1:0,2:0,3:0})
In [13]: a
Out[13]:
   0  1  2  3
0  1  0  0  0
1  2  0  0  0
2  3  0  0  0
3  4  0  0  0
4  5  0  0  0
In [14]: a = a.apply(lambda x: x**2)
In [15]: a
Out[15]:
    0  1  2  3
0   1  0  0  0
1   4  0  0  0
2   9  0  0  0
3  16  0  0  0
4  25  0  0  0
In [16]: a = pandas.DataFrame({0:[1,2,3,4,5],1:0,2:0,3:0})
In [17]: pandas.rolling_apply(a,1,lambda x: x**2)
C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\spyderlib\widgets\externalshell\start_ipython_kernel.py:1: FutureWarning: pd.rolling_apply is deprecated for DataFrame and will be removed in a future version, replace with
DataFrame.rolling(center=False,window=1).apply(args=<tuple>,kwargs=<dict>,func=<function>)
# -*- coding: utf-8 -*-
Out[17]:
      0    1    2    3
0   1.0  0.0  0.0  0.0
1   4.0  0.0  0.0  0.0
2   9.0  0.0  0.0  0.0
3  16.0  0.0  0.0  0.0
4  25.0  0.0  0.0  0.0
In [18]: a = pandas.DataFrame({0:[1,2,3,4,5],1:0,2:0,3:0})
In [19]: a = a[:-1]**2
In [20]: a
Out[20]:
    0  1  2  3
0   1  0  0  0
1   4  0  0  0
2   9  0  0  0
3  16  0  0  0
In [21]:
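The three attempts above fail for related reasons, which a short sketch can make concrete. apply() hands the function one column at a time, so column 0's result never reaches column 1. rolling windows slide down the rows of each column, so a window of 1 is just elementwise squaring. Plain [] slicing selects rows, not columns. The rolling call uses `DataFrame.rolling(window=1).apply(...)`, the replacement named in the FutureWarning. `raw=True` is an added assumption so the function receives a NumPy array.

```python
import pandas as pd

a = pd.DataFrame({0: [1, 2, 3, 4, 5], 1: 0, 2: 0, 3: 0})

# apply() calls the function once per column, and each call only sees
# that column's own values, so the zero columns stay zero.
applied = a.apply(lambda col: col ** 2)

# rolling() slides a window down the rows of each column; with window=1
# every window holds a single cell, so this is elementwise squaring.
rolled = a.rolling(window=1).apply(lambda x: x[0] ** 2, raw=True)

# Plain [] slicing selects rows, so a[:-1] drops the last row;
# .iloc[:, :-1] is what drops the last column instead.
rows_dropped = a[:-1]
cols_dropped = a.iloc[:, :-1]

print(rows_dropped.shape, cols_dropped.shape)  # (4, 4) (5, 3)
```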
So my main question is how to refer to the value in the previous column when doing DataFrame calculations.
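One way to refer to the previous column is `DataFrame.shift` with `axis=1`. It moves every value one column to the right, so each cell lines up with its left-hand neighbour. This is a minimal sketch. Each column here depends on a value that has itself just been computed, so one shift only advances the recurrence by one column. The step is therefore repeated once per remaining column. The frame is converted to float up front because shifting introduces NaN in column 0.

```python
import pandas as pd

a = pd.DataFrame({0: [1, 2, 3, 4, 5], 1: 0, 2: 0, 3: 0})

# shift(1, axis=1) moves values one column to the right: column k of
# `prev` holds the original column k-1, and column 0 becomes NaN.
prev = a.shift(1, axis=1)

# Each pass squares the previous column; column 0 is the seed and is
# copied back unchanged. One pass fixes one more column.
b = a.astype(float)
for _ in range(b.shape[1] - 1):
    step = b.shift(1, axis=1) ** 2
    step[0] = b[0]
    b = step

print(b)
```

Every pass recomputes all columns, so the work grows with the square of the column count. A direct column-by-column loop does the same job in a single pass. The sketch mainly shows how `shift` aligns each column with its predecessor.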