A recursive function is difficult to vectorize because each value at time t depends on the previous value at time t-1.
[Question updated below with slightly more complex example x_t = a x_{t-1} + b.]
Issue with .loc returning different data types
import pandas
df1 = pandas.DataFrame({'year':range(2020,2024),'a':range(3,7)})
# Set the initial value
t0 = min(df1.year)
df1.loc[df1.year==t0, "x"] = 0
This assignment doesn't work when the right side of the equation is a pandas.core.series.Series
for t in range(min(df1.year)+1, max(df1.year)+1):
    df1.loc[df1.year==t, "x"] = df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]
print(df1)
# year a x
# 0 2020 3 0.0
# 1 2021 4 NaN
# 2 2022 5 NaN
# 3 2023 6 NaN
print(type(df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]))
# <class 'pandas.core.series.Series'>
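A possible explanation, sketched below rather than confirmed: a boolean mask passed to .loc returns a Series even when only one row matches, and that Series keeps the row label of the year t-1 row. Assigning a Series through .loc aligns on the index, so the value labelled with the t-1 row finds no match in the year t row and NaN is written. The frame `df` and the names `rhs` and `target` are illustrative only.

```python
import pandas

df = pandas.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})
df.loc[df.year == 2020, "x"] = 0

# The right-hand side is a one-element Series labelled with the 2020 row
rhs = df.loc[df.year == 2020, "x"] + df.loc[df.year == 2020, "a"]
print(rhs.index.tolist())     # row label of the source (2020) row

# The target cell belongs to the 2021 row, which has a different label
target = df.loc[df.year == 2021, "x"]
print(target.index.tolist())  # row label of the target (2021) row

# Dropping the index (here with to_numpy()) avoids the alignment step
df.loc[df.year == 2021, "x"] = rhs.to_numpy()
print(df)
```

With the index removed, the value is assigned by position instead of by label, which would be consistent with the .unique() version working.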
The assignment works when the right side of the equation is a numpy array
for t in range(min(df1.year)+1, max(df1.year)+1):
    df1.loc[df1.year==t, "x"] = (df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]).unique()
    #break
print(df1)
# year a x
# 0 2020 3 0.0
# 1 2021 4 3.0
# 2 2022 5 7.0
# 3 2023 6 12.0
print(type((df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]).unique()))
# <class 'numpy.ndarray'>
The assignment works directly when the .loc() selection uses a year index
df2 = df1.set_index("year").copy()
# Set the initial value
df2.loc[df2.index.min(), "x"] = 0
for t in range(df2.index.min()+1, df2.index.max()+1):
    df2.loc[t, "x"] = df2.loc[t-1, "x"] + df2.loc[t-1,"a"]
    #break
print(df2)
# a x
# year
# 2020 3 0.0
# 2021 4 3.0
# 2022 5 7.0
# 2023 6 12.0
print(type(df2.loc[t-1, "x"] + df2.loc[t-1,"a"]))
# <class 'numpy.float64'>
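As an aside, this purely additive case does not seem to need a loop at all: x_t is the sum of all earlier values of a, so a shifted cumulative sum should give the same column. This is a minimal sketch that assumes the rows are already sorted by year; `df` is an illustrative frame, and the result is an integer column rather than the float column produced by the loops.

```python
import pandas

df = pandas.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})

# Shift a down by one row (filling the first row with 0) so that row t
# holds a_{t-1}, then accumulate: x_t = a_{t0} + ... + a_{t-1}
df["x"] = df["a"].shift(fill_value=0).cumsum()
print(df)
```

This shortcut does not carry over to the multiplicative example further down, where each step rescales the previous value.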
- type(df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]) is a pandas Series, while type(df2.loc[t-1, "x"] + df2.loc[t-1,"a"]) is a numpy float. Why are these types different?
- If I do not want to use set_index() before the computation, is there a better way to write a recursive .loc() assignment than to use .unique()?
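One candidate I can think of, offered as a sketch and not as a recommended idiom, is to reduce the one-row Series to a scalar with Series.item() before assigning. It assumes every year occurs exactly once, since item() raises a ValueError when the selection does not contain exactly one element. The names `df`, `prev` and `step` are illustrative.

```python
import pandas

df = pandas.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})
first, last = df["year"].min(), df["year"].max()

# Set the initial value
df.loc[df["year"] == first, "x"] = 0.0

for t in range(first + 1, last + 1):
    prev = df["year"] == t - 1
    # item() turns the one-element Series into a plain scalar,
    # so no index alignment happens on assignment
    step = (df.loc[prev, "x"] + df.loc[prev, "a"]).item()
    df.loc[df["year"] == t, "x"] = step

print(df)
```

Unlike .unique(), this fails loudly if a year is duplicated or missing instead of silently assigning an array of unexpected length.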
See also:
- related Question and Answer on recursive assignment
- related documentation on [Mutating User Defined Function methods](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#mutating-with-user-defined-function-udf-methods)
Example using multiplicative and additive component
Our real problem is more complicated because it has both a multiplicative and an additive component, x_t = a x_{t-1} + b:
import pandas
df3 = pandas.DataFrame({'year':range(2020,2024),'a':range(3,7), 'b':range(8,12)})
df3 = df3.set_index("year").copy()
# Set the initial value
df3.loc[df3.index.min(), "x"] = 0
for t in range(df3.index.min()+1, df3.index.max()+1):
    df3.loc[t, "x"] = df3.loc[t-1, "x"] * df3.loc[t-1, "a"] + df3.loc[t-1, "b"]
    #break
print(df3)
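For comparison, the same recurrence can be run on plain numpy arrays and written back to the frame in one assignment, which sidesteps the per-row .loc calls. This is only a sketch: it assumes the rows are sorted by year with no gaps, and `df`, `a`, `b` and `x` are illustrative names.

```python
import numpy
import pandas

df = pandas.DataFrame({"year": range(2020, 2024), "a": range(3, 7), "b": range(8, 12)})

a = df["a"].to_numpy()
b = df["b"].to_numpy()

# x[0] is the initial value 0; each later entry uses the previous row
x = numpy.zeros(len(df))
for i in range(1, len(df)):
    x[i] = x[i - 1] * a[i - 1] + b[i - 1]

df["x"] = x
print(df)
```

The loop is still sequential, so this is not a true vectorization, but it keeps the recursion outside of pandas indexing.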