First of all, only use DataFrame.iterrows() as a last resort. DataFrames are optimized for vectorized operations on entire columns at once, not for row-by-row operations. And if you must iterate, consider using DataFrame.itertuples() instead: it preserves the data type of each column (iterrows() returns each row as a Series, which holds a single dtype, so a row mixing integers and floats gets upcast to float), and it runs much, much faster.
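A minimal illustration of the dtype point, using a small hypothetical two-column frame (`a` holds integers, `b` holds floats) that is not part of the original question. The last line shows the vectorized alternative that avoids iterating at all:

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one integer column, one float column.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# iterrows() yields (index, Series) pairs. A Series has a single dtype,
# so the integer value in column "a" is upcast to float64 in each row.
_, first_series = next(df.iterrows())
print(first_series.dtype)   # float64
print(first_series["a"])    # 1.0

# itertuples() yields namedtuples, so each field keeps its column's type.
first_tuple = next(df.itertuples())
print(first_tuple.a)        # 1

# Vectorized equivalent of a per-row computation: no Python-level loop.
row_sums = df["a"] + df["b"]
print(row_sums.tolist())    # [2.5, 4.5]
```

The whole-column expression at the end is the form pandas is built for; reach for itertuples() only when a computation genuinely cannot be expressed that way.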
Second, it is important in Pandas (and in all of computing, really) to structure your data appropriately for the task at hand. Your current solution has persons along the index and time points as the columns. That makes for a wide, ragged matrix with potentially many NaNs, as your example shows. It sounds like you want to store four pieces of data for each observation: person, time, x, and y. Consider using four columns instead of one column per time point, like so:
import pandas as pd
inp = [[(11,110), (12,120)],
[(13,130), (14,140), (15,150)]]
df = pd.DataFrame(inp) # ragged and wide--not ideal for Pandas
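df2 = df.stack().dropna() # now each element is indexed by a MultiIndex (person and time); dropna() removes the padding for the shorter row in case your pandas version's stack() keeps missing values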
df2.index.rename(["person", "time"], inplace=True) # to be explicit
df3 = pd.DataFrame(df2.tolist(), index=df2.index) # now each row is a person/time and there are two columns for x and y
df3.reset_index(inplace=True) # not strictly necessary
df3.rename(columns={0: "x", 1: "y"}, inplace=True) # to be explicit
for row in df3.itertuples(): # using itertuples instead of iterrows
    print(row)
# Pandas(Index=0, person=0, time=0, x=11, y=110)
# Pandas(Index=1, person=0, time=1, x=12, y=120)
# Pandas(Index=2, person=1, time=0, x=13, y=130)
# Pandas(Index=3, person=1, time=1, x=14, y=140)
# Pandas(Index=4, person=1, time=2, x=15, y=150)
You should take a look at this answer for how I split the tuples. Of course, if you have the ability to control how the data are being constructed, you do not need to do this kind of manipulation--just create the DataFrame with the appropriate structure in the first place.
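If you do control how the data are read in, a minimal sketch of building the long format directly might look like this. It assumes the raw data arrive as nested lists shaped like `inp` (one inner list per person, one `(x, y)` tuple per time point); the `records` name is just illustrative:

```python
import pandas as pd

# Same ragged input as above: one inner list per person,
# one (x, y) tuple per time point.
inp = [[(11, 110), (12, 120)],
       [(13, 130), (14, 140), (15, 150)]]

# Emit one record per person/time observation while reading the data,
# so no stack/split/rename steps are needed afterwards.
records = [
    (person, time, x, y)
    for person, points in enumerate(inp)
    for time, (x, y) in enumerate(points)
]
df3 = pd.DataFrame(records, columns=["person", "time", "x", "y"])
print(df3)
```

Because each tuple is unpacked as it is read, the ragged shape never turns into NaN padding in the first place.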
Now you can treat df3["x"] and df3["y"] as pandas.Series objects for whatever you need to do:
for x in df3["x"]:
    print(x)
# 11
# 12
# 13
# 14
# 15
for y in df3["y"]:
    print(y)
# 110
# 120
# 130
# 140
# 150
print(df3["x"] * df3["y"]/5 + 1)
# 0 243.0
# 1 289.0
# 2 339.0
# 3 393.0
# 4 451.0
# dtype: float64
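The long layout also makes per-person summaries a single vectorized call. The sketch below rebuilds the same `df3` by hand so it runs on its own; the summary column names (`n_points`, `mean_x`, `max_y`) are illustrative choices, not anything from the question. It also shows that the wide, one-column-per-time-point view is still one `pivot` away if you ever need it:

```python
import pandas as pd

# Long-format frame equivalent to df3 above (one row per person/time).
df3 = pd.DataFrame({
    "person": [0, 0, 1, 1, 1],
    "time":   [0, 1, 0, 1, 2],
    "x":      [11, 12, 13, 14, 15],
    "y":      [110, 120, 130, 140, 150],
})

# Per-person summaries via groupby with named aggregation; no loop over rows.
summary = df3.groupby("person").agg(
    n_points=("time", "size"),
    mean_x=("x", "mean"),
    max_y=("y", "max"),
)
print(summary)

# Pivot one variable back out if a wide view is needed: rows are persons,
# columns are time points, and missing observations become NaN.
wide_x = df3.pivot(index="person", columns="time", values="x")
print(wide_x)
```

Person 0 has only two time points, so `wide_x` contains exactly one NaN; in the long frame that gap simply does not exist as a row.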