I have two versions of a function that uses Pandas for Python 2.7 to go through inputs.csv row by row.
The first version uses Series.apply() on a single column and goes through each row as intended.
The second version uses DataFrame.apply() on multiple columns, and for some reason it reads the top row twice. It then goes on to process the remaining rows without duplicates.
Any ideas why the latter reads the top row twice?
Version #1 – Series.apply()
(Reads top row once)
import pandas as pd

df = pd.read_csv("inputs.csv", delimiter=",")

def v1(x):
    # x is a single value from column "X"
    y = x
    print y  # trace each value as it is processed
    return pd.Series(y)

df["Y"] = df["X"].apply(v1)
Version #2 – DataFrame.apply()
(Reads top row twice)
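import pandas as pd

df = pd.read_csv("inputs.csv", delimiter=",")

def v2(f):
    # f is one full row (a Series holding columns "X" and "Z")
    y = f["X"]
    print y  # trace each row as it is processed
    return pd.Series(y)

df["Y"] = df[["X", "Z"]].apply(v2, axis=1)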
Output of print y:
v1(x):    v2(f):
Row_1     Row_1
Row_2     Row_1
Row_3     Row_2
          Row_3
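
To make this easier to reproduce without inputs.csv, here is a minimal sketch of the second version that records which rows get passed to the function. The inline DataFrame, its values, and the names v2_traced and seen are made up for illustration; they are not from my real data.

```python
import pandas as pd

# Hypothetical stand-in for inputs.csv; the column values are made up.
df = pd.DataFrame({"X": [10, 20, 30], "Z": [1, 2, 3]})

seen = []  # index label of every row handed to the function

def v2_traced(f):
    # f is one row (a Series); f.name is that row's index label.
    seen.append(f.name)
    return f["X"]

df["Y"] = df[["X", "Z"]].apply(v2_traced, axis=1)

print(seen)  # a repeated first label means the top row was visited twice
print(df)
```

If the first index label shows up twice in seen while column Y still has one value per row, the extra call is happening inside apply() itself and not in the CSV reading. As far as I can tell, the DataFrame.apply() documentation for older pandas releases mentions that the function may be called twice on the first row or column to choose a code path, so that may be what I am seeing, but I have not confirmed it for my version.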