I have a DataFrame with 1 million rows and a single function (which I can't vectorize) to apply to each row. I looked into swifter, which promises to leverage multiple processes to speed up the computation. On an 8-core machine, however, it's actually slower than a plain `apply`.
Any idea why?
from geojson import Feature, Point  # Feature/Point come from the geojson package

def parse_row(n_print=None):
    def f(row):
        # Optionally print progress every n_print rows.
        if n_print is not None and row.name % n_print == 0:
            print(row.name, end="\r")
        return Feature(
            geometry=Point((float(row["longitude"]), float(row["latitude"]))),
            properties={
                "water_level": float(row["water_level"]),
                "return_period": float(row["return_period"]),
            },
        )
    return f
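For reference, here is a minimal, self-contained sketch of the same closure pattern on a toy DataFrame. It substitutes a plain dict for geojson's `Feature` so the snippet runs without that dependency; the column names match the ones above, and the two-row frame is made up for illustration:

```python
import pandas as pd

def parse_row(n_print=None):
    # Closure: returns a per-row handler, optionally printing progress.
    def f(row):
        if n_print is not None and row.name % n_print == 0:
            print(row.name, end="\r")
        # Stand-in for geojson.Feature(...) so the sketch has no extra deps.
        return {
            "geometry": (float(row["longitude"]), float(row["latitude"])),
            "water_level": float(row["water_level"]),
            "return_period": float(row["return_period"]),
        }
    return f

df = pd.DataFrame({
    "longitude": [1.0, 2.0],
    "latitude": [48.0, 49.0],
    "water_level": [0.5, 0.7],
    "return_period": [10, 100],
})
features = df.apply(parse_row(), axis=1)  # Series of dicts, one per row
```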
In [12]: df["feature"] = df.swifter.apply(parse_row(), axis=1)
Dask Apply: 100%|████████████████████████████████████████| 48/48 [01:19<00:00, 1.65s/it]
In [13]: t = time(); df["feature"] = df.apply(parse_row(), axis=1); print(int(time() - t))
46
So plain `apply` takes 46 seconds, while the swifter run above took about 79 seconds (1:19), despite the 8 cores.