This is probably a continuation of this question. I am working from the dask docs examples for map_partitions:
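import pandas as pd
import dask.dataframe as dd
from random import randint

# Small pandas frame split into two dask partitions
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [1., 2., 3., 4., 5.]})
ddf = dd.from_pandas(df, npartitions=2)

def myadd(df):
    # Add a random integer between 1 and 4 (inclusive) to column x
    new_value = df.x + randint(1, 4)
    return new_value

# assign() calls myadd with each partition and stores the result as column z
res = ddf.map_partitions(lambda df: df.assign(z=myadd)).compute()
res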
In the above code, randint is only being called once, not once per row as I would expect. How come?
Output:
x y z
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
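To make the observed behaviour concrete, here is a minimal sketch in plain pandas (no dask) contrasting a single random draw, which is broadcast to every row, with one draw per row. The helper names `add_scalar_offset` and `add_per_row_offset` are hypothetical, and NumPy's random generator is an assumption swapped in for `random.randint` because it can return an array of a given size.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [1., 2., 3., 4., 5.]})

def add_scalar_offset(part, rng):
    # Hypothetical helper: one draw, so the same integer is added to every row.
    offset = int(rng.integers(1, 5))  # upper bound is exclusive, so 1..4
    return part.x + offset

def add_per_row_offset(part, rng):
    # Hypothetical helper: size=len(part) gives one independent draw per row.
    offsets = rng.integers(1, 5, size=len(part))
    return part.x + offsets

rng = np.random.default_rng(0)
same = add_scalar_offset(df, rng)
varied = add_per_row_offset(df, rng)

# Offsets actually applied: a single distinct value vs. one value per row
scalar_diffs = (same - df.x).unique()
row_diffs = (varied - df.x).to_numpy()
print(scalar_diffs)
print(row_diffs)
```

Since map_partitions passes each partition to the function as a pandas DataFrame, the same per-row approach should carry over to the dask version, though seeding and reproducibility across partitions would need separate care.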