Is there a way to convert a series from a dask dataframe to a list, in order to iterate over that?
So far I have:
ddf = dd.read_csv(MY_FILE)
s = ddf.iloc[:,[0]]
r = s.compute()
r.a_column.values
Thanks!
How about using a list comprehension (an inline for statement)? It builds a new iterable object.
You can get the values of the DataFrame through its values attribute.
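import dask.dataframe as dd

ddf = dd.read_csv(MY_FILE)
s = ddf.iloc[:, [0]]  # first column, still a lazy dask DataFrame
r = s.compute()       # now an in-memory pandas DataFrame
# r.values is a 2-D array with a single column, so take element 0 of each row
print([i[0] for i in r.values])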
In general, it's preferable to avoid iterating over rows whenever possible and to use vectorized operations instead. However, if the operations performed on the elements of a row are independent of neighbouring rows, then the easiest thing to do in dask is .map_partitions:
def myfunc(df):
    # apply row operations assuming df is a pandas df
    for index, row in df.iterrows():
        # do something
        something = 'some_value'
    return something

r = ddf.map_partitions(myfunc)