
Is there a way to convert a series from a dask dataframe to a list, in order to iterate over that?

Until now I have:

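import dask.dataframe as dd

ddf = dd.read_csv(MY_FILE)
s = ddf.iloc[:,[0]]   # first column only, still lazy
r = s.compute()       # pandas DataFrame
r.a_column.values     # NumPy array of the column's values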

Thanks!

2 Answers


How about using a list comprehension (an inline for statement)? It builds a new iterable object.

You can get the values of the DataFrame through its values attribute.

import dask.dataframe as dd

ddf = dd.read_csv(MY_FILE)
s = ddf.iloc[:,[0]]
r = s.compute()
# each row of r.values is a one-element array, so take its first item
print([i[0] for i in r.values])
SEUNGFWANI
  • I got that. But, is there a way to take a Dask Series and transform that into a list of values, without doing compute previously? – Cris Hernandez Dec 21 '21 at 05:02
  • As far as I know, a dask dataframe is lazy: it holds the schema but no data itself. So if you want to get the data from a file (etc.), you have to call `compute()`, which is the action that actually loads and executes. – SEUNGFWANI Dec 21 '21 at 06:13
  • I understand. Thanks mate. I think I need to dig deeper into the documentation. – Cris Hernandez Dec 21 '21 at 14:02

In general, it's preferable to avoid iterating over rows whenever possible (and use vectorized operations instead), see here. However, if the operations performed on the elements of a row are independent of neighbouring rows, then the easiest thing to do in dask is .map_partitions:

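import pandas as pd

def myfunc(df):
    # df is a regular pandas DataFrame holding one partition
    results = []
    for index, row in df.iterrows():
        # do something with the row; 'some_value' is a placeholder
        results.append('some_value')
    # return one value per row so dask can stitch the partitions together
    return pd.Series(results, index=df.index)

# r is still lazy; call r.compute() to get the actual values
r = ddf.map_partitions(myfunc)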
SultanOrazbayev
  • I created a list, appended the values to it inside the for loop, and tried to return it. But how can I get the values out of that list when the result is a dask object? – Cris Hernandez Dec 21 '21 at 05:04
  • Hmmm, how are you going to use the list afterwards? (is it going to be a parallel operation also or not?) – SultanOrazbayev Dec 21 '21 at 05:07