
My question is similar to the thread "Create dummies from column with multiple values in pandas".

Objective: I would like to produce a result similar to the one below, but using dask.

In pandas:

import pandas as pd
df = pd.DataFrame({'fruit': ['Banana, , Apple, Dragon Fruit,,,', 'Kiwi,', 'Lemon, Apple, Banana', ',']})
df['fruit'].str.get_dummies(sep=',')

Which will output the following:

       Apple   Banana   Dragon Fruit  Banana  Kiwi  Lemon
0  1       1        0              1       1     0      0
1  0       0        0              0       0     1      0
2  0       1        1              0       0     0      1
3  0       0        0              0       0     0      0

The get_dummies() used above is a method of <pandas.core.strings.StringMethods>.

The problem is that the dask equivalent, <dask.dataframe.accessor.StringAccessor>, has no get_dummies().

How can I solve my problem using dask?

anwari

1 Answer


Apparently this is not possible in dask, because the output columns are not known beforehand. See https://github.com/dask/dask/issues/4403.
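That said, if an extra pass over the data is acceptable, the column list can be worked out first and each chunk aligned to it afterwards. The following is only a rough sketch of that two-pass idea, written in plain pandas with two slices of the Series standing in for dask partitions. The helper names `partition_tokens` and `partition_dummies` are made up for illustration and are not part of pandas or dask.

```python
import pandas as pd


def partition_tokens(part, sep=","):
    """Pass 1 (hypothetical helper): distinct non-empty tokens in one chunk."""
    tokens = set()
    for value in part.dropna():
        # Keep whitespace as-is so the result matches str.get_dummies,
        # which only drops empty strings.
        tokens.update(t for t in value.split(sep) if t)
    return tokens


def partition_dummies(part, columns, sep=","):
    """Pass 2 (hypothetical helper): dummies for one chunk, aligned to `columns`."""
    dummies = part.str.get_dummies(sep=sep)
    # Add columns this chunk never saw and fix the column order.
    return dummies.reindex(columns=columns, fill_value=0)


df = pd.DataFrame({'fruit': ['Banana, , Apple, Dragon Fruit,,,', 'Kiwi,',
                             'Lemon, Apple, Banana', ',']})

# Two slices standing in for dask partitions (an assumption for this sketch).
parts = [df['fruit'].iloc[:2], df['fruit'].iloc[2:]]

# Pass 1: union of the tokens seen in every chunk gives the full column list.
columns = sorted(set().union(*[partition_tokens(p) for p in parts]))

# Pass 2: every chunk now yields a frame with the same, known columns.
result = pd.concat([partition_dummies(p, columns) for p in parts])
print(result)
```

With dask, the second helper could presumably be handed to `map_partitions` together with a `meta` built from the known column list, since the output schema is then fixed up front. I have not tested that against a real dask cluster, and the first pass costs a full extra read of the data.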