
I have a dictionary like this:

```python
d = {'Caps': cap_list, 'Term': unique_tokens, 'LocalFreq': local_freq_list, 'CorpusFreq': corpus_freq_list}
```

I want to create a Dask DataFrame from it. How do I do that? In Pandas, it can easily be loaded into a DataFrame with:

```python
import pandas as pd

df = pd.DataFrame({'Caps': cap_list, 'Term': unique_tokens, 'LocalFreq': local_freq_list,
                   'CorpusFreq': corpus_freq_list})
```

Should I first load it into a bag and then convert the bag to a Dask DataFrame?

rpanai
user1717931
  • Does this answer your question? [convert dask.bag of dictionaries to dask.dataframe using dask.delayed and pandas.DataFrame](https://stackoverflow.com/questions/55298442/convert-dask-bag-of-dictionaries-to-dask-dataframe-using-dask-delayed-and-pandas) – rpanai Dec 17 '19 at 19:05
    It looks to me that you should first use `bag`. – rpanai Dec 17 '19 at 19:06
  • @rpanai I did see the link, but the role of k (sequence...?) and the whole thing seem convoluted. I wanted a simple example. Plus, I have not used `delayed` before. I always create the task DAG and call `compute()` at the end, which has worked well for me. – user1717931 Dec 17 '19 at 19:41

1 Answer


If your data fits in memory then I encourage you to use Pandas instead of Dask Dataframe.

If for some reason you still want to use a Dask DataFrame, then I would first convert things to a Pandas DataFrame and then use the `dask.dataframe.from_pandas` function.

```python
import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame(...)
ddf = dd.from_pandas(df, npartitions=20)
```

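But there are many cases where this will be slower than just using Pandas well, because Dask adds scheduling overhead for every partition and task on top of the same Pandas operations.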

MRocklin