
Suppose I have a pandas dataframe:

import pandas as pd
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

When I convert it into a dask dataframe, what should the name and divisions parameters contain?

from dask import dataframe as dd 
sd=dd.DataFrame(df.to_dict(),divisions=1,meta=pd.DataFrame(columns=df.columns,index=df.index))

TypeError: __init__() missing 1 required positional argument: 'name'

Edit: Suppose I create a pandas dataframe like this:

pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

How can I create a dask dataframe in the same way, given that it needs three additional arguments: name, divisions, and meta?

sd=dd.DataFrame({'a':[1,2,3],'b':[4,5,6]},name=,meta=,divisions=)

Thank you for your reply.

rey

1 Answer


I think you can use dask.dataframe.from_pandas:

from dask import dataframe as dd 
sd = dd.from_pandas(df, npartitions=3)
print (sd)
dd.DataFrame<from_pa..., npartitions=2, divisions=(0, 1, 2)>

EDIT:

I found a solution:

import pandas as pd
import dask.dataframe as dd
from dask.dataframe.utils import make_meta

df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

dsk = {('x', 0): df}

meta = make_meta({'a': 'i8', 'b': 'i8'}, index=pd.Index([], 'i8'))
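# dsk holds one partition, ('x', 0), covering index values 0..2,
# so divisions needs npartitions + 1 = 2 entries
d = dd.DataFrame(dsk, name='x', meta=meta, divisions=[0, 2])
print (d)
dd.DataFrame<x, npartitions=1, divisions=(0, 2)>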
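
As far as I understand it, name is the prefix shared by the graph keys (name, i), one key per partition, and divisions lists the index boundaries of the partitions, so it has npartitions + 1 entries with the last one inclusive. The sketch below shows that bookkeeping in plain Python on a list of index values. It is an illustration only, not dask's implementation, and split_sorted_index is a hypothetical helper.

```python
# Illustration only: plain Python, not dask's actual implementation.
def split_sorted_index(index, npartitions):
    """Split a sorted index into contiguous chunks; return (chunks, divisions)."""
    size = -(-len(index) // npartitions)  # ceiling division
    chunks = [index[i:i + size] for i in range(0, len(index), size)]
    # divisions: first index value of every chunk, plus the last value overall
    divisions = tuple(chunk[0] for chunk in chunks) + (chunks[-1][-1],)
    return chunks, divisions

index = [0, 1, 2]   # index values of the example dataframe
name = 'x'          # prefix shared by all graph keys

# One partition: a single key ('x', 0) and two division values
chunks, divisions = split_sorted_index(index, npartitions=1)
graph = {(name, i): chunk for i, chunk in enumerate(chunks)}

# Two partitions: keys ('x', 0) and ('x', 1) and three division values
chunks2, divisions2 = split_sorted_index(index, npartitions=2)
graph2 = {(name, i): chunk for i, chunk in enumerate(chunks2)}

print(graph, divisions)
print(graph2, divisions2)
```

The point is that the number of (name, i) keys in the graph and the length of divisions have to agree. A graph with a single key therefore goes with exactly two division values.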
jezrael
  • Thanks for the reply, but I want to know what the name and divisions parameters are when creating a dask dataframe. I have gone through the documentation but couldn't understand it. – rey Sep 27 '16 at 10:44
  • I am not a `dask` expert, but I think you need [from-raw-dask-graphs](http://dask.pydata.org/en/latest/dataframe-create.html#from-raw-dask-graphs). I think the [author of dask](http://stackoverflow.com/users/616616/mrocklin) can explain more. – jezrael Sep 27 '16 at 10:48
  • Thank you, I'll try to figure it out and wait for other answers. – rey Sep 27 '16 at 10:53
  • @jezrael is correct. You should create a Dask.DataFrame using the from-pandas method. You only need to use the constructor in advanced situations. – MRocklin Sep 27 '16 at 11:38
  • @MRocklin I got it, but creating a dataframe in pandas is easy, as shown in the `edit`. How can I similarly create a simple dask dataframe directly, not from pandas? I asked the question in terms of pandas, so @jezrael is correct, but I just wanted to know how to create a sample dataframe directly. – rey Sep 27 '16 at 14:46
  • I agree, this would be interesting to know. – Arco Bast Sep 27 '16 at 22:00
  • @MRocklin - I added a solution, can you check it? Thank you. – jezrael Sep 28 '16 at 06:07
  • @rey - I found a solution, please check it. – jezrael Sep 28 '16 at 06:08
  • @jezrael thanks for adding the solution. I had searched through the dask GitHub repository but couldn't find it. – rey Sep 30 '16 at 05:54