Can't drop columns or slice dataframe using dask?

Question

I am trying to use dask instead of pandas because I have a 2.6 GB CSV file. I load it and want to drop a column, but it seems that neither the drop method, df.drop('column'), nor slicing, df[:, :-1], is implemented yet. Is this the case, or am I just missing something?

MRocklin · Accepted Answer · 2015-08-18T17:19:56.320

9

We implemented the drop method in this PR. This is available as of dask 0.7.0.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})

In [3]: import dask.dataframe as dd

In [4]: ddf = dd.from_pandas(df, npartitions=2)

In [5]: ddf.drop('y', axis=1).compute()
Out[5]: 
   x
0  1
1  2
2  3

Before that release, one could also select the columns to keep by name, though of course this is less attractive if you have many columns.

In [6]: ddf[['x']].compute()
Out[6]: 
   x
0  1
1  2
2  3

edited Aug 18 '15 at 17:19

answered Aug 07 '15 at 04:41

MRocklin


Why ".compute()"? If your database is very large, doesn't this slow you down?? – FaCoffee Oct 28 '17 at 15:59

I only use compute above to show results of the computation. You're correct that calling compute prematurely can be suboptimal. – MRocklin Oct 29 '17 at 18:28

score 0 · Answer 2 · answered Aug 24 '21 at 11:28


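This should work, where columns is a column name or a list of column names:

# In dask the row count in .shape is lazy; the second entry is the column count.
print(ddf.shape)
ddf = ddf.drop(columns, axis=1)
print(ddf.shape)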

answered Aug 24 '21 at 11:28

Fares Sayah

