27

Calling .shape on my Dask DataFrame gives me the following error.

AttributeError: 'DataFrame' object has no attribute 'shape'

How should I get the shape instead?

user1559897

6 Answers

36

You can get the number of columns directly

len(df.columns)  # this is fast

You can also call len on the dataframe itself, though beware that this will trigger a computation.

len(df)  # this requires a full scan of the data

Dask.dataframe doesn't know how many records are in your data without first reading through all of it.

MRocklin
  • len(df) loads all of the records; in my case, finding len(df) for a table of 144M rows took more than a few minutes (Windows 10, 16 GB RAM, Intel i7). Is there any other way? – Rebin Mar 19 '19 at 21:03
  • It probably has to load all of the data to find out the length. No, there is no other way. You could consider using something like a database, which tracks this sort of information in metadata. – MRocklin Mar 27 '19 at 05:09
  • I've been doing `df.index.size.compute()`, which is faster than running `len(df)`, but my data is stored in columnar Parquet, so it depends on your underlying data architecture. – user108569 Aug 22 '19 at 19:41
27

If your Dask version provides shape, you can do the following:

a = df.shape  # a[0] is the lazy row count; a[1] is a plain int
a[0].compute(), a[1]

This will show the shape just as pandas does.

tinashe matambo
7

Well, I know this is quite an old question, but I had the same issue and found an outside-the-box solution that I want to record here.

Assuming your data was originally saved in a CSV-like file, you can do what I do: count the lines of that file (minus one for the header line). Inspired by this answer here, this is the solution I'm using:

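import dask.dataframe as dd
from itertools import takewhile, repeat

def rawincount(filename):
    """Count newline characters in a file, reading it in 1 MiB chunks."""
    with open(filename, 'rb') as f:
        # Keep reading raw chunks until an empty read signals end of file.
        bufgen = takewhile(lambda x: x, (f.raw.read(1024 * 1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)

filename = 'myHugeDataframe.csv'
df = dd.read_csv(filename)
# Subtract one for the header line. This assumes the file ends with a
# newline and that no field contains an embedded newline.
df_shape = (rawincount(filename) - 1, len(df.columns))
print(f"Shape: {df_shape}")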

Hope this could help someone else as well.
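One edge case with counting newlines is a file whose last line has no trailing newline, which would be undercounted by one. A possible variant that accounts for this, using only the standard library, is sketched below. The function name count_data_rows and its parameters are hypothetical, and it still assumes that no field contains an embedded newline.

```python
import os
import tempfile

def count_data_rows(path, has_header=True, chunk_size=1024 * 1024):
    """Count data rows in a text file by counting newline bytes."""
    newlines = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A non-empty file that does not end with a newline has one extra line.
    lines = newlines + (1 if last_byte and last_byte != b"\n" else 0)
    return max(lines - (1 if has_header else 0), 0)

# Demo on a small temporary CSV whose last line has no trailing newline.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n3,4")
    path = tmp.name
try:
    n_rows = count_data_rows(path)
finally:
    os.remove(path)

print(n_rows)
```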

ti7
iperetta
3
print('(', len(df), ',', len(df.columns), ')')  # len(df) triggers a full scan
Omid Erfanmanesh
1

To get the shape, we can try this:

dask_dataframe.describe().compute()

The "count" row of the result gives the number of non-null values in each numeric column, which equals the number of rows only when a column has no missing values.

len(dask_dataframe.columns)

This gives the number of columns in the dataframe.

-2

You can get the number of columns with the code below.

import dask.dataframe as dd
dd1 = dd.read_csv("filename.txt")
dd1.info()  # info() prints the summary itself

# Output
<class 'dask.dataframe.core.DataFrame'>
Columns: 6 entries, CountryName to Value
dtypes: object(4), float64(1), int64(1)
sameer_nubia