I was wondering about the in-memory size of particular polars DataFrames. I tried:

from sys import getsizeof

getsizeof(df)
Out[17]: 48
getsizeof(df.to_pandas())
Out[18]: 1602923950

It appears that every polars DataFrame is 48 bytes? I'm confused.

fvg

2 Answers

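The polars.DataFrame.estimated_size() method returns an estimate of the DataFrame's total size in bytes, similar to the memory usage that pandas.DataFrame.info() reports.

See the polars documentation for estimated_size() for details.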

4b0
RKCH

The Python package polars is only a wrapper around the core polars library, which is written in Rust. So I'm pretty sure what you're seeing when you call getsizeof on the DataFrame is the size of the Python wrapper object itself, not the data it points to. sys.getsizeof is shallow: it doesn't follow references, and it can't see memory managed on the Rust side.
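
To make the shallow-size point concrete, here is a small standard-library sketch. The Wrapper class is a hypothetical stand-in for a thin Python handle, not how polars is actually implemented; in polars the data lives in Rust-managed memory that getsizeof can't reach at all.

```python
import sys


class Wrapper:
    """Hypothetical thin handle around a buffer (an analogy, not polars code)."""

    def __init__(self, n_bytes):
        # The payload is a separate object that the wrapper only references.
        self._buffer = bytearray(n_bytes)


small = Wrapper(10)
large = Wrapper(10_000_000)

# getsizeof reports only the wrapper object, so both handles look the same.
shallow_small = sys.getsizeof(small)
shallow_large = sys.getsizeof(large)

# The payload is only visible when you measure the buffer directly.
buffer_large = sys.getsizeof(large._buffer)

print(shallow_small, shallow_large, buffer_large)
```

Both wrappers report the same small size no matter how much data they hold, which is the same effect as the constant 48 bytes in the question.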

In pandas, the output of df.info() includes memory usage. I was actually looking for the same thing in polars.

I noticed there are individual functions for getting the null count and the schema (#2492), but I couldn't track down a way to print a DataFrame's memory usage from a polars implementation.

I'll bump this question in the Discord. It should be doable to implement, if I'm not oversimplifying it.

cnpryer
  • Potter420 on the Discord server reminded me of a great point. The underlying `arrow` format (`arrow2` here) would make it difficult to do this with complete accuracy, but the topic does seem to have visibility: https://github.com/jorgecarleitao/arrow2/issues/421 – cnpryer Apr 10 '22 at 20:29
  • I've submitted an issue for this: https://github.com/pola-rs/polars/issues/3106 – cnpryer Apr 10 '22 at 23:21