Python: any lazy method for reading .xls files?

Question

I know how to read .xls files with pandas, but it loads all the data at once. I want to load data on demand: a generator that returns the next row each time it is iterated. See this question for general files.

I know openpyxl can do this, following this webpage. However, it doesn't support old .xls files and recommends using xlrd instead, and I don't know how to do what I want with that package.

The documentation tells how to do that sheet by sheet, but not row by row (my file has only one sheet).

A pandas DataFrame has a built-in generator called `iterrows()`, which is probably what you need – DarkKnight, Sep 17 '22 at 10:55
I checked with my data: the `xlrd.open_workbook` output occupies 48 bytes, while the `pandas.read_excel` output takes 5,361 bytes. The test Excel file is 32,256 bytes. I'm still wondering whether `xlrd` is already doing "lazy reading", given the steps I need to access the data. Judging by the sizes, though, I would use `xlrd`. – Abel Gutiérrez, Sep 17 '22 at 15:34
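
On the `xlrd` route raised in the question and in the comment above, a minimal sketch of a row generator might look like the following. It assumes xlrd's `open_workbook(..., on_demand=True)`, `sheet_by_index`, `nrows`, `row_values` and `release_resources`; the function names `iter_sheet_rows` and `iter_xls_rows` are made up for illustration. As far as I know, `on_demand=True` only defers loading of each sheet, and a sheet is still parsed in full the first time it is requested, so this gives row-by-row iteration rather than truly lazy reading from disk.

```python
from typing import Any, Iterator, List


def iter_sheet_rows(sheet: Any) -> Iterator[List[Any]]:
    """Yield one row of values at a time from an xlrd-style sheet.

    The sheet only needs an `nrows` attribute and a `row_values(rowx)` method.
    """
    for rowx in range(sheet.nrows):
        yield sheet.row_values(rowx)


def iter_xls_rows(path: str, sheet_index: int = 0) -> Iterator[List[Any]]:
    """Open an .xls workbook with xlrd and yield the rows of one sheet."""
    import xlrd  # imported here so iter_sheet_rows works without xlrd installed

    # on_demand=True delays loading each sheet until it is first requested.
    book = xlrd.open_workbook(path, on_demand=True)
    try:
        sheet = book.sheet_by_index(sheet_index)
        yield from iter_sheet_rows(sheet)
    finally:
        # Free the workbook's resources once iteration ends or is abandoned.
        book.release_resources()
```

Usage would be along the lines of `for row in iter_xls_rows("sample.xls"): ...`, where the file name is a placeholder.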

score 2 · Answer 1 · 2022-09-17T11:19:21.187

pandas doesn't support lazy loading; it reads the whole file and keeps everything in memory.

Polars -- an alternative to pandas -- supports lazy loading.
Unfortunately this isn't yet implemented for xls files.

One solution is to convert the Excel file to CSV and use the `scan_csv` function.

>>> import polars as pl
>>> pl.scan_csv("sample.csv")
<polars.internals.lazyframe.frame.LazyFrame object at 0x7f0ae95d1c00>

edited Sep 17 '22 at 11:19

answered Sep 17 '22 at 11:13

That's a solution, although I don't know if it's worth it. I don't want to keep the `.csv` file, so the algorithm would be write-read-delete, and the file would take up some space on disk in the meantime. That isn't a problem for my data, though. – Abel Gutiérrez Sep 17 '22 at 15:28
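
The write-read-delete cycle described in this comment could be sketched with the standard library alone, roughly as below. The names `temporary_csv` and `iter_csv_rows` are hypothetical, and the conversion step is left to a caller-supplied function since the thread does not say how the CSV would be produced.

```python
import csv
import os
import tempfile
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def temporary_csv(write_csv: Callable[[str], None]) -> Iterator[str]:
    """Create a temporary .csv via write_csv(path), yield its path, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)  # only the path is needed; the writer reopens the file itself
    try:
        write_csv(path)
        yield path
    finally:
        os.remove(path)  # the "delete" step runs even if reading fails


def iter_csv_rows(path: str) -> Iterator[List[str]]:
    """Yield one parsed row at a time; csv.reader streams the file line by line."""
    with open(path, newline="") as handle:
        yield from csv.reader(handle)
```

With pandas, the writer could be something like `lambda p: pd.read_excel("sample.xls").to_csv(p, index=False)` (the file name is a placeholder), and the yielded path could be passed to `pl.scan_csv` instead of `iter_csv_rows`. The conversion itself still reads the whole sheet once; only the later reads are lazy.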

score 0 · Answer 2 · edited Mar 27 '23 at 10:48

You can convert a DataFrame to a LazyFrame:

import polars as pl
df = pl.DataFrame({"a": [1, 2, 3]})  # placeholder for data that is already loaded
dflazy = df.lazy()
dflazy

edited Mar 27 '23 at 10:48

Andreas Violaris

answered Mar 21 '23 at 19:30

Projeto Julius
