In Pandas, what's the equivalent of 'nrows' from read_csv() to be used in read_excel()?

Question

I want to import only a certain range of data from an Excel spreadsheet (.xlsm format, as it has macros) into a pandas DataFrame. I was doing it this way:

data    = pd.read_excel(filepath, header=0,  skiprows=4, nrows= 20, parse_cols = "A:D")

But it seems that nrows works only with read_csv(). What would be the equivalent for read_excel()?

Something like `pd.read_excel(...).head(50)` will get you the first 50 rows, but of course it reads and discards, so I'm afraid it's not very helpful. Sorry. — Ami Tavory, Mar 02 '16 at 13:00
`skip_footer`, as in the answer, ought to work (although, assumes you already know n). Alternatively, unless the excel file is large (which usually they aren't else wouldn't be in a spreadsheet), @AmiTavory's suggestion ought to be fine. Finally, `read_excel` is just a wrapper for py/excel libraries (I think `xlrd` by default). If you really want fine control you'll need to use one of those libraries. Here is a good starting point: http://www.python-excel.org/ — JohnE, Mar 02 '16 at 13:31

score 18 · Answer 1 · answered Jun 28 '18 at 18:49

As noted in the documentation, as of pandas version 0.23, this is now a built-in option, and functions almost exactly as the OP stated.

The code

data = pd.read_excel(filepath, header=0, skiprows=4, nrows=20, usecols="A:D")

will now read the Excel file, take data from the first sheet (the default), skip the first 4 rows, take the next line (i.e., the fifth line of the sheet) as the header, read the following 20 rows of data into the dataframe (lines 6-25), and only use columns A:D. Note that the column option is now spelled usecols, as parse_cols is deprecated.
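The line arithmetic described above can be checked with a small sketch. The helper name sheet_lines is hypothetical (it is not part of pandas), and the sketch assumes skiprows is an integer count, header is a single row index counted after the skipped rows, and the sheet has no entirely empty rows in that range:

```python
def sheet_lines(skiprows, nrows, header=0):
    """Return (header_line, first_data_line, last_data_line) as 1-based sheet lines.

    Hypothetical helper for illustration only; it mirrors the arithmetic in the
    answer, assuming an integer skiprows and a single header row.
    """
    # The header index is counted from the first row that was not skipped.
    header_line = skiprows + header + 1
    first_data_line = header_line + 1
    # nrows counts data rows only, so the last data row is nrows past the header.
    last_data_line = header_line + nrows
    return header_line, first_data_line, last_data_line


print(sheet_lines(skiprows=4, nrows=20))  # (5, 6, 25)
```

With skiprows=4 and nrows=20 this gives the header on line 5 and data on lines 6-25, matching the description.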

Is this bringing the entire file into memory before truncating to only 20 rows? — KLDavenport, Oct 05 '18 at 00:04

score 14 · Accepted Answer · answered Mar 02 '16 at 13:27


If you know the number of rows in your Excel sheet, you can use the skip_footer parameter to read the first n - skip_footer rows of your file, where n is the total number of rows.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Usage:

data = pd.read_excel(filepath, header=0, parse_cols = "A:D", skip_footer=80)

Assuming your Excel sheet has 100 rows, this line would parse the first 20 rows of the sheet (the first of which becomes the header, since header=0).

Erol


Funny, I guess that psychologically, "footer" is associated with something slim (like a physical page footer). Good answer. – Ami Tavory Mar 02 '16 at 13:34
Use `skipfooter` as `skip_footer` is deprecated since version 0.23.0 – yoonghm Jun 07 '19 at 02:53

MaxU - stand with Ukraine · Answer 3 · 2018-11-08T13:20:15.040

I'd like to extend @Erol's answer to make it a bit more flexible.

Assuming that we DON'T know the total number of rows in the excel sheet:

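import pandas as pd

xl = pd.ExcelFile(filepath)

# parsing first (index: 0) sheet
# note: sheet_by_index() and .nrows are xlrd workbook APIs,
# so this assumes the file is opened with the xlrd engine
total_rows = xl.book.sheet_by_index(0).nrows

skiprows = 4
nrows = 20

# calc number of footer rows
# (-1) - for the header row
skipfooter = total_rows - nrows - skiprows - 1

# usecols replaces the deprecated parse_cols
df = xl.parse(0, skiprows=skiprows, skipfooter=skipfooter, usecols="A:D") \
       .dropna(axis=1, how='all')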

.dropna(axis=1, how='all') will drop all columns that contain only NaN values.
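As a rough sanity check of the footer arithmetic, here is a minimal sketch. The helper name footer_rows_to_skip is hypothetical, and clamping to zero when the sheet is shorter than the requested range is an assumption added here, not something the answer does:

```python
def footer_rows_to_skip(total_rows, skiprows, nrows, header_rows=1):
    """Number of trailing rows to skip so that only nrows data rows remain.

    Hypothetical helper for illustration: total_rows is the sheet's row count,
    skiprows the leading rows to skip, header_rows the rows used as the header.
    """
    skipfooter = total_rows - skiprows - header_rows - nrows
    # Assumption: if the sheet has fewer rows than requested, skip nothing
    # rather than returning a negative count.
    return max(skipfooter, 0)


print(footer_rows_to_skip(total_rows=100, skiprows=4, nrows=20))  # 75
```

For a 100-row sheet with skiprows=4 and nrows=20, 75 footer rows are skipped: 4 skipped + 1 header + 20 data + 75 footer = 100.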

Jahangir Khan · Answer 4 · 2023-08-05T02:42:18.843


My Dear... Take it easy... Make it simple

data = pd.read_excel(filepath, header=0, skiprows=4, usecols="A:D")

data = data[:20]

I hope this is the answer to your question. Enjoy...

edited Aug 05 '23 at 02:42

answered Aug 05 '23 at 02:36


"My Dear... Take it easy... Make it simple" --> Be more respectful, no judgment, e.g.: "Here is a simple one line solution:" – Martin Aug 08 '23 at 08:54
