Read the last column from an Excel file with pandas

Question

It's like how to read certain columns from Excel using Pandas - Python but a little bit more complicated.

Say I have an Excel file called "foo.xlsx" and it grows over time - a new column will be appended on the right every month. However, when I read it, I need only the first two and the last columns. I expected usecols parameter can solve this problem so I went df = pd.read_excel("foo.xlsx", usecols=[0, 1, -1]) but it gives me only the first two columns.

My workaround turns out to be:

df = pd.read_excel("foo.xlsx")
df = df[df.columns[[0, 1, -1]]]

But it needs reading the whole file every time. Is there any way that I can get my desired data frame while reading the file? Thanks.

@Nihal Could you please elaborate your points? `pandas` does support reading the file with specified columns, but I am asking how to always get the last one. — ytu, Feb 21 '19 at 07:33
I suppose specifying the `usecols` argument won't save a lot of time because `read_excel` always reads the whole sheet. `usecols` just skips the rest after heaving read everything and will only accelerate parsing . — JoergVanAken, Feb 21 '19 at 08:04

score 1 · Answer 1 · answered Feb 21 '19 at 07:35

1

One idea is get column count and pass to usecols:

from openpyxl import load_workbook

path = "file.xlsx"

wb = load_workbook(path)
sheet = wb.worksheets[0]
column_count = sheet.max_column
print (column_count)

Or read only first row of file:

column_count  = len(pd.read_excel(path, nrows=0).columns)

df = pd.read_excel(path, usecols=[0, 1, column_count-1])
print (df)

answered Feb 21 '19 at 07:35

jezrael

822,522
95
1,334
1,252

1

This is suboptimal because the excel file will be read twice which is very time consuming for big files. – JoergVanAken Feb 21 '19 at 08:28

score 1 · Accepted Answer · answered Feb 21 '19 at 08:28

1

If you really want to do this (see my comment above) you could to this:

xl = pd.ExcelFile(file)
ncols = xl.book.sheets()[0].ncols
df = xl.parse(0, usecols=[0, 1, ncols-1])

This solution won't read the excel file twice.

answered Feb 21 '19 at 08:28

JoergVanAken

1,286
9
10

Just out of curiosity, doesn't `xl.book.sheets()[0].ncols` need to read the whole Excel file to get the answer? I checked [ExcelFile Vs. read_excel in pandas](https://stackoverflow.com/a/49991054/6666231) and I wonder how this solution can be better. – ytu Feb 21 '19 at 09:55
1

Yes, it has to read the whole file. But the following parse command doesn't have to id again. So overall not much has been won, but nothing is done twice (in contrast to jezreal's answer) – JoergVanAken Feb 21 '19 at 09:59

Arjun · Answer 3 · 2020-10-01T18:08:55.183

-1

You can use df.head() and df.tail() to read the first 2 and last line. For example:

df = pd.read_excel("foo.xlsx", sheet_name='ABC')
#print the first 2 column
print(df.head(2))
#print the last column
print(df.tail(1))

EDIT: Oops the above code reads rows and not columns. Yes, you have to read the file everytime. I don't think there's an option to read partial file.

For reading column maybe you can do something like this

df['Column Name'][index]

edited Oct 01 '20 at 18:08

answered Sep 29 '20 at 00:28

Arjun

21
5

Your answer provides rows, not columns. Also, it cannot prevent reading the whole file while the file is getting larger over time. – ytu Sep 30 '20 at 08:02

Read the last column from an Excel file with pandas

3 Answers3

Linked