Reading an Excel file in python using pandas

Question

I am trying to read an excel file this way :

newFile = pd.ExcelFile(PATH\FileName.xlsx)
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)

which throws an error that says two arguments expected, I don't know what the second argument is and also what I am trying to achieve here is to convert an Excel file to a DataFrame, Am I doing it the right way? or is there any other way to do this using pandas?

score 257 · Accepted Answer · answered Jun 12 '13 at 10:52

257

Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.

>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
                  Tid  dummy1    dummy2    dummy3    dummy4    dummy5  \
0 2006-09-01 00:00:00       0  5.894611  0.605211  3.842871  8.265307   
1 2006-09-01 01:00:00       0  5.712107  0.605211  3.416617  8.301360   
2 2006-09-01 02:00:00       0  5.105300  0.605211  3.090865  8.335395   
3 2006-09-01 03:00:00       0  4.098209  0.605211  3.198452  8.170187   
4 2006-09-01 04:00:00       0  3.338196  0.605211  2.970015  7.765058   

     dummy6  dummy7    dummy8    dummy9  
0  0.623354       0  2.579108  2.681728  
1  0.554211       0  7.210000  3.028614  
2  0.567841       0  6.940000  3.644147  
3  0.581470       0  6.630000  4.016155  
4  0.595100       0  6.350000  3.974442

What you're doing is calling the method which lives on the class itself, rather than the instance, which is okay (although not very idiomatic), but if you're doing that you would also need to pass the sheet name:

>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)

answered Jun 12 '13 at 10:52

DSM

342,061
65
592
494

6

when I use "df = xl.parse("Sheet1")" it automatically takes the first cell's value of each column as the dataframe's column names , how do I specify my own column names? – Rakesh Adhikesavan Jun 27 '13 at 05:57
2

In pandas 15.0.2, `parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")` does not work and throws error `module object has no attribute ExcelFile` . `parsed = pd.io.excel.ExcelFile.parse(xl, "Sheet1")` works for me – Neil May 12 '16 at 14:11
Nice. This is almost as neat as how the same accessing of a tab is achieved using `openpyxl`. Is pandas calling openpyxl under the hood? – Pyderman May 13 '16 at 02:30
2

How do you prevent it from turning the first row into headers? I've tried using the parameter `headers=None`but while it didn't break the code, it didn't work either. – Elliptica Jul 23 '16 at 01:03
8

I discovered that **pip install xlrd** is needed for this to work. The xlrd package doesn't come with pandas, so if you didn't install it for some other purpose, you'll get a "ImportError: No module named xlrd" exception. True as of pandas 0.19.0 on Mac, at any rate. – user5920660 Jan 10 '18 at 02:29
@user5920660 True as of Ubuntu too, and it must be `xlrd >= 0.9.0`! – Gathide Oct 05 '18 at 12:58

score 109 · Answer 2 · edited Feb 17 '23 at 11:47

109

This is much simple and easy way.

import pandas
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
# or using sheet index starting 0
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)

Check out documentation full details.

FutureWarning: The sheetname keyword is deprecated for newer Pandas versions, use sheet_name instead.

edited Feb 17 '23 at 11:47

Guillaume G

313
1
2
15

answered Apr 13 '16 at 06:51

Murali Mopuru

6,086
5
33
51

score 22 · Answer 3 · edited Jan 28 '21 at 15:14

Thought i should add here, that if you want to access rows or columns to loop through them, you do this:

import pandas as pd

# open the file
xlsx = pd.ExcelFile("PATH\FileName.xlsx")

# get the first sheet as an object
sheet1 = xlsx.parse(0)
    
# get the first column as a list you can loop through
# where the is 0 in the code below change to the row or column number you want    
column = sheet1.icol(0).real

# get the first row as a list you can loop through
row = sheet1.irow(0).real

Edit:

The methods icol(i) and irow(i) are deprecated now. You can use sheet1.iloc[:,i] to get the i-th col and sheet1.iloc[i,:] to get the i-th row.

score 18 · Answer 4 · edited Aug 06 '21 at 07:20

18

I think this should satisfy your need:

import pandas as pd

# Read the excel sheet to pandas dataframe
df = pd.read_excel("PATH\FileName.xlsx", sheet_name=0) #corrected argument name

edited Aug 06 '21 at 07:20

TangoAlee

1,260
2
13
34

answered Nov 30 '16 at 14:10

Ajay Sant

665
1
11
21

6

FYI the correct argument name is `sheet_name` not `sheetname`. – mynameisvinn Dec 15 '20 at 20:27

score 2 · Answer 5 · answered Aug 22 '19 at 17:43

Here is an updated method with syntax that is more common in python code. It also prevents you from opening the same file multiple times.

import pandas as pd

sheet1, sheet2 = None, None
with pd.ExcelFile("PATH\FileName.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
    sheet2 = pd.read_excel(reader, sheet_name='Sheet2')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

score 1 · Answer 6 · answered Mar 30 '19 at 22:48

You just need to feed the path to your file to pd.read_excel

import pandas as pd

file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)

Checkout the documentation to explore parameters like skiprows to ignore rows when loading the excel

score 0 · Answer 7 · edited Jul 04 '19 at 06:46

0

import pandas as pd

data = pd.read_excel (r'**YourPath**.xlsx')

print (data)

edited Jul 04 '19 at 06:46

4b0

21,981
30
95
142

answered Jul 04 '19 at 06:44

Suthura Sudharaka

643
10
25

score 0 · Answer 8 · answered Sep 25 '20 at 09:11

Loading an excel file without explicitly naming a sheet but instead giving the number of the sheet order (often one will simply load the first sheet) goes like:

import pandas as pd
myexcel = pd.ExcelFile("C:/filename.xlsx")
myexcel = myexcel.parse(myexcel.sheet_names[0])

Since .sheet_names returns a list of sheet names, it is easy to load one or more sheets by simply calling the list element(s).

score 0 · Answer 9 · answered Mar 09 '21 at 15:36

All of these works for me

In [1]: import pandas as pd

In [2]: df = pd.read_excel('FileName.xlsx') # If there is only one sheet in the excel file

In [3]: df = pd.read_excel('FileName.xlsx', sheet_name=0)

In [4]: In [20]: df = pd.read_excel('FileName.xlsx', sheet_name='Sheet 1')

score 0 · Answer 10 · answered Oct 25 '22 at 11:42

0

#load pandas library

import pandas as pd

#set path where the file is

path = "./myfile.xlsx"

#load the file into dataframe df

df = pd.read_excel(path)

#check the first 5 rows

df.head(5)

answered Oct 25 '22 at 11:42

Kon Li

9
2

Thanks for adding this, but this just duplicates a subset of existing answers. Also, next time please format code by enclosing it in triple backticks. – craigb Oct 30 '22 at 17:58

Reading an Excel file in python using pandas

10 Answers10

Linked

Related