How to select only the cells I need from a XLS file with Pandas

Question

So, i'm trying to select a few lines and columns of an Excel file and turn them into a Pandas dataframe. The problem is: I must select only columns C:G and lines 19 to 245. I tried using df = pd.read_excel("Energy Indicators.xls", skiprows=18, usecols="C:G") without success.

Unfortunately I cannot change the XLS file, so I have to leave it as it is.

I always get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-ff45621b0e89> in <module>()
      4 file_loc = "Energy Indicators.xls"
      5 
----> 6 df = pd.read_excel(file_loc, skiprows=18, usecols="A:D")

/opt/conda/lib/python3.6/site-packages/pandas/io/excel.py in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, true_values, false_values, engine, squeeze, **kwds)
    198         skip_footer=skip_footer, converters=converters,
    199         true_values=true_values, false_values=false_values, squeeze=squeeze,
--> 200         **kwds)
    201 
    202 

/opt/conda/lib/python3.6/site-packages/pandas/io/excel.py in _parse_excel(self, sheetname, header, skiprows, names, skip_footer, index_col, has_index_names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, true_values, false_values, verbose, squeeze, **kwds)
    502                                     skipfooter=skip_footer,
    503                                     squeeze=squeeze,
--> 504                                     **kwds)
    505 
    506                 output[asheetname] = parser.read()

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in TextParser(*args, **kwds)
   1669     """
   1670     kwds['engine'] = 'python'
-> 1671     return TextFileReader(*args, **kwds)
   1672 
   1673 

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    728             self.options['has_index_names'] = kwds['has_index_names']
    729 
--> 730         self._make_engine(self.engine)
    731 
    732     def close(self):

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    927             elif engine == 'python-fwf':
    928                 klass = FixedWidthFieldParser
--> 929             self._engine = klass(self.f, **self.options)
    930 
    931     def _failover_to_python(self):

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, **kwds)
   1818         # infer column indices from self.usecols if is is specified.
   1819         self._col_indices = None
-> 1820         self.columns, self.num_original_columns = self._infer_columns()
   1821 
   1822         # Now self.columns has the set of columns that we will process.

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in _infer_columns(self)
   2181                 columns = [names]
   2182             else:
-> 2183                 columns = self._handle_usecols(columns, columns[0])
   2184         else:
   2185             try:

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in _handle_usecols(self, columns, usecols_key)
   2234                 for u in self.usecols:
   2235                     if isinstance(u, string_types):
-> 2236                         col_indices.append(usecols_key.index(u))
   2237                     else:
   2238                         col_indices.append(u)

ValueError: 'A' is not in list

Could someone please help me fix this? Thank you very much

Does this answer your question? [how to read certain columns from Excel using Pandas - Python](https://stackoverflow.com/questions/33655127/how-to-read-certain-columns-from-excel-using-pandas-python) — deadshot, Jul 17 '20 at 21:08

SLuck · Answer 1 · 2020-07-17T21:15:06.840

0

#Would this work?
df = pd.read_excel("Energy Indicators.xls")
df=df[list('CDEFG')]
df=df[19:245]

It may make a difference if your loading a very big file. Else this should be just fine

edited Jul 17 '20 at 21:15

answered Jul 17 '20 at 21:09

SLuck

521
3
14

It's an interesting approach, but it didn't work. The error was: KeyError: "['C' 'D' 'E' 'F' 'G'] not in index" – Marcos Dias Jul 17 '20 at 21:26

score 0 · Answer 2 · answered Jul 17 '20 at 21:20

0

By default read_excel use the first line of the Excel sheet to set the column names. Maybe by adding header=None ?

df = pd.read_excel("Energy Indicators.xls", header=None, skiprows=18, usecols="C:G")

answered Jul 17 '20 at 21:20

Renaud

2,709
2
9
24

Unfortunately it didn't work. The error was: ValueError: ':' is not in list – Marcos Dias Jul 17 '20 at 21:27
And by using column numbers: usecols=range(2,6)? – Renaud Jul 17 '20 at 21:54

How to select only the cells I need from a XLS file with Pandas

2 Answers2