pandas - Why am I unable to skip rows using the "skiprows" parameter of Pandas?

Question

So, have a look at the following code:

import numpy as np
import pandas as pd

def answer_one():

    energy = pd.read_excel(io = "Energy Indicators.xls", header = 9, parse_cols = "C:F", skip_footer = 38)
    return energy

answer_one()

It produces the following output:

Now, when I make a small change to the code, as below, the output changes completely:

def answer_one():

    energy = pd.read_excel(io = "Energy Indicators.xls", header = 9, parse_cols = "C:F", skip_footer = 38, skiprows = 8)
    return energy

answer_one()

The output that I get is as follows:

Depending on the value I pass to the "skiprows" parameter, the output changes. I don't understand why changing "skiprows" affects the headers of the DataFrame when the "header" argument stays the same. Please find the data file (.xlsx file) here.

Any help, please? I am using pandas v0.19.2. Also, please don't flag my question as a duplicate; I lose points for that. I searched reasonably hard for an existing question but could not find one.

What is the desired output? Do you want the index taken from the first column (country), or the default index `0, 1, 2`? And which column names: with units, or `['Energy Supply', 'Energy Supply per capita', 'Renewable Electricity Production']`? – jezrael, Jan 06 '18 at 08:13
The desired output is a DataFrame with the columns "Country", "Energy Supply", "Energy Supply per capita" and "% Renewable". I am trying to skip the header and footer rows, but I can't get it to work. – CuriousLearner, Jan 06 '18 at 08:32

score 5 · Answer 1 · answered Jan 06 '18 at 08:18


When you skip the first 8 rows, they are removed before header is applied, so header no longer refers to the row that has your header information, and a later row becomes your header. Instead of skipping the first 8 rows, try

skiprows=range(1, 9)

According to the documentation, skiprows also accepts a list-like of row numbers to skip, so you can drop specific rows instead of a leading block. There is a related question about CSV files and the read_csv() method already on Stack Overflow.
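To see this ordering without the original workbook, here is a minimal sketch that uses read_csv on a tiny in-memory file as a stand-in for read_excel. It assumes both functions apply header and skiprows the same way (drop the skipped rows first, then count the header among the rows that remain); the sample text is made up for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for a sheet: two preamble rows, then the real header "a,b".
raw = "title\nnote\na,b\n1,2\n3,4\n5,6\n"

# header=2 counts from the top of the file, so row 2 ("a,b") supplies the names.
df1 = pd.read_csv(io.StringIO(raw), header=2)

# An integer skiprows removes rows first; header=2 is then counted among the
# remaining rows and lands on original row 4 ("3,4").
df2 = pd.read_csv(io.StringIO(raw), header=2, skiprows=2)

# A list skips only the named original rows; nothing above row 2 is removed,
# so header=2 still points at "a,b".
df3 = pd.read_csv(io.StringIO(raw), header=2, skiprows=[4])

print(list(df1.columns), list(df2.columns), list(df3.columns))
```

In df2 a data row has become the header, which is the same symptom as in the question; with a list, the header position is unchanged as long as every listed row sits below it.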


Hans Musgrave


I've used skiprows=range(1, nrows_to_skip-1) – kato2 Jun 24 '19 at 20:15
You almost certainly want `range(1, nrows_to_skip + 1)` instead. – Hans Musgrave Jul 08 '19 at 16:39
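A quick check of the off-by-one point in these two comments, sketched with read_csv and a made-up file where the header is row 0 and three junk rows follow it. `nrows_to_skip` is the hypothetical name used in the comments above.

```python
import io

import pandas as pd

# Made-up file: header on row 0, three junk rows (1, 2, 3), then real data.
raw = "a,b\nx,x\ny,y\nz,z\n1,2\n3,4\n"
nrows_to_skip = 3

# The junk rows are 1..nrows_to_skip inclusive, i.e. range(1, nrows_to_skip + 1).
good = pd.read_csv(io.StringIO(raw), skiprows=range(1, nrows_to_skip + 1))

# range(1, nrows_to_skip - 1) only covers row 1, so two junk rows survive.
bad = pd.read_csv(io.StringIO(raw), skiprows=range(1, nrows_to_skip - 1))

print(len(good), len(bad))
```

Because range excludes its stop value, the stop has to be one past the last row you want gone.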

jezrael · Accepted Answer · score 1 · 2018-01-06T08:41:40.850

I believe you need to skip the unwanted rows by listing their positions in skiprows, leaving out the row that holds the data for Andorra. Rows above the position given by header (9) are excluded automatically, so they do not need to be listed.

Also, I replaced parse_cols with usecols because of this warning:

FutureWarning: the 'parse_cols' keyword is deprecated, use 'usecols' instead parse_cols = "C:F"
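Because the asker is pinned to 0.19.2 while this answer targets 0.22.0, one option is to choose the keyword spelling at run time. This is a minimal sketch, not part of the original answer: `excel_kwargs` is a hypothetical helper, and it assumes the renames from `sheetname` to `sheet_name` and from `parse_cols` to `usecols` both arrived in pandas 0.21.0.

```python
import re


def excel_kwargs(pandas_version, sheet="Energy", cols="C:F"):
    """Hypothetical helper: spell read_excel's sheet/column keywords for a pandas version.

    Assumes pandas 0.21.0 deprecated `sheetname` and `parse_cols` in favour
    of `sheet_name` and `usecols`, so older versions get the old names.
    """
    # Keep just the major and minor numbers, e.g. "0.19.2" -> (0, 19).
    major, minor = (int(part) for part in re.findall(r"\d+", pandas_version)[:2])
    if (major, minor) < (0, 21):
        return {"sheetname": sheet, "parse_cols": cols}
    return {"sheet_name": sheet, "usecols": cols}


print(excel_kwargs("0.19.2"))
print(excel_kwargs("0.22.0"))
```

Usage would then look like `pd.read_excel("Energy Indicators.xls", header=9, **excel_kwargs(pd.__version__))`. Other spellings that differ in this thread, such as skip_footer versus skipfooter, could be handled the same way once you have confirmed in which release they changed.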

import pandas as pd

df = pd.read_excel('Energy Indicators.xls',
                   sheet_name='Energy',
                   skiprows=[10, 12, 13, 14, 15, 16, 17],  # unwanted rows below the header
                   skipfooter=38,
                   header=9,
                   usecols=[2, 3, 4, 5])  # columns C:F, i.e. the old parse_cols="C:F"
print(df.head())

          Country Energy Supply Energy Supply per capita  \
0         Andorra             9                      121   
1     Afghanistan           321                       10   
2         Albania           102                       35   
3         Algeria          1959                       51   
4  American Samoa           ...                      ...   

   Renewable Electricity Production  
0                         88.695650  
1                         78.669280  
2                        100.000000  
3                          0.551010  
4                          0.641026
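The same idea on a small scale: a hypothetical miniature of the sheet, read with read_csv as a stand-in for read_excel (assuming both treat header and a list-valued skiprows the same way). The rows above the header are discarded without being listed, and only the unwanted rows below it go into skiprows.

```python
import io

import pandas as pd

# Hypothetical miniature: two title rows, the header row, a units row,
# and a footnote row mixed in with the data.
raw = (
    "Energy Indicators\n"
    "source note\n"
    "Country,Supply\n"
    "(country),(PJ)\n"
    "Andorra,9\n"
    "footnote,x\n"
    "Afghanistan,321\n"
)

# header=2 points at "Country,Supply"; rows 0-1 above it are dropped
# automatically, and skiprows lists only the unwanted rows 3 and 5 below it.
df = pd.read_csv(io.StringIO(raw), header=2, skiprows=[3, 5])
print(df)
```

Since every listed row sits below the header, the header position is unaffected, which is why this approach does not suffer from the shifting seen in the question.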

edited Jan 06 '18 at 08:41

answered Jan 06 '18 at 08:26

jezrael


Well, since I am using Pandas v0.19.2, "parse_cols" is the one to use. "usecols" won't work at all. – CuriousLearner Jan 06 '18 at 08:36
Yes, the solution is for pandas `0.22.0`. Is it possible to upgrade? – jezrael Jan 06 '18 at 08:37
Nope. I tried your code, but it doesn't work: I get the countries as the index and only two columns, "Energy Supply per capita" and "Renewable Electricity Production". – CuriousLearner Jan 06 '18 at 08:38
Unfortunately, no. I am running my program on a website whose server has everything preinstalled, so I am forced to work with 0.19.2. – CuriousLearner Jan 06 '18 at 08:40
Did you omit the `index_col` parameter? – jezrael Jan 06 '18 at 08:40
Can you please open a chat with me? – CuriousLearner Jan 06 '18 at 08:42
OK, but first, can you test `parse_cols = "C:F"` instead? – jezrael Jan 06 '18 at 08:44
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/162647/discussion-between-archan-joshi-and-jezrael). – CuriousLearner Jan 06 '18 at 08:45
