1

So, have a look at the following code:

import numpy as np
import pandas as pd

def answer_one():

    energy = pd.read_excel(io = "Energy Indicators.xls", header = 9, parse_cols = "C:F", skip_footer = 38)
    return energy

answer_one()

It produces the following output:

enter image description here

Now, when I make a little modification to the code, as below, it changes the output completely:

def answer_one():

    energy = pd.read_excel(io = "Energy Indicators.xls", header = 9, parse_cols = "C:F", skip_footer = 38, skiprows = 8)
    return energy

answer_one()

The output that I get is as follows:

enter image description here

Depending upon the argument that I give to the "skiprows" parameter, the output changes itself. I am unable to understand why does changing the value of "skiprows" affect the headers of the dataframe, when we are keeping the argument of the "headers"parameter unchanged? Please find the data file (.xlsx file) here

Any help please? I use Pandas v0.19.2. Also, please don't tag my question as "duplicate". I lose points man. I tried reasonably well to find an existing question, but could not.

CuriousLearner
  • 361
  • 1
  • 6
  • 14
  • What is desired output? do you want index from first column country or default index - `0,1,2` ? And what columns names? Need units or `['Energy Supply', 'Energy Supply per capita', 'Renewable Electricity Production']` ? – jezrael Jan 06 '18 at 08:13
  • The desired output is a DataFrame consisting of respective columns and their headers being "Country", "Energy Supply", "Energy Supply per capita", "% Renewable". I am struggling to skip the header and the footer, but I am unable to. – CuriousLearner Jan 06 '18 at 08:32
  • OK, then check my answer. – jezrael Jan 06 '18 at 08:33

2 Answers2

5

When you skip the first 8 rows, you skip the row that has your header information, and the 9th row becomes your header. Instead of skipping the first 8 rows, try

skiprows=range(1, 9)

In the documentation, skiprows allows an iterable of which rows to skip. There is a related question regarding csv files and the read_csv() method already on StackOverflow.

Hans Musgrave
  • 6,613
  • 1
  • 18
  • 37
1

I believe you need skip all rows by positions defined in list, row 10 is not in list, because data for Andorra. Data (row 1-8) before position defined in header (9) are excluded by default .

Also parse_cols was replaced by usecols, because warning:

FutureWarning: the 'parse_cols' keyword is deprecated, use 'usecols' instead parse_cols = "C:F"

df=pd.read_excel('Energy Indicators.xls',
                  sheet_name='Energy',
                  skiprows=[10,12,13,14,15,16,17],
                  skipfooter=38,
                  header=9,
                  usecols=[2,3,4,5]  #parse_cols = "C:F"
                 )
print (df.head())

          Country Energy Supply Energy Supply per capita  \
0         Andorra             9                      121   
1     Afghanistan           321                       10   
2         Albania           102                       35   
3         Algeria          1959                       51   
4  American Samoa           ...                      ...   

   Renewable Electricity Production  
0                         88.695650  
1                         78.669280  
2                        100.000000  
3                          0.551010  
4                          0.641026  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252