Pandas: reading Excel file starting from the row below that with a specific value

Question

Say I have the following Excel file:

    A      B     C
0   -      -     -
1   Start  -     -
2   3      2     4
3   7      8     4
4   11     2     17

I want to read the file in a dataframe making sure that I start to read it below the row where the Start value is.

Attention: the Start value is not always located in the same row, so if I were to use:

import pandas as pd
xls = pd.ExcelFile('C:\Users\MyFolder\MyFile.xlsx')
df = xls.parse('Sheet1', skiprows=4, index_col=None)

this would fail as skiprows needs to be fixed. Is there any workaround to make sure that xls.parse finds the string value instead of the row number?

score 12 · Accepted Answer · answered Apr 17 '18 at 12:05

df = pd.read_excel('your/path/filename')

This answer helps in finding the location of 'start' in the df

 for row in range(df.shape[0]): 

       for col in range(df.shape[1]):

           if df.iat[row,col] == 'start':

             row_start = row
             break

after having row_start you can use subframe of pandas

df_required = df.loc[row_start:]

And if you don't need the row containing 'start', just u increment row_start by 1

df_required = df.loc[row_start+1:]

score 8 · Answer 2 · answered Jul 21 '20 at 17:18

8

If you know the specific rows you are interested in, you can skip from the top using skiprow and then parse only the row (or rows) you want using nrows - see pandas.read_excel

df = pd.read_excel('myfile.xlsx', 'Sheet1', skiprows=2, nrows=3,)

answered Jul 21 '20 at 17:18

bfree67

669
7
6

score 3 · Answer 3 · edited Mar 31 '21 at 10:37

3

You could use pd.read_excel('C:\Users\MyFolder\MyFile.xlsx', sheet_name='Sheet1') as it ignores empty excel cells.

Your DataFrame should then look like this:

    A      B     C
0   Start NaN   NaN
1   3      2     4
2   7      8     4
3   11     2     17

Then drop the first row by using

df.drop([0])

to get

    A      B     C
0   3      2     4
1   7      8     4
2   11     2     17

edited Mar 31 '21 at 10:37

Pikamander2

7,332
3
48
69

answered Apr 17 '18 at 11:30

Maxoz99

31
5

This is *almost* what I was looking for, in that my real Excel files have all sorts of information in the first `x` rows, so by doing `pd.read_excel('C:\Users\MyFolder\MyFile.xlsx', sheetname='Sheet1')` I would pick that information up anyway. That's why I explicitly asked for ways in which a specific value could be looked up. Btw, thanks for your efforts, and I am going to upvote this. – FaCoffee Apr 17 '18 at 12:21

Pandas: reading Excel file starting from the row below that with a specific value

3 Answers3

Linked