0

I am reading an excel file for which I want to drop some initial rows and columns WHILE reading it. There is a very good option to drop initial rows using skip_rows option. But I am unable to find any option which will help me drop initial columns.

df1=pd.read_excel(r"file_name.xlsx",
                 skiprows=4)

As in my code above, I am able to skip initial 4 rows. Is there any similar option by which I can skip initial 4 columns while reading this excel?

I think its a very basic question and I also tried finding its solution. But unable to do it. Every solution using either names of the columns or total number of columns as parameter.

noobmaster
  • 157
  • 9
  • Check the usecols parameter. You can specify the col numbers as ints or as a string like "A:E". https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html – areobe Nov 26 '21 at 07:13
  • Does this answer your question? [Skip specific set of columns when reading excel frame - pandas](https://stackoverflow.com/questions/49677313/skip-specific-set-of-columns-when-reading-excel-frame-pandas) – Ali_Sh Nov 26 '21 at 07:16
  • @Ali_Sh in this answer they are using number of columns as parameter. Can I do it without using number of columns. Like range(4:)? – noobmaster Nov 26 '21 at 08:19
  • 1
    @noobmaster There is another [recommended link](https://stackoverflow.com/a/24366664/13394817) in my suggested link. Did it solve the problem? – Ali_Sh Nov 26 '21 at 08:27
  • Actually someway or other, all the answers are either using column names or total number of columns. The second link you provided, they have used range(8) which is actually number of columns. I was looking for something like range(: , 4 : ). But it doesn't work. – noobmaster Nov 26 '21 at 11:57
  • @noobmaster Is my answer the desired one or you are looking for some other answers? If so, what else do you have in mind? – Ali_Sh Mar 10 '22 at 22:09

2 Answers2

2

If your excel file looks like:

enter image description here

You can use usecols as below:

>>> pd.read_excel('data.xlsx', skiprows=4,
                  usecols=lambda x: x if not x.startswith('Unnamed') else None)

   ColA  ColB  ColC
0     1     2     3
1     4     5     6
2     7     8     9

Update

Another (ugly?) method: create a counter outside of the function. Each time the function keepcol is called, decrement the counter until it reaches 0. After that, all columns are kept.

skip_cols = 4
def keepcol(name):
    global skip_cols
    if skip_cols == 0:
        return name
    skip_cols -= 1

pd.read_excel('data.xlsx', skiprows=4, usecols=keepcol)
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Well my excel does not have empty columns as above. What if I want to remove colA and colB without using column names, like I am removing rows without using row names. – noobmaster Nov 26 '21 at 08:17
1

You can use range with usecols during reading as:

df1 = pd.read_excel(r"file_name.xlsx", skiprows=4,
                    usecols=range(4, len(pd.read_excel(r"file_name.xlsx").columns)))
Ali_Sh
  • 2,667
  • 3
  • 43
  • 66