I'm trying to read in a data set and drop its first two columns, but it seems to be dropping the wrong columns. I was looking at this thread, but its suggestion is not giving the expected result. My data set starts with 6 columns, and I need to remove the first two. Other threads suggest dropping columns by label, but I would prefer not to name columns only to drop them if I can do it in one step.

df = pd.read_excel('Data.xls', header=17, footer=246)
df.drop(df.columns[[0, 1]], axis=1, inplace=True)

But it is dropping columns 4 and 5 instead of the first two. Is there something with the drop function that I'm just completely missing?

Stephen Juza
  • Print out `df.columns` and make sure it looks like what you were expecting. Maybe the order got changed somewhere? – JohnE Nov 14 '16 at 00:22
  • OK, that seems to be the issue here. When I do that, I get this output: `Index(['Petajoules', 'Gigajoules', '%'], dtype='object')`. Petajoules is the third column when I look at the data set visually. The first two columns are not included. How would I drop those two columns if they aren't in `df.columns`? – Stephen Juza Nov 14 '16 at 01:12
  • The first two columns might be in the index (multi-index). Try `df.reset_index()`; that converts index columns into regular columns. – JohnE Nov 14 '16 at 01:20
  • Thanks, it was the index. However, that leads to a different problem. When I try using `df.reset_index()`, I get an error ("cannot do a non-empty take from an empty axes"). Even if I solved that error, isn't `reset_index()` a destructive process? The third column of the multilevel index is what I need to reset the index to. However, when I try using `df.set_index`, it won't let me reset it to the appropriate column because it is currently part of the index. – Stephen Juza Nov 14 '16 at 02:27

1 Answer

If I understand your question correctly, you have a multilevel index, so `df.columns[[0, 1]]` starts counting at the first non-index column. The index levels are not part of `df.columns`.
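
To illustrate, here is a minimal sketch with made-up data (the level names `a` and `b` and all values are hypothetical, not taken from the original spreadsheet). It shows that index levels do not appear in `df.columns`, so positions 0 and 1 refer to the first two regular columns:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: two columns ended up in the index.
df = pd.DataFrame(
    {
        "a": [1, 2],
        "b": ["x", "y"],
        "Petajoules": [10.0, 20.0],
        "Gigajoules": [1.5, 2.5],
        "%": [0.1, 0.2],
    }
).set_index(["a", "b"])

# Index levels are not listed in df.columns, so counting starts at 'Petajoules'.
print(list(df.columns))
print(list(df.index.names))

# Positions 0 and 1 therefore select 'Petajoules' and 'Gigajoules'.
dropped = df.drop(df.columns[[0, 1]], axis=1)
print(list(dropped.columns))
```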

If you know the positions of the columns you want, why not try selecting them directly, such as:

df = df.iloc[:, 3:]
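
If the unwanted columns are in fact index levels, as the comments suggest, positional selection on columns cannot reach them. Below is a minimal sketch, assuming a three-level index whose names (`a`, `b`, `country`) and values are made up for illustration, of discarding the first two levels with `reset_index` while keeping the third as the index:

```python
import pandas as pd

# Hypothetical data: three index levels, of which only the third is wanted.
df = pd.DataFrame(
    {
        "a": [1, 2],
        "b": ["x", "y"],
        "country": ["A", "B"],
        "Petajoules": [10.0, 20.0],
        "Gigajoules": [1.5, 2.5],
        "%": [0.1, 0.2],
    }
).set_index(["a", "b", "country"])

# Discard the first two index levels. drop=True throws them away
# instead of turning them into regular columns.
result = df.reset_index(level=[0, 1], drop=True)

print(list(result.index.names))
print(list(result.columns))
```

`reset_index` returns a new DataFrame unless `inplace=True` is passed, so the original `df` is left unchanged.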
THN