I am trying to remove several ranges of columns from my pandas DataFrame. I would prefer to do it in one line, but the only method I know is `iloc`, which doesn't seem to accept multiple ranges. When I wrote it as separate lines, some of the columns I don't want remained. Can someone help me with a better way of doing this? Thanks!

import pandas as pd
df = pd.DataFrame({'id': [100,200,300], 'user': ['Bob', 'Jane', 'Alice'], 'income': [50000, 60000, 70000], 'color':['red', 'green', 'blue'], 'state':['GA', 'PA', 'NY'], 'day':['M', 'W', 'Th'], 'color2':['red', 'green', 'blue'], 'state2':['GA', 'PA', 'NY'], 'id2': [100,200,300]})

df.drop(df.iloc[:, 0:2], inplace=True, axis=1)
df.drop(df.iloc[:, 4:5], inplace=True, axis=1)
df.drop(df.iloc[:, 7:9], inplace=True, axis=1)

I'd like the output from the code above to contain columns 'color' and 'color2'

zb3693
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to pandas questions. – itprorh66 Oct 05 '22 at 18:50
  • Instead of thinking of it as dropping, why not think of it as subsetting/selecting? That is, select the columns you want to keep: `policies.iloc[:, np.r_[13:62,71,72,79:91]]` – Onyambu Oct 05 '22 at 19:01
  • If you reversed the order of your separate lines, your method would actually work. The issue is that the number of columns is reduced after each line, so the positions that the later lines refer to change as well. – BeRT2me Oct 05 '22 at 19:35
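
The two suggestions in the comments can be checked against the example DataFrame from the question. The following is only a sketch: the names `forward`, `backward`, and `selected` are made up for illustration, and `drop(columns=...)` is used in place of the question's `inplace=True` calls so that the original `df` stays intact.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [100, 200, 300], 'user': ['Bob', 'Jane', 'Alice'],
    'income': [50000, 60000, 70000], 'color': ['red', 'green', 'blue'],
    'state': ['GA', 'PA', 'NY'], 'day': ['M', 'W', 'Th'],
    'color2': ['red', 'green', 'blue'], 'state2': ['GA', 'PA', 'NY'],
    'id2': [100, 200, 300],
})

# Front to back: every drop shifts the positions of the remaining columns,
# so the later ranges no longer point at the intended columns.
forward = df.copy()
for start, stop in [(0, 2), (4, 5), (7, 9)]:
    forward = forward.drop(columns=forward.columns[start:stop])

# Back to front: columns still waiting to be dropped keep their positions.
backward = df.copy()
for start, stop in [(7, 9), (4, 5), (0, 2)]:
    backward = backward.drop(columns=backward.columns[start:stop])

# Selecting instead of dropping: keep positions 2-3 and 5-6
# (slice ends are exclusive).
selected = df.iloc[:, np.r_[2:4, 5:7]]

print(list(forward.columns))   # ['income', 'color', 'state', 'day', 'state2', 'id2']
print(list(backward.columns))  # ['income', 'color', 'day', 'color2']
print(list(selected.columns))  # ['income', 'color', 'day', 'color2']
```

In the front-to-back run, the second drop removes `color2` instead of `state`, and the third slice is empty because only six columns are left.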

2 Answers

Try `np.r_`.

Answer based on your question, prior to your edit:

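import pandas as pd
import numpy as np

# `policies` is the DataFrame from the original (pre-edit) question.
# Positions of the columns to drop (slice ends are exclusive).
idx = np.r_[0:12, 63:70, 73:78, 92:108]
policies.drop(policies.columns[idx], axis=1, inplace=True)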

Answer based on your example:

import pandas as pd
import numpy as np

idx = np.r_[0:2, 4:5, 7:9]
df.drop(df.columns[idx], axis = 1, inplace = True)

PS: the end of each slice in `np.r_` is exclusive, so with `np.r_[0:3]` the column at position 3 will not be dropped.
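
To see exactly which positions a given `np.r_` expression covers, it can be printed on its own. This is a small sketch; the variable name `idx` simply mirrors the snippet above.

```python
import numpy as np

# np.r_ concatenates the slices into one flat array of integer positions.
idx = np.r_[0:2, 4:5, 7:9]
print(idx.tolist())         # [0, 1, 4, 7, 8]

# The end of a slice is exclusive: 0:3 covers positions 0, 1 and 2 only.
print(np.r_[0:3].tolist())  # [0, 1, 2]
```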

hope this helps.

Echo
  • Thank you, I really appreciate the explanations. This was another approach that also worked perfectly. – zb3693 Oct 05 '22 at 19:33

You could do:

df = df.drop(df.columns[[*range(0,2), *range(4,5), *range(7,9)]], axis=1)

Output:

   income  color day color2
0   50000    red   M    red
1   60000  green   W  green
2   70000   blue  Th   blue
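
If this comes up often, the same idea could be wrapped in a small function. The sketch below is one possible way to do it, not part of pandas: `drop_column_ranges` is a hypothetical helper name, and it assumes each range is given as a `(start, stop)` pair with an exclusive end, like `range`.

```python
import pandas as pd

def drop_column_ranges(df, *ranges):
    """Hypothetical helper: return a copy of df without the columns
    at the given (start, stop) positions; stop is exclusive."""
    positions = [i for start, stop in ranges for i in range(start, stop)]
    # All positions are resolved against the original column order,
    # so no range is affected by an earlier drop.
    return df.drop(columns=df.columns[positions])

df = pd.DataFrame({
    'id': [100, 200, 300], 'user': ['Bob', 'Jane', 'Alice'],
    'income': [50000, 60000, 70000], 'color': ['red', 'green', 'blue'],
    'state': ['GA', 'PA', 'NY'], 'day': ['M', 'W', 'Th'],
    'color2': ['red', 'green', 'blue'], 'state2': ['GA', 'PA', 'NY'],
    'id2': [100, 200, 300],
})

result = drop_column_ranges(df, (0, 2), (4, 5), (7, 9))
print(list(result.columns))  # ['income', 'color', 'day', 'color2']
```

Because the function returns a new DataFrame instead of using `inplace=True`, the original `df` keeps all nine columns.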
BeRT2me