
The dataframe I have contains many columns, of which I only need a few. For instance:

Col_A      Col_B      Col_C      Col_D      Col_E      Col_F
...        ...        ...        ...        ...        ...

I only need columns Col_A, Col_C and Col_E, so currently what I do is df = df[['Col_A', 'Col_C', 'Col_E']]. The issue is that these columns will not always be present; some or all of them may be missing. So I need something like: if Col_A is in df.columns, select it, and so on. Is there a simple way to do this, rather than writing so many ifs? At the moment, if a column is missing I get KeyError: "['Col_C'] not in index".
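For example, with a small made-up frame (not my real data) that has only one of the three wanted columns:

import pandas as pd

# toy frame that has Col_A but neither Col_C nor Col_E
df = pd.DataFrame({'Col_A': [1, 2], 'Col_B': [3, 4], 'Col_F': [5, 6]})
df[['Col_A', 'Col_C', 'Col_E']]   # raises KeyError, because Col_C and Col_E are missing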

4 Answers


You can use loc and isin

df.loc[:, df.columns.isin(['a','b','c'])]
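Applied to the columns from the question (names assumed from there), missing names are simply ignored:

df.loc[:, df.columns.isin(['Col_A', 'Col_C', 'Col_E'])]
# No KeyError if Col_C or Col_E is absent; note the selection keeps the
# DataFrame's own column order, not the order of the list.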
– Dishin H Goyani

Use Index.intersection:

df[df.columns.intersection(['Col_A','Col_C','Col_E'], sort=False)]
– jezrael
  • This is way faster than the other answers. Can this be used in groupby, like ```df.groupby(by=[df.columns.intersection(['Col_A','Col_C','Col_E'])], as_index=False)```? – Derik81 May 29 '20 at 06:16
  • @Derik81 - Very similar, like `df.groupby(by=df.columns.intersection(['Col_A','Col_C','Col_E']), as_index=False)` – jezrael May 29 '20 at 06:16
  • I think there is some issue with using this; it gives a ```ValueError: Grouper and axis must be same length```. When I check the result of ```df.columns.intersection(['Col_A','Col_C','Col_E'])``` it prints as ```Index(['Col_A', 'Col_C', 'Col_E'], dtype='object')```. There is no issue when I do ```.groupby(by=['Col_A','Col_C','Col_E'])```, but when I pass it the way you gave it throws an error. – Derik81 May 29 '20 at 06:35
  • @Derik81 - I forgot, you need to convert it to a list: `df.groupby(by=df.columns.intersection(['Col_A','Col_C','Col_E']).tolist(), as_index=False)` – jezrael May 29 '20 at 06:36
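A short sketch putting the comment thread together (the frame and column names here are assumed for illustration, not taken from the original posts):

import pandas as pd

df = pd.DataFrame({'Col_A': [1, 1, 2], 'Col_E': [3, 4, 5], 'Col_F': [6, 7, 8]})

cols = df.columns.intersection(['Col_A', 'Col_C', 'Col_E'])
# Passing the Index directly makes pandas treat it as a single array-like
# grouper, hence "Grouper and axis must be same length"; converting it to a
# list makes pandas group by those column names instead.
df.groupby(by=cols.tolist(), as_index=False).sum()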

You could use a list comprehension. For example:

test_columns = ['Col_A', 'Col_C', 'Col_E']
df = df[[c for c in test_columns if c in df.columns]]
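A rough illustration of how this differs from the isin answer (column names assumed from the question): the result follows the order of test_columns rather than the DataFrame's own column order.

test_columns = ['Col_E', 'Col_A', 'Col_C']
df[[c for c in test_columns if c in df.columns]]
# -> whichever of Col_E and Col_A exist, in that order; missing names are skipped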
– Michael Delgado

From what I have interpreted, you could create a copy of df, store it in another variable, and then drop the columns you don't require:

df_copy = df.copy()
# errors='ignore' skips any names that are not actually present
df = df.drop(['Col_A', 'Col_C', 'Col_E'], axis=1, errors='ignore')

# If you want to add a column back from the copy
df['Col_B'] = df_copy['Col_B']
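A variation of this idea that matches the question more directly is to drop everything except the wanted columns; a rough sketch, assuming the column names from the question:

wanted = ['Col_A', 'Col_C', 'Col_E']
df = df.drop(columns=df.columns.difference(wanted))
# columns= is equivalent to labels with axis=1; difference() only returns
# columns that actually exist in df, so missing names never cause a KeyError.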