
The dataframe I have contains many columns, of which I only need a few. For instance:

Col_A      Col_B      Col_C      Col_D      Col_E      Col_F
...        ...        ...        ...        ...        ...

I only need columns Col_A, Col_C and Col_E, so currently what I do is df = df[['Col_A', 'Col_C', 'Col_E']]. The issue is that these columns will not always be present; some or all of them may be missing. So I need something like: if Col_A is in df.columns, select it, and so on. Is there a simple way to do this, rather than writing so many ifs? At the moment, if a column is missing I get KeyError: "['Col_C'] not in index".
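For example, with a small made-up frame (not my real data) that has only one of the three wanted columns:

import pandas as pd

# toy frame that has Col_A but neither Col_C nor Col_E
df = pd.DataFrame({'Col_A': [1, 2], 'Col_B': [3, 4], 'Col_F': [5, 6]})
df[['Col_A', 'Col_C', 'Col_E']]   # raises KeyError, because Col_C and Col_E are missing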

4 Answers


You can use loc and isin

df.loc[:, df.columns.isin(['a','b','c'])]
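Applied to the columns from the question (names assumed from there), missing names are simply ignored:

df.loc[:, df.columns.isin(['Col_A', 'Col_C', 'Col_E'])]
# No KeyError if Col_C or Col_E is absent; note the selection keeps the
# DataFrame's own column order, not the order of the list.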
– Dishin H Goyani

Use Index.intersection:

df[df.columns.intersection(['Col_A','Col_C','Col_E'], sort=False)]
– jezrael
  • This is way faster than the other answers. Can this be used in groupby, like ```df.groupby(by=[df.columns.intersection(['Col_A','Col_C','Col_E'])], as_index=False)```? – Derik81 May 29 '20 at 06:16
  • @Derik81 - Very similar, like `df.groupby(by=df.columns.intersection(['Col_A','Col_C','Col_E']), as_index=False)` – jezrael May 29 '20 at 06:16
  • I think there is some issue with using this; it gives a ```ValueError: Grouper and axis must be same length```. When I check the result of ```df.columns.intersection(['Col_A','Col_C','Col_E'])``` it prints as ```Index(['Col_A', 'Col_C', 'Col_E'], dtype='object')```. There is no issue when I do ```.groupby(by=['Col_A','Col_C','Col_E'])```, but when I pass it the way you gave it throws an error. – Derik81 May 29 '20 at 06:35
  • @Derik81 - I forgot, you need to convert it to a list: `df.groupby(by=df.columns.intersection(['Col_A','Col_C','Col_E']).tolist(), as_index=False)` – jezrael May 29 '20 at 06:36
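A short sketch putting the comment thread together (the frame and column names here are assumed for illustration, not taken from the original posts):

import pandas as pd

df = pd.DataFrame({'Col_A': [1, 1, 2], 'Col_E': [3, 4, 5], 'Col_F': [6, 7, 8]})

cols = df.columns.intersection(['Col_A', 'Col_C', 'Col_E'])
# Passing the Index directly makes pandas treat it as a single array-like
# grouper, hence "Grouper and axis must be same length"; converting it to a
# list makes pandas group by those column names instead.
df.groupby(by=cols.tolist(), as_index=False).sum()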

You could use a list comprehension. For example:

test_columns = ['Col_A', 'Col_C', 'Col_E']
df = df[[c for c in test_columns if c in df.columns]]
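A rough illustration of how this differs from the isin answer (column names assumed from the question): the result follows the order of test_columns rather than the DataFrame's own column order.

test_columns = ['Col_E', 'Col_A', 'Col_C']
df[[c for c in test_columns if c in df.columns]]
# -> whichever of Col_E and Col_A exist, in that order; missing names are skipped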
– Michael Delgado

From what I have interpreted, you could create a copy of df, store it in another variable, and then drop the columns you don't require:

df_copy = df.copy()
# errors='ignore' skips any names that are not actually present
df = df.drop(['Col_A', 'Col_C', 'Col_E'], axis=1, errors='ignore')

# If you want to add a column back from the copy
df['Col_B'] = df_copy['Col_B']
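A variation of this idea that matches the question more directly is to drop everything except the wanted columns; a rough sketch, assuming the column names from the question:

wanted = ['Col_A', 'Col_C', 'Col_E']
df = df.drop(columns=df.columns.difference(wanted))
# columns= is equivalent to labels with axis=1; difference() only returns
# columns that actually exist in df, so missing names never cause a KeyError.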