Select columns of Pandas dataframe if name is in list, or create default and drop the rest

Question

I have a list of column names that I want to get from a DataFrame.

If a requested column is in the DataFrame, I want to select it, and ONLY the requested columns.
If a requested column is not in the DataFrame, I want to generate a placeholder column of 0's.
Any other columns in the DataFrame are irrelevant and should be dropped or otherwise ignored.

Adding a single column is straightforward (see "Pandas: Add column if does not exists"), but I'm looking for an efficient and legible way to add multiple columns if they don't exist.

import pandas as pd

d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}
df = pd.DataFrame(d)
df
   a  b  c  d
0  1  3  5  7
1  2  4  6  8

requested_cols = ['a','b','x','y','z']

I tried something like:

valid_cols = df.columns.values
missing_col_names = [col_name for col_name in requested_cols if col_name not in valid_cols]

df = df.reindex(list(df) + missing_col_names, axis=1).fillna(0)
df = df.loc[:,df.columns.isin(valid_cols)]
df = df.reindex(list(valid_cols))

But this only leaves me with the intersection of feature names.

score 6 · Accepted Answer · answered Aug 07 '19 at 18:44

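Is this what you need? reindex with columns= keeps only the requested columns, in the requested order, and fills any that are missing with fill_value.

df.reindex(columns=requested_cols, fill_value=0)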
Out[134]: 
   a  b  x  y  z
0  1  3  0  0  0
1  2  4  0  0  0
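
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same call. The helper name select_or_default and its default parameter are hypothetical, introduced only for illustration; reindex(columns=..., fill_value=...) is the actual pandas call from the answer.

```python
import pandas as pd


def select_or_default(df, requested_cols, default=0):
    """Return only requested_cols, in that order.

    Columns absent from df are created and filled with `default`.
    (Hypothetical helper; it just wraps DataFrame.reindex.)
    """
    return df.reindex(columns=requested_cols, fill_value=default)


df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})
requested_cols = ['a', 'b', 'x', 'y', 'z']

result = select_or_default(df, requested_cols)
print(result)

# reindex returns a new DataFrame; the original still has all four columns.
print(list(df.columns))
```

Note that fill_value only applies to columns that reindex has to create. NaN values already present in an existing column are left as they are, so a separate fillna would be needed if those should become 0 as well.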

answered Aug 07 '19 at 18:44

BENY

Wow, this preserves order too! Thank you! – Dave Liu Aug 07 '19 at 18:46

Alexander · Answer 2 · 2019-08-07T18:45:36.403

3

You can use conditional list comprehensions to find the valid and missing columns. Then select the valid columns from the dataframe and use a dictionary comprehension to assign new columns with a default value of zero.

valid_cols = [c for c in requested_cols if c in df]
missing_cols = [c for c in requested_cols if c not in df]

>>> df[valid_cols].assign(**{missing_col: 0 for missing_col in missing_cols})
   a  b  x  y  z
0  1  3  0  0  0
1  2  4  0  0  0
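
One caveat on column order, shown as a rough sketch: assign appends new columns after the ones already selected, so the result is "valid columns, then missing columns". That matches requested_cols above only because the missing names happen to come last. The interleaved list below is a made-up example, and the final selection by requested_cols is one possible way to restore the requested order.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})

# Hypothetical request where missing and existing names are interleaved.
requested_cols = ['x', 'a', 'y', 'b']

valid_cols = [c for c in requested_cols if c in df]
missing_cols = [c for c in requested_cols if c not in df]

# Existing columns are selected first, then the missing ones are appended.
appended = df[valid_cols].assign(**{c: 0 for c in missing_cols})
print(list(appended.columns))

# Selecting by the requested list puts the columns back in that order.
ordered = appended[requested_cols]
print(list(ordered.columns))
```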

edited Aug 07 '19 at 18:45

answered Aug 07 '19 at 18:44

Alexander

Does this preserve order of the requested columns? – Dave Liu Aug 07 '19 at 18:45

Order is preserved for Python 3.6+ per dictionary ordering. Guaranteed for 3.7+ (implementation detail for 3.6). https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6 – Alexander Aug 07 '19 at 18:46
