
I have a list of column names that I want to get from a DataFrame.

  1. If a column in the list exists in the DataFrame, we want to select ONLY those specified columns.
  2. If a column in the list does not exist, we want to generate a placeholder default column of 0s.
  3. Any other columns in the DataFrame are irrelevant and should be dropped or otherwise ignored.

Adding a single pandas column is straightforward (see "Pandas: Add column if does not exists"), but I'm looking for an efficient and legible way to add multiple columns if they don't exist.

import pandas as pd

d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}
df = pd.DataFrame(d)
df
   a  b  c  d
0  1  3  5  7
1  2  4  6  8

requested_cols = ['a','b','x','y','z']

I tried something like:

valid_cols = df.columns.values
missing_col_names = [col_name for col_name in requested_cols if col_name not in valid_cols]

df = df.reindex(list(df) + missing_col_names, axis=1).fillna(0)
df = df.loc[:,df.columns.isin(valid_cols)]
df = df.reindex(list(valid_cols))

But this only leaves me with the intersection of feature names.
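For comparison with the answers below, here is a minimal sketch of how the attempt above could be repaired (an editorial reconstruction, not the asker's code; `out` is just an illustrative name). Two things go wrong in the attempt: the filter step keeps `valid_cols` (the columns already in `df`) rather than `requested_cols`, so the freshly added zero columns are discarded, and the final `df.reindex(list(valid_cols))` has no `axis=1`, so it reindexes the row labels instead of the columns. Selecting with `requested_cols` after adding the missing columns avoids both:

```python
import pandas as pd

d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}
df = pd.DataFrame(d)
requested_cols = ['a', 'b', 'x', 'y', 'z']

# Names that are requested but not present as columns in df
missing_col_names = [c for c in requested_cols if c not in df.columns]

# Add the missing columns (reindex fills them with NaN), then replace NaN with 0
out = df.reindex(list(df) + missing_col_names, axis=1).fillna(0)

# Keep only the requested columns, in the requested order ('c' and 'd' are dropped)
out = out[requested_cols]
print(out)
```

Note that `fillna(0)` also replaces any NaN already present in the original columns, and the added columns come out as floats (0.0) because `reindex` first fills them with NaN. Passing `fill_value=0` to `reindex` instead fills only the newly added labels.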

Dave Liu

2 Answers


Is this what you need?

df.reindex(columns = requested_cols, fill_value=0)
Out[134]: 
   a  b  x  y  z
0  1  3  0  0  0
1  2  4  0  0  0
BENY
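A self-contained version of the `reindex` answer, for anyone who wants to run it directly. The key property is that `reindex(columns=...)` returns exactly the listed labels in the listed order: existing columns are kept, unknown labels are created with `fill_value`, and unlisted columns are dropped, which covers all three requirements in one call. The reordered list `['z', 'b', 'a']` below is a hypothetical example added only to show the ordering behaviour:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})
requested_cols = ['a', 'b', 'x', 'y', 'z']

# Existing labels are kept, missing ones get fill_value,
# and any column not listed (here 'c' and 'd') is dropped.
result = df.reindex(columns=requested_cols, fill_value=0)
print(result)

# The output column order always follows the list passed in
reordered = df.reindex(columns=['z', 'b', 'a'], fill_value=0)
print(reordered.columns.tolist())  # ['z', 'b', 'a']
```

One caveat: `reindex` raises an error if `df` itself has duplicate column labels, so that case needs de-duplicating first.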

You can use conditional list comprehensions to find the valid and missing columns. Then select the valid columns from the dataframe and use a dictionary comprehension to assign new columns with a default value of zero.

valid_cols = [c for c in requested_cols if c in df]
missing_cols = [c for c in requested_cols if c not in df]

>>> df[valid_cols].assign(**{missing_col: 0 for missing_col in missing_cols})
   a  b  x  y  z
0  1  3  0  0  0
1  2  4  0  0  0
Alexander
  • Does this preserve order of the requested columns? – Dave Liu Aug 07 '19 at 18:45
  • Order is preserved for Python 3.6+ per dictionary ordering. Guaranteed for 3.7+ (implementation detail for 3.6). https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6 – Alexander Aug 07 '19 at 18:46
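On the ordering question: the dictionary passed to `assign` does keep its insertion order, but `assign` appends new columns after the ones already selected, so missing columns always land at the end rather than at their position in `requested_cols`. When missing and existing names are interleaved, a final selection with `requested_cols` restores the exact requested order. A minimal sketch, using a hypothetical interleaved `requested_cols` and the illustrative names `appended` and `ordered`:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})
requested_cols = ['x', 'a', 'y', 'b']  # hypothetical: missing names interleaved with existing ones

# `c in df` tests DataFrame column labels, not cell values
valid_cols = [c for c in requested_cols if c in df]
missing_cols = [c for c in requested_cols if c not in df]

# assign adds the new columns after the selected ones: ['a', 'b', 'x', 'y']
appended = df[valid_cols].assign(**{c: 0 for c in missing_cols})

# Re-selecting with requested_cols gives exactly ['x', 'a', 'y', 'b']
ordered = appended[requested_cols]
print(appended.columns.tolist())
print(ordered.columns.tolist())
```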