I have a list of column names that I want to select from a DataFrame:
- If a requested column exists in the DataFrame, I want to select it, and ONLY the requested columns.
- If a requested column does not exist in the DataFrame, I want to generate a placeholder column filled with 0s.
- Any other columns in the DataFrame are irrelevant and should be dropped or otherwise ignored.
Adding a single column when it does not exist is straightforward (see "Pandas: Add column if does not exists"), but I'm looking for an efficient and legible way to add multiple columns when they don't exist.
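As a sketch of how the single-column "add if missing" idea might extend to several columns at once: collect every missing name into a dict, pass it to `DataFrame.assign` so each new column is broadcast from the scalar 0, then select the requested list, which also drops and reorders columns. The function name `select_or_zero_fill` is hypothetical, and the sketch assumes the column labels are strings (required for the `assign(**...)` keyword-argument form).

```python
import pandas as pd


def select_or_zero_fill(df: pd.DataFrame, requested_cols: list) -> pd.DataFrame:
    """Hypothetical helper: return requested_cols in order, zero-filling missing ones."""
    # One {name: 0} entry for each requested column that df does not have.
    missing = {col: 0 for col in requested_cols if col not in df.columns}
    # assign() returns a new DataFrame (df itself is not modified); the scalar 0
    # is broadcast down each new column. Selecting by the requested list then
    # keeps only those columns, in that order.
    return df.assign(**missing)[requested_cols]


d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}
df = pd.DataFrame(d)
out = select_or_zero_fill(df, ['a', 'b', 'x', 'y', 'z'])
print(out)
```

Because `assign` copies rather than mutating, the original `df` keeps its columns `a` to `d`.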
import pandas as pd
d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}
df = pd.DataFrame(d)
df
   a  b  c  d
0  1  3  5  7
1  2  4  6  8
requested_cols = ['a','b','x','y','z']
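With this data, one possibly simpler option is `DataFrame.reindex` with `columns=` and `fill_value=0`: labels already in `df` are kept, requested labels that are missing are created and filled with 0, unrequested labels are dropped, and the result follows the order of `requested_cols`. A minimal sketch, re-creating the `df` and `requested_cols` above so the snippet runs on its own:

```python
import pandas as pd

d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}
df = pd.DataFrame(d)
requested_cols = ['a', 'b', 'x', 'y', 'z']

# Reindex along the columns axis in one call:
#   'a', 'b'           -> kept as-is
#   'x', 'y', 'z'      -> created, filled with 0 via fill_value
#   'c', 'd'           -> dropped, since they are not requested
result = df.reindex(columns=requested_cols, fill_value=0)
print(result)
```

Using `fill_value=0` instead of a later `.fillna(0)` only fills the newly created columns, so any NaN values that already exist in the original columns are left untouched.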
I tried something like:
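valid_cols = df.columns.values  # columns that already exist in df
missing_col_names = [col_name for col_name in requested_cols if col_name not in valid_cols]
df = df.reindex(list(df) + missing_col_names, axis=1).fillna(0)  # append the missing columns, filled with 0
df = df.loc[:,df.columns.isin(valid_cols)]  # keep only columns that were in the original df
df = df.reindex(list(valid_cols))  # no axis=1 here, so this reindexes the rows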
But this only leaves me with the intersection of the column names, not the requested columns plus the zero-filled placeholders.
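In the attempt above, `df.columns.isin(valid_cols)` keeps only the columns that existed before the reindex, so the zero-filled columns are dropped again, and the final `df.reindex(list(valid_cols))` has no `axis=1`, so it reindexes the rows rather than the columns. A minimal sketch of a corrected version that keeps the same first steps but selects by `requested_cols` at the end:

```python
import pandas as pd

d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}
df = pd.DataFrame(d)
requested_cols = ['a', 'b', 'x', 'y', 'z']

valid_cols = df.columns.values  # columns that already exist in df
missing_col_names = [col_name for col_name in requested_cols if col_name not in valid_cols]
# Append the missing columns (created as NaN) and fill them with 0, as in the attempt.
df = df.reindex(list(df) + missing_col_names, axis=1).fillna(0)
# Select by the *requested* names instead of filtering on valid_cols,
# which would drop the newly added columns again.
df = df[requested_cols]
print(df)
```

Note that `.fillna(0)` here applies to the whole frame, so it would also replace any NaN already present in the original columns.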