I would like to create views or dataframes from an existing dataframe based on column selections.
For example, I would like to create a dataframe df2
from a dataframe df1
that holds all columns from it except two of them. I tried doing the following, but it didn't work:
import numpy as np
import pandas as pd
# Create a dataframe with columns A,B,C and D
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
# Try to create a second dataframe df2 from df with all columns except 'B' and D
my_cols = set(df.columns)
my_cols.remove('B').remove('D')
# This returns an error ("unhashable type: set")
df2 = df[my_cols]
What am I doing wrong? Perhaps more generally, what mechanisms does pandas have to support the picking and exclusions of arbitrary sets of columns from a dataframe?