So far I am able to get the list of all column names present in the dataframe or to get a specific column names based on its datatype, starting letters, etc...
Now my requirement is to get the whole list of column names or a sublist and to exclude one column from it (i.e Target variable / Label Column. This is a part of Machine Learning. So I am using the terms that are used in machine learning)
Please note I am not speaking about the data present in those columns. I am just taking the column names and want to exclude a particular column by its name
Please see below example for better understanding :
# Get all the column names from a Dataframe
df.columns
Index(['transactionID', 'accountID', 'transactionAmountUSD',
'transactionAmount', 'transactionCurrencyCode',
'accountAge', 'validationid', 'LABEL'],
dtype='object')
# Get only the Numeric Variables (Columns with numeric values in it)
df._get_numeric_data().columns
Index(['transactionAmountUSD', 'transactionAmount', 'accountAge', 'LABEL'],
dtype='object')
Now inorder to get remaining column names I am subtracting both the above commands
string_cols = list(set(list(df.columns))-set(df._get_numeric_data().columns))
Ok everything goes well until I hit this.
I have found out that Label column though it has numeric values it should not be present in the list of numeric variables. It should be excluded.
(i.e) I want to exclude a particular column name (not using its index in the list but using its name explicitly)
I tried similar statements like the following ones but in vain. Any inputs on this will be helpful
set(df._get_numeric_data().columns-set(df.LABEL)
set(df._get_numeric_data().columns-set(df.LABEL.column)
set(df._get_numeric_data().columns-set(df['LABEL'])
I am sure I am missing a very basic thing but not able to figure it out.