On this Kaggle page (https://www.kaggle.com/alexisbcook/categorical-variables) there's this section of code:
s = (X_train.dtypes == 'object')
object_cols = list(s[s].index)
What is s (what kind of object is it), and how does s[s].index work?
Let's take this DataFrame:
In [1]: import pandas as pd
In [2]: X_train = pd.DataFrame([("f", 2)])
In [3]: X_train
Out[3]:
   0  1
0  f  2
The first line, s = (X_train.dtypes == 'object'), creates a boolean Series, indexed by the column names of X_train, which indicates whether each column has the object dtype (here the column holds str values, in particular):
In [4]: s = (X_train.dtypes == 'object')
In [5]: s
Out[5]:
0     True
1    False
dtype: bool
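To check for yourself what kind of object s is, a small sketch like the following may help. The column names "letter" and "number" are made up for illustration, and the object dtype is set explicitly so that the result does not depend on how your pandas version infers string columns.

```python
import pandas as pd

# Hypothetical two-column frame; dtype="object" is forced on the string
# column so the comparison below behaves the same across pandas versions.
X_train = pd.DataFrame({
    "letter": pd.Series(["f"], dtype="object"),
    "number": [2],
})

dtypes = X_train.dtypes      # a Series: index = column names, values = dtypes
s = dtypes == "object"       # elementwise comparison gives a boolean Series

print(type(dtypes).__name__)  # class of X_train.dtypes
print(type(s).__name__)       # class of s
print(s.dtype)                # dtype of the values stored in s
print(list(s.index))          # the labels of s are the column names
```

So s is an ordinary pandas Series whose labels are the column names and whose values are booleans.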
The second line merely selects the index labels of s (that is, the column names) whose value is True and returns them as a list. This notation uses a trick called boolean indexing, which filters a pandas object by a boolean iterable of the same length, which is s itself in our case:
In [7]: object_cols = list(s[s].index)
In [8]: object_cols
Out[8]: [0]
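If it helps, s[s].index can be split into its individual steps. This is only a sketch with made-up column names ("letter", "number") and an explicitly set object dtype; the select_dtypes line at the end is an alternative I'm adding for comparison, not something taken from the Kaggle page.

```python
import pandas as pd

# Hypothetical frame; dtype="object" is forced so the example does not
# depend on how the installed pandas version infers string columns.
X_train = pd.DataFrame({
    "letter": pd.Series(["f"], dtype="object"),
    "number": [2],
})

s = X_train.dtypes == "object"   # boolean Series indexed by column name

selected = s[s]                  # s used as its own mask: keep entries that are True
labels = selected.index          # the surviving labels are the object column names
object_cols = list(labels)       # plain Python list of those names

# Alternative spelling of the same selection via DataFrame.select_dtypes.
alt_cols = list(X_train.select_dtypes(include=["object"]).columns)

print(object_cols)
print(alt_cols)
```

Both lists should contain the same column names; s[s] is just the shortest way to write "the entries of s that are True".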