
On this Kaggle page (https://www.kaggle.com/alexisbcook/categorical-variables), there's this section of code:

s = (X_train.dtypes == 'object')
object_cols = list(s[s].index)

What is s (what kind of object is it), and how does s[s].index work?

NatMargo

1 Answer


Let's take this DataFrame:

In [2]: X_train = pd.DataFrame([("f", 2)])

In [3]: X_train
Out[3]:
   0  1
0  f  2

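The first line, s = (X_train.dtypes == 'object'), creates a pandas Series of booleans. X_train.dtypes is itself a Series whose index holds the column names and whose values are the column dtypes. Comparing it with 'object' happens element-wise, so s tells you, for each column of X_train, whether that column has the object dtype (here column 0 holds a str):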

In [4]: s = (X_train.dtypes == 'object')

In [5]: s
Out[5]:
0     True
1    False
dtype: bool
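To make the "what kind of object is it" part concrete, here is a minimal sketch (assuming pandas is installed) that inspects the intermediate values. One assumption to flag: recent pandas releases (3.x) may infer a dedicated string dtype for text columns instead of object, so the sketch pins column 0 to object with astype to reproduce the session above.

```python
import pandas as pd

# Same one-row DataFrame as above. astype pins column 0 to object dtype,
# because newer pandas versions may infer a separate string dtype for text.
X_train = pd.DataFrame([("f", 2)]).astype({0: object})

# X_train.dtypes is a Series: index = column labels, values = dtypes
dtypes = X_train.dtypes
print(isinstance(dtypes, pd.Series))  # True

# Comparing that Series with the string 'object' is element-wise and
# yields a new Series of booleans with the same index (the column labels)
s = (dtypes == 'object')
print(isinstance(s, pd.Series))  # True
print(s.dtype)                   # bool
print(s.tolist())                # [True, False]
```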

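The second line selects the column names whose value in s is True and returns them as a list. It relies on boolean indexing, which filters a pandas object with a boolean array or Series: only the entries where the mask is True are kept. Here s is used as a mask on itself, so s[s] keeps only the True entries, .index returns their labels (the column names), and list(...) turns that Index into a plain Python list: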

In [7]: object_cols = list(s[s].index)

In [8]: object_cols
Out[8]: [0]
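Broken into separate steps, s[s].index looks like the sketch below (assuming pandas is installed; column 0 is pinned to object dtype because newer pandas versions may infer a string dtype for text). The last line shows DataFrame.select_dtypes, a built-in alternative that is not part of the original Kaggle code.

```python
import pandas as pd

# astype pins column 0 to object dtype (newer pandas may infer a string dtype)
X_train = pd.DataFrame([("f", 2)]).astype({0: object})
s = (X_train.dtypes == 'object')

# Step 1: use s as a boolean mask on itself; only the True entries survive
masked = s[s]
print(masked.tolist())        # [True]

# Step 2: .index holds the labels of the surviving entries (column names)
labels = masked.index
print(labels.tolist())        # [0]

# Step 3: convert the Index into a plain Python list
object_cols = list(labels)
print(object_cols)            # [0]

# The same mask can select the matching columns of the DataFrame itself
print(X_train.loc[:, s].columns.tolist())  # [0]

# Alternative: select_dtypes filters columns by dtype directly
print(X_train.select_dtypes(include='object').columns.tolist())  # [0]
```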
mabergerx