I am learning how to split a DataFrame into training and test samples. I reviewed a solution post, but I cannot understand some details of the code syntax:
In [11]: df = pd.DataFrame(np.random.randn(100, 2))
In [12]: msk = np.random.rand(len(df)) < 0.8
In [13]: train = df[msk]
In [14]: test = df[~msk]
In [15]: len(test)
Out[15]: 21
In [16]: len(train)
Out[16]: 79
Since msk is an array of booleans, how can msk be used to index df, and why does df[msk] return the actual numerical data? From my understanding, the index of df should be a string or an array of strings.
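To make the question concrete, here is a smaller sketch of the behaviour I am asking about. The five-row frame, the column names "a" and "b", and the hand-written mask are my own illustration, not from the post I read; the mask stands in for np.random.rand(len(df)) < 0.8 so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Small frame with the default integer row index 0..4 (illustrative values only).
df = pd.DataFrame({"a": [10, 20, 30, 40, 50], "b": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Hand-written boolean mask with one entry per row.
msk = np.array([True, False, True, True, False])

train = df[msk]   # keeps the rows where msk is True
test = df[~msk]   # ~ inverts the mask, so this keeps the remaining rows

print(train.index.tolist())  # [0, 2, 3]
print(test.index.tolist())   # [1, 4]

# A string key, by contrast, selects a column rather than rows.
print(df["a"].tolist())      # [10, 20, 30, 40, 50]
```

So the same square brackets seem to select rows when given a boolean array and a column when given a string, which is the part I would like explained.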