I have a sorted data frame as below:

            x_test         test_label     x_train             train_label  \
37  [[6.3, 3.3, 4.7, 1.6]]        [1]  [[6.4, 3.2, 4.5, 1.5]]         [1]   
63  [[6.3, 3.3, 4.7, 1.6]]        [1]  [[6.0, 3.4, 4.5, 1.6]]         [1]   
67  [[6.3, 3.3, 4.7, 1.6]]        [1]  [[6.1, 3.0, 4.6, 1.4]]         [1]   
96  [[6.3, 3.3, 4.7, 1.6]]        [1]  [[6.1, 3.0, 4.9, 1.8]]         [2]   
51  [[6.3, 3.3, 4.7, 1.6]]        [1]  [[5.9, 3.2, 4.8, 1.8]]         [1]   

    dist  
37  0.26  
63  0.37  
67  0.42  
96  0.46  
51  0.47  

I'd like to find the mode value of the 'train_label' column (any one, if there are ties) and get its index. Next, I'd like to find the value of 'test_label' at that index. How do I do it?

I've tried using df.mode() but didn't succeed.

Bella

5 Answers


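First, flatten the label columns, then find the index of the rows whose train_label equals the mode. Note that df['train_label'].mode().index only numbers the mode values themselves (0, 1, ...), so the row index has to come from comparing the column against the mode:

 df.loc[:, 'train_label'] = df['train_label'].apply(lambda x: x[0])
 df.loc[:, 'test_label'] = df['test_label'].apply(lambda x: x[0])

 tr_mode_idx = df.index[df['train_label'] == df['train_label'].mode()[0]]

Then to find the value of test_label based on that index:

 df.loc[tr_mode_idx, 'test_label']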
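
For reference, the two steps above can be tried end to end with a minimal sketch. It assumes the label columns hold one-element lists, as the printed frame in the question suggests; the x_test and x_train columns are left out for brevity, and the names flat, mode_value and mode_rows are only illustrative.

```python
import pandas as pd

# Rebuild a small frame shaped like the one in the question (values copied from it).
df = pd.DataFrame(
    {
        "test_label": [[1], [1], [1], [1], [1]],
        "train_label": [[1], [1], [1], [2], [1]],
        "dist": [0.26, 0.37, 0.42, 0.46, 0.47],
    },
    index=[37, 63, 67, 96, 51],
)

# Unwrap the one-element lists so the labels are plain, hashable integers.
flat = df.assign(
    train_label=df["train_label"].map(lambda labels: labels[0]),
    test_label=df["test_label"].map(lambda labels: labels[0]),
)

# mode() returns a Series because ties are possible; take the first value.
mode_value = flat["train_label"].mode().iloc[0]

# Row labels (from the original index) where train_label equals the mode.
mode_rows = flat.index[flat["train_label"] == mode_value]

# test_label at the first such row, and at all of them.
first_test_label = flat.loc[mode_rows[0], "test_label"]
all_test_labels = flat.loc[mode_rows, "test_label"]
```

Using assign leaves the original df untouched, which also sidesteps in-place assignment on a frame that may itself be a slice of another one.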
Ted
  • When I'm trying to flatten it (running the first 2 rows), I get the warning: `SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead` – Bella Aug 26 '19 at 09:46
  • @Bella That's a common warning with Pandas (see: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas). Try changing it to my new edit. – Ted Aug 26 '19 at 09:51
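
One common cause of this warning is that the frame being modified was itself produced by filtering or slicing another frame. The thread does not say how df was built, so the following is only a sketch under that assumption, with a hypothetical parent frame named full: taking an explicit copy before assigning makes the subset independent.

```python
import pandas as pd

# Hypothetical parent frame; the question does not show how df was created.
full = pd.DataFrame(
    {"train_label": [[1], [1], [2]], "dist": [0.26, 0.37, 0.90]},
    index=[37, 63, 96],
)

# An explicit copy makes the filtered subset an independent frame,
# so assigning to one of its columns is unambiguous.
subset = full[full["dist"] < 0.5].copy()
subset["train_label"] = subset["train_label"].map(lambda labels: labels[0])
```

The parent frame keeps its original list-valued column; only the copy is flattened.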
Select test_label for every row whose train_label is a mode:

df.test_label[df.train_label.isin(df.train_label.mode())]

Result:

37    [1]
63    [1]
67    [1]
51    [1]
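
Because mode() can return more than one value, isin keeps the rows for every tied mode, whereas comparing against the first mode keeps only one of them. A small sketch with made-up, already flattened labels (not the data from the question) shows the difference:

```python
import pandas as pd

# Hypothetical labels in which 1 and 2 tie for most frequent.
labels = pd.DataFrame(
    {"train_label": [1, 2, 2, 1, 3], "test_label": [1, 1, 2, 2, 1]},
    index=[37, 63, 67, 96, 51],
)

# Both tied values come back, sorted.
modes = labels["train_label"].mode()

# Rows matching any of the tied modes.
tied = labels.loc[labels["train_label"].isin(modes), "test_label"]

# Rows matching only the first mode.
first_only = labels.loc[labels["train_label"] == modes.iloc[0], "test_label"]
```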
Stef

You first need to flatten the label columns, e.g.:

>>> df["train_label"]=df["train_label"].apply(lambda x: x[0])
>>> df["test_label"]=df["test_label"].apply(lambda x: x[0])
>>> df
    dist  test_label  train_label                  x_test                 x_train
37  0.26           1            1  [[6.3, 3.3, 4.7, 1.6]]  [[6.4, 3.2, 4.5, 1.5]]
63  0.37           1            1  [[6.3, 3.3, 4.7, 1.6]]  [[6.0, 3.4, 4.5, 1.6]]
67  0.42           1            1  [[6.3, 3.3, 4.7, 1.6]]  [[6.1, 3.0, 4.6, 1.4]]
96  0.46           1            2  [[6.3, 3.3, 4.7, 1.6]]  [[6.1, 3.0, 4.9, 1.8]]
51  0.47           1            1  [[6.3, 3.3, 4.7, 1.6]]  [[5.9, 3.2, 4.8, 1.8]]

Then run df.mode():

>>> df.mode(numeric_only=True)
   dist  test_label  train_label
0  0.26         1.0          1.0
1  0.37         NaN          NaN
2  0.42         NaN          NaN
3  0.46         NaN          NaN
4  0.47         NaN          NaN
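
The NaN rows appear because DataFrame.mode computes the modes of each column separately and pads the shorter results. The row labels 0 to 4 in that output are positions in the list of modes, not the original index (37, 63, ...). A minimal sketch, assuming the labels are already flattened and using the values from the question:

```python
import pandas as pd

flat = pd.DataFrame(
    {
        "dist": [0.26, 0.37, 0.42, 0.46, 0.47],
        "test_label": [1, 1, 1, 1, 1],
        "train_label": [1, 1, 1, 2, 1],
    },
    index=[37, 63, 67, 96, 51],
)

# Mode of a single column: a Series indexed 0, 1, ... (one entry per tied mode).
column_mode = flat["train_label"].mode()

# Mode of every numeric column: 'dist' has five distinct values, so all five
# are modes, and the label columns are padded with NaN below their single mode.
frame_mode = flat.mode(numeric_only=True)
```

To get back to the original row labels, the mode value still has to be compared against the column, for example with flat.index[flat["train_label"] == column_mode.iloc[0]].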

Grzegorz Skibinski

I don't think either of the above answers is the best way to go about it. I suggest using boolean indexing to select the subset of the column that equals the mode. That subset keeps the original index, so you can pass its index to any of the other columns to look up their values in those rows.

As such, it can be simplified into one line of code:

df['test_label'].loc[df['train_label'][df['train_label'] == df['train_label'].mode()[0]].index]
RDoc

I created a dataframe, took the mode of the 'train_label' column, and printed the index of the rows that match each mode value:

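import pandas as pd

df = pd.DataFrame({"A": [14, 4, 5, 4, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 20, 7, 3, 8],
                   "train_label": [14, 3, 6, 2, 6]})
# mode() returns a Series, since several values can tie for most frequent
X = df['train_label'].mode()
"""
        A   B   C  train_label
0  14   5  20           14
1   4   2  20            3
2   5  54   7            6
3   4   3   3            2
4   1   2   8            6

"""
# print the row index of every occurrence of each mode value
for i in X:
    print(df['train_label'].loc[df['train_label'] == i].index)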

Output

Int64Index([2, 4], dtype='int64')