Here is the current dataset that I am working with.

df contains Knn, Kss, and Ktt in three separate columns.

What I have been unable to figure out is how to merge the three into a single column and have a column that has a label.

Here is what I currently have:

df_CohBeh = pd.concat([pd.DataFrame(Knn), 
                       pd.DataFrame(Kss), 
                       pd.DataFrame(Ktt)], 
                      keys=['Knn', 'Kss', 'Ktt'], 
                      ignore_index=True)

Which looks like this:

display(df_CohBeh)
           Knn  Kss        Ktt
0    24.579131  NaN        NaN
1    21.673524  NaN        NaN
2    25.785409  NaN        NaN
3    20.686215  NaN        NaN
4    21.504863  NaN        NaN
..         ...  ...        ...
106        NaN  NaN  27.615440
107        NaN  NaN  27.636029
108        NaN  NaN  26.215347
109        NaN  NaN  27.626850
110        NaN  NaN  25.473380

This in essence just filters them. I would rather have a single column of values plus a column of strings ("Knn", "Kss", "Ktt") that I can use to plot all three on the same seaborn graph and compare their distributions.

I'm not sure how to create a label column that marks each value as Knn, Kss, or Ktt.

2 Answers

If df looks like this:

>>> df
          Knn        Kss        Ktt
0   96.054660  72.301166  15.355594
1   36.221933  72.646999  41.670382
2   96.503307  78.597493  71.959442
3   53.867432  17.315678  35.006592
4   43.014227  75.122762  83.666844
5   63.301808  72.514763  64.597765
6    0.201688   1.069586  98.816202
7   48.558265  87.660352   9.140665
8   64.353999  43.534200  15.202242
9   41.260903  24.128533  25.963022
10  63.571747  17.474933  47.093538
11  91.006290  90.834753  37.672980
12  61.960163  87.308155  64.698762
13  87.403750  86.402637  78.946980
14  22.238364  88.394919  81.935868
15  56.356764  80.575804  72.925204
16  30.431063   4.466978  32.257898
17  21.403800  46.752591  59.831690
18  57.330671  14.172341  64.764542
19  54.163311  66.037043   0.822948

Try df.melt() to merge the three into a single column, with a second column that holds the label:

   variable      value
0       Knn  96.054660
1       Knn  36.221933
2       Knn  96.503307
3       Knn  53.867432
4       Knn  43.014227
5       Knn  63.301808
...
20      Kss  72.301166
21      Kss  72.646999
22      Kss  78.597493
23      Kss  17.315678
24      Kss  75.122762
25      Kss  72.514763
...
40      Ktt  15.355594
41      Ktt  41.670382
42      Ktt  71.959442
43      Ktt  35.006592
44      Ktt  83.666844
45      Ktt  64.597765
...
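
As a minimal sketch of the melt call, assuming df holds only the three numeric columns: the random data below is a hypothetical stand-in for the real df, and the column names "label" and "K" are illustrative choices passed through var_name and value_name, not names from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: three columns of random floats, shaped like the example df.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 100, size=(20, 3)), columns=["Knn", "Kss", "Ktt"])

# Stack the three columns into one value column plus one label column.
# var_name / value_name only rename the default "variable" / "value" columns.
long_df = df.melt(var_name="label", value_name="K")

print(long_df.head())
```

A long-format frame like this can then be passed to seaborn with the label column as the hue, for example sns.histplot(data=long_df, x="K", hue="label"), so that all three distributions share one axis.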
Corralien

You should use a pandas Series.

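import numpy as np
import pandas as pd

knn = pd.DataFrame({...})
kss = pd.DataFrame({...})
ktt = pd.DataFrame({...})
# Join the flattened arrays end to end; `+` on NumPy arrays would add them element-wise.
l = np.concatenate([knn.values.flatten(), kss.values.flatten(), ktt.values.flatten()])
s = pd.Series(l, name="Knn")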
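
A single flattened Series holds the values but not the label the question asks for. One possible way to keep the label, sketched here under the assumption that the three inputs are one-dimensional arrays, is to pass keys to pd.concat and turn that index level into a column. The arrays below are hypothetical stand-ins, and "label", "row", and "K" are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the three separate arrays; their lengths may differ.
rng = np.random.default_rng(0)
Knn = rng.uniform(20, 30, size=5)
Kss = rng.uniform(20, 30, size=4)
Ktt = rng.uniform(20, 30, size=6)

# keys= adds an outer index level holding the label; names= names both index levels.
stacked = pd.concat(
    [pd.Series(Knn), pd.Series(Kss), pd.Series(Ktt)],
    keys=["Knn", "Kss", "Ktt"],
    names=["label", "row"],
)

# Move the label level into a regular column and drop the per-array row number.
long_df = stacked.rename("K").reset_index(level="label").reset_index(drop=True)

print(long_df)
```

Note that ignore_index=True is left out on purpose: combined with keys it discards the labels, which is why the attempt in the question ends up without them.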
Contestosis