
I am clustering historical trader data with KMeans: 10 traders into 3 clusters. After fitting, I have the cluster label of every row, and now I want to know which traders' names fall into each cluster. For example, if Cluster0 contains 3 traders, the output should look something like `{'Cluster0': ['Name1', 'Name2', 'Name3'], 'Cluster1': ['Name5', 'Name4', 'Name6'], ...}`. I was able to get the indices of the data points that belong to each cluster with

`cluster_dict = {i: np.where(data['Labels'] == i) for i in range(n_clusters)}`

I also know where each trader's rows sit in the data (rows 0-16 are trader1, 16-32 are trader2, and so on), and I have the traders' names in a list such as `['name1', 'name2', 'name3']`.

Is there any way to get back the names of the traders in each cluster, in the format shown above? If so, please help me with this.
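A minimal sketch of one way to do this with only the ingredients described above (the `np.where` indices and the row where each trader starts), assuming each trader's rows form one contiguous block: `np.searchsorted` finds which block a row index falls into. The start offsets `[0, 16, 32]`, the random stand-in labels, and the helper `names_per_cluster` are hypothetical. Note that `np.where` returns a tuple, so `[0]` is needed to get the index array itself.

```python
import numpy as np

# Hypothetical inputs: 48 stacked rows, each trader's block starting at the
# offsets below, and one KMeans label per row (random here, only for illustration).
names = ['name1', 'name2', 'name3']
starts = np.array([0, 16, 32])      # assumed first row of each trader's block
n_rows = 48
n_clusters = 3
rng = np.random.default_rng(0)
labels = rng.integers(0, n_clusters, size=n_rows)

# Row indices per cluster, as in the question; np.where returns a tuple,
# so element [0] is the index array.
cluster_dict = {i: np.where(labels == i)[0] for i in range(n_clusters)}


def names_per_cluster(cluster_dict, starts, names):
    """Map each cluster's row indices to the sorted set of trader names."""
    result = {}
    for cluster, rows in cluster_dict.items():
        # searchsorted(side='right') - 1 gives the block (trader) each row falls in
        owner = np.searchsorted(starts, rows, side='right') - 1
        result[f'Cluster{cluster}'] = sorted({names[j] for j in owner})
    return result


print(names_per_cluster(cluster_dict, starts, names))
```

The lookup only needs the start offsets, so it still works if traders contribute different numbers of rows.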

Mohamed Thasin ah
Urvish
  • Possible duplicate of [Python sklearn-KMeans how to get the values in the cluster](https://stackoverflow.com/questions/36195457/python-sklearn-kmeans-how-to-get-the-values-in-the-cluster) – Cleb Oct 31 '18 at 10:29
  • @Urvish - Is your problem solved? Feel free to ask if you have any doubts. – Mohamed Thasin ah Oct 31 '18 at 10:54
  • @Cleb The line for getting the indices is taken from that question itself, so my question is not a duplicate of it; as far as I can see, it goes one step further. – Urvish Oct 31 '18 at 11:01

1 Answer


I think you need something like the following.

First, take the labels from the fitted model and assign them to a column of your DataFrame. Then group by that label column and collect the unique values of the name column (here column 2, holding A, B and C) for each group.

The following snippet demonstrates this on a small toy dataset.

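from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
# Toy data: two numeric features (columns 0 and 1) and a name column (2)
X = pd.DataFrame([[1, 2, 'A'], [1, 4, 'A'], [1, 0, 'B'], [4, 2, 'C'], [4, 4, 'C'], [4, 0, 'B']])
# Fit KMeans on the numeric columns only
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[[0, 1]])
X['label'] = kmeans.labels_
# Unique names found in each cluster
print(X.groupby('label')[2].unique())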

Output (which cluster gets number 0 or 1 is arbitrary and can differ between scikit-learn versions):

label
0    [A, B]
1    [C, B]

For a dict representation:

print(X.groupby('label')[2].unique().to_dict())

Output:

{0: array(['A', 'B'], dtype=object), 1: array(['C', 'B'], dtype=object)}
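To get exactly the format asked for in the question (keys like `'Cluster0'` and plain lists instead of NumPy arrays), the grouped result can be reshaped with a dict comprehension. A small sketch; the labels are hard-coded here to match the output above instead of refitting KMeans, since the cluster numbering can vary.

```python
import pandas as pd

# Toy frame from the answer; labels hard-coded to match the output shown above
X = pd.DataFrame([[1, 2, 'A'], [1, 4, 'A'], [1, 0, 'B'],
                  [4, 2, 'C'], [4, 4, 'C'], [4, 0, 'B']])
X['label'] = [0, 0, 0, 1, 1, 1]

# Unique names per cluster, as plain lists, keyed 'Cluster<n>'
clusters = {f'Cluster{label}': list(names)
            for label, names in X.groupby('label')[2].unique().items()}
print(clusters)
# {'Cluster0': ['A', 'B'], 'Cluster1': ['C', 'B']}
```

`unique()` keeps names in order of first appearance within each cluster, which is why Cluster1 lists `'C'` before `'B'`.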

To add the result to the same DataFrame, use:

# Look up each row's label in the per-cluster unique names
X['cluster_name'] = X['label'].map(X.groupby('label')[2].unique())

Output:

   0  1  2  label cluster_name
0  1  2  A      0       [A, B]
1  1  4  A      0       [A, B]
2  1  0  B      0       [A, B]
3  4  2  C      1       [C, B]
4  4  4  C      1       [C, B]
5  4  0  B      1       [C, B]
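If, as in the question, the trader names exist only as a Python list, they can be expanded into a column first, after which the same `groupby` works. A minimal sketch, assuming each trader's rows are stacked contiguously and the number of rows per trader is known; the block sizes, the toy `feature` column and the stand-in labels are hypothetical. If you only have the start offsets, `np.diff` of the offsets with the total row count appended gives these sizes.

```python
import numpy as np
import pandas as pd

# Hypothetical input: traders' rows stacked in order, plus the name list
# and the number of rows each trader contributed.
names = ['name1', 'name2', 'name3']
rows_per_trader = [16, 16, 16]                  # assumed block sizes
df = pd.DataFrame({'feature': np.arange(sum(rows_per_trader))})
df['label'] = np.repeat([0, 1, 2], [20, 16, 12])  # stand-in for kmeans.labels_

# Expand the name list into a column: each name repeated over its block of rows
df['name'] = np.repeat(names, rows_per_trader)

# Same groupby as above, now on the new name column
per_cluster = df.groupby('label')['name'].unique()
print(per_cluster.to_dict())
```

This replaces the row-by-row loop over the index with a single vectorized `np.repeat`.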
Mohamed Thasin ah
  • Your answer is really good. The only problem I can think of is getting the names as a separate column in the df, since for now I only have a list of names and don't know how to add them to the df. One option is to iterate over the df's index and, if the index is less than the size of each trader's df, add that name, otherwise not. What do you suggest? – Urvish Oct 31 '18 at 10:59
  • The only problem is that I don't have the names A, B and C in the df from the beginning. Can you help me get them as a separate column in the df? Your answer is correct if I can get that column. I have a list of names like `[A,B,C]` but don't know how to expand it into a column. – Urvish Oct 31 '18 at 11:07
  • @Urvish - For this you have to show your df; without seeing it, it's very hard to give a suggestion. – Mohamed Thasin ah Oct 31 '18 at 11:09
  • @Urvish - Do you still have any issue related to this? If so, feel free to ask; otherwise, please accept my answer so it is easier for others to find in future. – Mohamed Thasin ah Nov 13 '18 at 06:47
  • I got this working as per the code you suggested. Thank you for the help. – Urvish Nov 23 '18 at 21:08
  • @Urvish - Glad to help; you can upvote and accept the answer. – Mohamed Thasin ah Nov 24 '18 at 02:00