I have a data frame:

df = pd.DataFrame([[2,2,6,9,7,6,2,9,7,11]], columns=['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8','cat9','cat10'])

Inside this df, there is only 1 row.

How can I group these columns according to their values and display the clusters of columns in a plot?

Currently, this is my code, but it displays the wrong information:

grouped_cats = df.groupby(by= lambda value: value, axis = 1)
list(grouped_cats)[0]
Franva

2 Answers

1

What do you mean by a cluster plot? To my mind, the best way to visualize this spread is a scatter plot. You can transpose the frame and rename the columns if needed:

df.T.reset_index().plot(kind='scatter', x='index', y=0)

Or as a bar plot:

df.T.reset_index().plot(kind='bar', x='index', y=0)

Following your comment and clarification, let's use groupby and convert the result to a dict:

df.T.reset_index().groupby(0).agg(list).to_dict()

{'index': {2: ['cat1', 'cat2', 'cat7'],
  6: ['cat3', 'cat6'],
  7: ['cat5', 'cat9'],
  9: ['cat4', 'cat8'],
  11: ['cat10']}}
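
If the outer 'index' key is not wanted, the single row can be grouped by its own values instead. This is a minimal sketch, assuming the one-row frame from the question; the names `row` and `groups` are only illustrative:

```python
import pandas as pd

columns = [f"cat{i}" for i in range(1, 11)]
df = pd.DataFrame([[2, 2, 6, 9, 7, 6, 2, 9, 7, 11]], columns=columns)

# The only row as a Series: index = column names, values = the numbers
row = df.iloc[0]

# Group the Series by its own values and collect the column names per value
groups = {int(value): list(part.index) for value, part in row.groupby(row)}
print(groups)
```

The keys come out in sorted order because groupby sorts group keys by default.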
wwnde
  • Hi @wwnde, this is exactly what I have already plotted. What I want is groups of names under the same count/value number, e.g. `{2:['cat1', 'cat2','cat7'], 6: ['cat3','cat6','cat9',],....}`, and then to plot them nicely. – Franva Jun 28 '21 at 05:59
  • Can we then try `df.T.reset_index().groupby(0).agg(list).to_dict()` – wwnde Jun 28 '21 at 06:15
1

I can't quite figure out your use case, but I think this code should help:

import pandas as pd

columns=['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8','cat9','cat10']
df = pd.DataFrame([[2,2,6,9,7,6,2,9,7,11]], columns=columns)

# Map each value in the single row to the list of columns that hold it
grouped_cats = {}
for i, val in enumerate(df.iloc[0]):
    if val in grouped_cats:
        grouped_cats[val].append(columns[i])
    else:
        grouped_cats[val] = [columns[i]]

Output = {2: ['cat1', 'cat2', 'cat7'], 6: ['cat3', 'cat6'], 9: ['cat4', 'cat8'], 7: ['cat5', 'cat9'], 11: ['cat10']}
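
The same loop can be written a little more compactly with `collections.defaultdict`. This is only a sketch: the plain list `values` stands in for `df.iloc[0]` so that the snippet runs without pandas.

```python
from collections import defaultdict

columns = ['cat1', 'cat2', 'cat3', 'cat4', 'cat5',
           'cat6', 'cat7', 'cat8', 'cat9', 'cat10']
values = [2, 2, 6, 9, 7, 6, 2, 9, 7, 11]  # stand-in for df.iloc[0]

# A missing key starts out as an empty list, so no if/else is needed
grouped_cats = defaultdict(list)
for name, val in zip(columns, values):
    grouped_cats[val].append(name)

grouped_cats = dict(grouped_cats)  # back to a plain dict
print(grouped_cats)
```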

The easiest way to visualise this that I can think of is:

import matplotlib.pyplot as plt

colours = ['green', 'orange', 'red','blue','black']
cluster = {2: ['cat1', 'cat2', 'cat7'],
 6: ['cat3', 'cat6'],
 9: ['cat4', 'cat8'],
 7: ['cat5', 'cat9'],
 11: ['cat10']}

fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xticks(list(range(2, 12)))
# One colour per value; plot that value's column names at x = value
for colour, (x, ys) in zip(colours, cluster.items()):
    ax.scatter([x] * len(ys), ys, c=colour, linewidth=0, s=50)

plt.show()

Another way to visualise this is to count, for each unique value in your data, the number of labels associated with it, plot those counts as a scatter plot, and annotate each point with the class names:

import matplotlib.pyplot as plt

colours = ['green', 'orange', 'red','blue','black']

cluster = {2: ['c1', 'c2', 'c7'],
 6: ['c3', 'c6'],
 9: ['c4', 'c8'],
 7: ['c5', 'c9'],
 11: ['c10']}

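counts = [len(names) for names in cluster.values()]  # labels per value
values = list(cluster)                               # the unique values
fig, ax = plt.subplots()
ax.set_xticks(list(range(2, 12)))
ax.scatter(values, counts, c=colours)
# Annotate each point with the names that share that value
for value, count in zip(values, counts):
    ax.annotate(','.join(cluster[value]), (value, count))

plt.show()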
