Pandas How to group columns by their values

Question

I have a data frame:

df = pd.DataFrame([[2,2,6,9,7,6,2,9,7,11]], columns=['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8','cat9','cat10'])

Inside this df, there is only 1 row.

How can I group these columns according to their values and display the clusters of columns in a plot?

This is my current code, but it displays the wrong information:

grouped_cats = df.groupby(by= lambda value: value, axis = 1)
list(grouped_cats)[0]

wwnde · Answer 1 · 2021-06-28T06:18:09.537

What do you mean by a cluster plot? I think the best way to visualize this spread is a scatter plot. You can transpose and rename if needed:

df.T.reset_index().plot(kind='scatter', x='index', y=0)

Or even a bar plot:

df.T.reset_index().plot(kind='bar', x='index', y=0)

Following your comment and clarification, let's use groupby and convert the result to a dict:

df.T.reset_index().groupby(0).agg(list).to_dict()

{'index': {2: ['cat1', 'cat2', 'cat7'],
  6: ['cat3', 'cat6'],
  7: ['cat5', 'cat9'],
  9: ['cat4', 'cat8'],
  11: ['cat10']}}
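The result above is nested under the 'index' key. If a flat mapping from value to column names is preferred, one possible variation (a sketch, not part of the original answer) is to select the 'index' column before aggregating. The variable name `groups` is only illustrative:

```python
import pandas as pd

columns = [f"cat{i}" for i in range(1, 11)]
df = pd.DataFrame([[2, 2, 6, 9, 7, 6, 2, 9, 7, 11]], columns=columns)

# Transpose so each column name becomes a row, then group the names by value.
# Selecting "index" first yields a Series, so to_dict() is flat rather than nested.
groups = df.T.reset_index().groupby(0)["index"].agg(list).to_dict()
print(groups)
```

groupby sorts the keys by default, so the values should come out in ascending order.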

edited Jun 28 '21 at 06:18

answered Jun 28 '21 at 04:35


Hi @wwnde, this is exactly what I have already plotted. What I want is the groups of names under the same count/value number, e.g. `{2:['cat1', 'cat2','cat7'], 6: ['cat3','cat6','cat9',],....}`, and then to plot them nicely. – Franva Jun 28 '21 at 05:59

Can we then try `df.T.reset_index().groupby(0).agg(list).to_dict()` – wwnde Jun 28 '21 at 06:15

Prayalankar Ashutosh · Accepted Answer · 2021-06-28T06:46:16.383

I can't quite figure out your use case, but I think the following code should help:

import pandas as pd

columns = ['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8','cat9','cat10']
df = pd.DataFrame([[2,2,6,9,7,6,2,9,7,11]], columns=columns)

# Map each value in the single row to the list of columns that hold it
grouped_cats = {}
for i, val in enumerate(df.iloc[0]):
    if val in grouped_cats:
        grouped_cats[val].append(columns[i])
    else:
        grouped_cats[val] = [columns[i]]

Output = {2: ['cat1', 'cat2', 'cat7'], 6: ['cat3', 'cat6'], 9: ['cat4', 'cat8'], 7: ['cat5', 'cat9'], 11: ['cat10']}
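As a possible alternative to the if/else above, the same grouping can be sketched with `collections.defaultdict`, iterating over the row's (column name, value) pairs directly instead of indexing into `columns`. This is only a variation on the loop above; the `int(val)` cast is my own addition so the keys are plain Python ints:

```python
from collections import defaultdict

import pandas as pd

columns = ['cat1', 'cat2', 'cat3', 'cat4', 'cat5',
           'cat6', 'cat7', 'cat8', 'cat9', 'cat10']
df = pd.DataFrame([[2, 2, 6, 9, 7, 6, 2, 9, 7, 11]], columns=columns)

# defaultdict(list) creates an empty list the first time a value is seen
grouped_cats = defaultdict(list)
for name, val in df.iloc[0].items():
    grouped_cats[int(val)].append(name)

# Convert back to a plain dict for display or further use
grouped_cats = dict(grouped_cats)
print(grouped_cats)
```

The keys appear in the order in which each value is first encountered in the row, just as in the loop above.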

The easiest way to visualise this that I can think of is:

import matplotlib.pyplot as plt

colours = ['green', 'orange', 'red', 'blue', 'black']
cluster = {2: ['cat1', 'cat2', 'cat7'],
 6: ['cat3', 'cat6'],
 9: ['cat4', 'cat8'],
 7: ['cat5', 'cat9'],
 11: ['cat10']}

fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xticks(list(range(2, 12)))
# One colour per value: x is the value, y is each column name in that group
for colour, (x, ys) in zip(colours, cluster.items()):
    ax.scatter([x] * len(ys), ys, c=colour, linewidth=0, s=50)

plt.show()

Another way to visualise it: for each unique value in your data, count the number of labels associated with it, plot those counts as a scatter plot, and annotate each point with the class names.

import matplotlib.pyplot as plt

colours = ['green', 'orange', 'red', 'blue', 'black']

cluster = {2: ['c1', 'c2', 'c7'],
 6: ['c3', 'c6'],
 9: ['c4', 'c8'],
 7: ['c5', 'c9'],
 11: ['c10']}

# z: number of labels per value (vertical axis); y: the values (horizontal axis)
z = [len(cluster[ke]) for ke in cluster]
y = [ke for ke in cluster]
fig, ax = plt.subplots()
ax.set_xticks(list(range(2, 12)))
ax.scatter(y, z, c=colours)
# Label each point with the comma-separated names in its group
for i, val in enumerate(cluster):
    ax.annotate(','.join(cluster[val]), (y[i], z[i]))

plt.show()
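If several names on one dot get hard to read, a bar chart with the names stacked above each bar may be easier on the eye. The following is only a sketch under my own assumptions: the helper name `plot_clusters` is hypothetical, and the Agg backend is selected just so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to open a window
import matplotlib.pyplot as plt

cluster = {2: ['cat1', 'cat2', 'cat7'],
           6: ['cat3', 'cat6'],
           9: ['cat4', 'cat8'],
           7: ['cat5', 'cat9'],
           11: ['cat10']}


def plot_clusters(cluster):
    """Hypothetical helper: one bar per value, with the group's names above it."""
    values = sorted(cluster)
    counts = [len(cluster[v]) for v in values]
    fig, ax = plt.subplots()
    # String x positions give evenly spaced categorical bars
    bars = ax.bar([str(v) for v in values], counts)
    for bar, v in zip(bars, values):
        # Stack the column names on separate lines, centred above the bar
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                "\n".join(cluster[v]), ha="center", va="bottom")
    ax.set_xlabel("value")
    ax.set_ylabel("number of columns")
    ax.set_ylim(0, max(counts) + 2)  # headroom for the stacked labels
    return fig, ax


fig, ax = plot_clusters(cluster)
# plt.show()  # uncomment when running interactively
```

Returning `fig` and `ax` leaves it to the caller to decide whether to show the figure or save it with `fig.savefig(...)`.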

Thanks Prayalankar, this data result is what I am looking for. Is it possible to also provide code to plot them nicely in clusters? – Franva Jun 28 '21 at 06:12
The plotting part is adapted from here: https://stackoverflow.com/questions/45087575/need-to-visualize-a-python-dictionary – Prayalankar Ashutosh Jun 28 '21 at 06:29
Hi @Pray, if we use a scatter plot, how can we show multiple categories within one dot? – Franva Jun 28 '21 at 06:31
