output

Hi, I generated the table above using Counter from collections to count the combinations of three values in a dataframe: Jessica, Mike, and Dog. I got the combinations and their counts. How can I make that table prettier? I would like to rename the index to grp1, grp2, etc., and rename the column to something other than 0. Also, what would be the best plot for showing the different groups? Thanks for your help!

I used these commands to produce the table:

import numpy as np
import pandas as pd
from collections import Counter

df = np.random.choice(["Mike", "Jessica", "Dog"], size=(20, 3))
Z = pd.DataFrame(df, columns=['a', 'b', 'c'])
LL = Z.apply(Counter, axis="columns").value_counts()
H = pd.DataFrame(LL)
print(H)

Jess BR
  • This problem is probably best solved **before** this step, so you should provide a sample of your data that we can use to show you how to properly aggregate it. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples gives examples of how to make sample data. – ALollz Apr 26 '21 at 18:00
  • I used the following to generate the table above: df= np.random.choice(["Mike", "Jessica", "Dog"], size=(20, 3)) Z= pd.DataFrame(df,columns=['a', 'b', 'c']) and then on a new line import collections from collections import Counter LL= Z.apply (Counter, axis= "columns").value_counts() H= pd.DataFrame(LL) H – Jess BR Apr 26 '21 at 18:12
  • Please edit your question to include this code snippet. It's not portable in its current state. – Henry Ecker Apr 26 '21 at 18:20
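
Following up on the comments above, one way to make the sample data reproducible is to seed the random generator so that everyone gets the same 20 rows. This is only a minimal sketch: the seed value 0 and the use of np.random.default_rng are assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed value 0 is an arbitrary assumption
rng = np.random.default_rng(0)
names = ["Mike", "Jessica", "Dog"]

# Same shape and column names as in the question
Z = pd.DataFrame(rng.choice(names, size=(20, 3)), columns=["a", "b", "c"])

# Dict form of the first rows, which can be pasted into a question
print(Z.head().to_dict())
```

Pasting the output of Z.to_dict() into the question is another option, since it does not depend on the random generator at all.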

2 Answers

  • This is quite an unusual technique.
  • You can convert the index of dicts (the Counter objects) into a MultiIndex.
  • Then plot() with kind="barh" gives labels that make sense.
import numpy as np
import pandas as pd
from collections import Counter

df = np.random.choice(["Mike", "Jessica", "Dog"], size=(20, 3))
Z = pd.DataFrame(df, columns=['a', 'b', 'c'])
LL = Z.apply(Counter, axis="columns").value_counts()
H = pd.DataFrame(LL)
# Expand each Counter in the index into one column per name
I = pd.Series(H.index).apply(pd.Series)
# Use those columns as the levels of a MultiIndex
H = H.set_index(pd.MultiIndex.from_arrays(I.T.values, names=I.columns))
H.plot(kind="barh")

H after setting as multi-index

                  0
Mike Dog Jessica   
2.0  1.0 NaN      5
     NaN 1.0      4
NaN  1.0 2.0      3
1.0  NaN 2.0      3
     1.0 1.0      2
NaN  NaN 3.0      1
     2.0 1.0      1
3.0  NaN NaN      1

[image: horizontal bar chart of H]
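
Depending on the pandas version, calling value_counts() on a Series of Counter objects may raise a TypeError, because Counter is a dict subclass and is not hashable. A hedged alternative that keeps the same idea is to turn each row into an order-independent string key first. The key format and the names keys and counts are illustrative assumptions, as is the seed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed 0 is an arbitrary assumption
Z = pd.DataFrame(rng.choice(["Mike", "Jessica", "Dog"], size=(20, 3)),
                 columns=["a", "b", "c"])

# Sorting the names makes ("Dog", "Mike", "Dog") and ("Mike", "Dog", "Dog")
# map to the same key, just as Counter ignores the order of the values
keys = Z.apply(lambda row: ", ".join(sorted(row)), axis="columns")

# The keys are plain strings, which are hashable, so they can be counted
counts = keys.value_counts()
print(counts)
```

Each index entry of counts is then a readable label such as "Dog, Dog, Mike", which also works directly as a tick label in a bar plot.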

Rob Raymond

Instead of using Counter, you can apply value_counts directly to each row:

import pandas as pd
from matplotlib import pyplot as plt

# Hard Coded For Reproducibility
df = pd.DataFrame({'a': {0: 'Dog', 1: 'Jessica', 2: 'Mike',
                         3: 'Dog', 4: 'Dog', 5: 'Dog',
                         6: 'Jessica', 7: 'Jessica',
                         8: 'Dog', 9: 'Dog', 10: 'Jessica',
                         11: 'Mike', 12: 'Dog',
                         13: 'Jessica', 14: 'Mike',
                         15: 'Mike',
                         16: 'Mike', 17: 'Dog',
                         18: 'Jessica', 19: 'Mike'},
                   'b': {0: 'Mike', 1: 'Mike', 2: 'Jessica',
                         3: 'Jessica', 4: 'Dog', 5: 'Jessica',
                         6: 'Mike', 7: 'Dog', 8: 'Mike',
                         9: 'Dog', 10: 'Dog', 11: 'Dog',
                         12: 'Dog', 13: 'Jessica',
                         14: 'Jessica', 15: 'Dog',
                         16: 'Dog', 17: 'Dog', 18: 'Jessica', 19: 'Jessica'},
                   'c': {0: 'Mike', 1: 'Dog', 2: 'Jessica',
                         3: 'Dog', 4: 'Dog', 5: 'Dog', 6: 'Dog',
                         7: 'Jessica', 8: 'Mike', 9: 'Dog',
                         10: 'Dog', 11: 'Mike', 12: 'Jessica',
                         13: 'Jessica', 14: 'Jessica',
                         15: 'Jessica', 16: 'Jessica',
                         17: 'Dog', 18: 'Mike', 19: 'Dog'}})

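# Apply value_counts across each row
# (pd.Series.value_counts is used because the top-level
# pd.value_counts is deprecated in recent pandas versions)
df = df.apply(pd.Series.value_counts, axis=1) \
    .fillna(0)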

# Group By All Columns and
# Get Duplicate Count From Group Size
df = pd.DataFrame(df
                  .groupby(df.columns.values.tolist())
                  .size()
                  .sort_values())

# Plot (DataFrame.plot creates its own figure,
# so a separate plt.figure() call is not needed)
df.plot(kind="barh")
plt.tight_layout()
plt.show()

df after groupby, size, and sort:

                  0
Dog Jessica Mike   
0.0 3.0     0.0   1
1.0 2.0     0.0   1
0.0 2.0     1.0   3
1.0 0.0     2.0   3
3.0 0.0     0.0   3
2.0 1.0     0.0   4
1.0 1.0     1.0   5
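
To get the grp1, grp2, ... labels and a column name other than 0 that the question asks for, something along these lines might work. It is a sketch that rebuilds the table above by hand; the column name "count" and the variable names table and renamed are assumptions.

```python
import pandas as pd

# Rebuild the grouped table shown above (values copied from that output)
table = pd.DataFrame({
    "Dog":     [0, 1, 0, 1, 3, 2, 1],
    "Jessica": [3, 2, 2, 0, 0, 1, 1],
    "Mike":    [0, 0, 1, 2, 0, 0, 1],
    0:         [1, 1, 3, 3, 3, 4, 5],
}).set_index(["Dog", "Jessica", "Mike"])

renamed = (table
           .rename(columns={0: "count"})         # replace the default 0 header
           .sort_values("count", ascending=False)
           .reset_index())                       # MultiIndex levels -> columns

# Label the groups grp1, grp2, ... in order of decreasing count
renamed.index = [f"grp{i}" for i in range(1, len(renamed) + 1)]
print(renamed)
```

Keeping Dog, Jessica, and Mike as ordinary columns means each grp label still shows which combination it stands for.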

Plot:

[image: horizontal bar chart of the group counts]
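
On the question of which plot to use: with a handful of groups and one count each, a horizontal bar chart with readable group labels is probably the simplest choice. The sketch below reuses the counts from the table above; the label wording, the axis titles, and the output file name groups.png are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the grouped table; labels spell out each combination
counts = pd.Series(
    [5, 4, 3, 3, 3, 1, 1],
    index=["1 Dog, 1 Jessica, 1 Mike", "2 Dog, 1 Jessica", "3 Dog",
           "1 Dog, 2 Mike", "2 Jessica, 1 Mike",
           "1 Dog, 2 Jessica", "3 Jessica"],
    name="count",
)

fig, ax = plt.subplots()
# Sort ascending so the largest group ends up at the top of the chart
counts.sort_values().plot(kind="barh", ax=ax)
ax.set_xlabel("Number of rows")
ax.set_ylabel("Group")
fig.tight_layout()
fig.savefig("groups.png")  # hypothetical file name; plt.show() also works
```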

Henry Ecker