3

I have a dataframe that is like this:

| A | B | C  | D |  
|---|---|----|---|  
| 1 | 3 | 10 | 4 |  
| 2 | 3 | 1  | 5 |  
| 1 | 7 | 9  | 3 |  

Where A B C D are categories, and the values are in the range [1, 10] (some values might not appear in a single column)

I would like to have a dataframe that for every category shows the count of those values. Something like this:

|    | A | B  | C | D |
|----|---|----|---|---|  
| 1  | 2 | 0  | 1 | 0 |
| 2  | 1 | 0  | 0 | 0 |
| 3  | 0 | 2  | 0 | 1 |
| 4  | 0 | 0  | 0 | 1 |
| 5  | 0 | 0  | 0 | 1 |
| 6  | 0 | 0  | 0 | 0 |
| 7  | 0 | 1  | 0 | 0 |
| 8  | 0 | 0  | 0 | 0 |
| 9  | 0 | 0  | 1 | 0 |
| 10 | 0 | 0  | 1 | 0 | 

I tried using groupby and pivot_table but I can't seem to understand what parameters to give.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Mehdi Saffar
  • 691
  • 1
  • 8
  • 24
  • related : [How to get value counts for multiple columns at once in Pandas DataFrame?](https://stackoverflow.com/questions/32589829/how-to-get-value-counts-for-multiple-columns-at-once-in-pandas-dataframe) – anky Sep 05 '20 at 19:19

3 Answers3

8

Option 1

import seaborn as sns
import pandas as pd

# dataframe setup
data = {'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]}
df = pd.DataFrame(data)

# create a dataframe of the counts for each column
counts = df.apply(pd.value_counts)

# display(count)
      A    B    C    D
1   2.0  NaN  1.0  NaN
2   1.0  NaN  NaN  NaN
3   NaN  2.0  NaN  1.0
4   NaN  NaN  NaN  1.0
5   NaN  NaN  NaN  1.0
7   NaN  1.0  NaN  NaN
9   NaN  NaN  1.0  NaN
10  NaN  NaN  1.0  NaN

# plot
sns.heatmap(counts)

enter image description here

Option 2

  • There are a number of style options available with heatmap, and changing the color with cmap can improve the visualization.
# counts
counts = df.apply(pd.value_counts).fillna(0)

# plot
sns.heatmap(counts, cmap="GnBu", annot=True)

enter image description here

default color

sns.heatmap(counts, annot=True)

enter image description here

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
2

this is my first time posting answers, hope it is hopeful

import seaborn as sns
import pandas as pd
import numpy as np

data = {'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]}
df = pd.DataFrame(data)

df1 = pd.DataFrame(data = None , index = np.arange(11),columns = df.columns) 

for value in df.columns:
    df1[value]= df[value].value_counts()    
df1.fillna(0)
1
# necessary imports
import pandas as pd
import numpy as np

Start with the dataframe:

df = pd.DataFrame({'A': [1, 2, 1],
                   'B': [3, 3, 7],
                   'C': [10,1, 9],
                   'D': [4, 5, 3]},
                   index=[0, 1, 2])

Then you can do:

d = pd.DataFrame(0, index=np.arange(10), columns=['A','B','C','D'])

or, to be more general:

d = pd.DataFrame(0, index=np.arange(10), columns=df.columns)

d will have the structure of the dataframe you want as a result, but with all values 0.

Fill the dataframe:

for col in df.columns:
    d[col]=df[col].value_counts()

The 0s got replaced by NaNs. Make them 0 again:

d.replace(np.nan, 0, inplace=True)

This will give you:

+----+-----+-----+-----+-----+
|    |   A |   B |   C |   D |
|----+-----+-----+-----+-----|
|  0 |   0 |   0 |   0 |   0 |
|  1 |   2 |   0 |   1 |   0 |
|  2 |   1 |   0 |   0 |   0 |
|  3 |   0 |   2 |   0 |   1 |
|  4 |   0 |   0 |   0 |   1 |
|  5 |   0 |   0 |   0 |   1 |
|  6 |   0 |   0 |   0 |   0 |
|  7 |   0 |   1 |   0 |   0 |
|  8 |   0 |   0 |   0 |   0 |
|  9 |   0 |   0 |   1 |   0 |
+----+-----+-----+-----+-----+
zabop
  • 6,750
  • 3
  • 39
  • 84
  • I cannot select two answers as "accepted" in stackoverflow unfortunately. I also like your answer because it is more flexible. I marked his/her answer as accepted because I found it to be more complete. But yours is very good too! Thank you. – Mehdi Saffar Sep 05 '20 at 19:40
  • 2
    No worries. I agree with your choice, the answer you marked accepted is the one which should be marked as accepted ;) – zabop Sep 05 '20 at 20:01