
Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time?

For example, suppose I generate a DataFrame as follows:

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

I can get a DataFrame like this:

   a  b  c  d
0  0  1  1  0
1  1  1  1  1
2  1  1  1  0
3  0  1  0  0
4  0  0  0  1
5  0  1  1  0
6  0  1  1  1
7  1  0  1  0
8  1  0  1  1
9  0  1  1  0

How do I conveniently get the value counts for every column and obtain the following result?

   a  b  c  d
0  6  3  2  6
1  4  7  8  4

My current solution is:

pieces = []
for col in df.columns:
    tmp_series = df[col].value_counts()
    tmp_series.name = col
    pieces.append(tmp_series)
df_value_counts = pd.concat(pieces, axis=1)

But there must be a simpler way, like stacking, pivoting, or groupby?
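
For instance, I would guess a stack/groupby spelling along these lines is possible (an untested sketch):

df.stack().groupby(level=1).value_counts().unstack(level=0)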

Xin

15 Answers


Just call apply and pass pd.Series.value_counts:

In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
   a  b  c  d
0  4  6  4  3
1  6  4  6  7
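
If a value is missing from one of the columns, apply leaves a NaN there and upcasts the counts to float; as noted in a comment below, chaining fillna and astype restores integer counts (a minimal follow-up sketch):

df.apply(pd.Series.value_counts).fillna(0).astype(int)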
EdChum

There is actually a fairly interesting and advanced way of solving this problem with crosstab and melt.

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

df

       a       b       c
0  table    lamp  mirror
1  chair  candle  mirror
2  chair   chair  mirror
3   lamp    lamp  mirror
4    bed     bed  mirror

We can first melt the DataFrame:

df1 = df.melt(var_name='columns', value_name='index')
df1

   columns   index
0        a   table
1        a   chair
2        a   chair
3        a    lamp
4        a     bed
5        b    lamp
6        b  candle
7        b   chair
8        b    lamp
9        b     bed
10       c  mirror
11       c  mirror
12       c  mirror
13       c  mirror
14       c  mirror

And then use the crosstab function to count the values for each column. This preserves the data type as ints, which wouldn't be the case for the currently selected answer:

pd.crosstab(index=df1['index'], columns=df1['columns'])

columns  a  b  c
index           
bed      1  1  0
candle   0  1  0
chair    2  1  0
lamp     1  2  0
mirror   0  0  5
table    1  0  0

Or in one line: since a DataFrame unpacks like a mapping of column name to column, ** expands the melted frame straight into crosstab's index and columns keyword arguments (this is advanced)

pd.crosstab(**df.melt(var_name='columns', value_name='index'))

Also, value_counts is available as a top-level function, so you can simplify the currently selected answer to the following (note that pd.value_counts has been deprecated in recent pandas versions, which makes df.apply(pd.Series.value_counts) the more durable spelling):

df.apply(pd.value_counts)
Ted Petrou
    crosstab is nice because it leaves you with integers, though we can also get that with `df.apply(pd.value_counts).fillna(0).astype(int)` – Kevin McDonough Nov 02 '19 at 18:44

To get the counts only for specific columns:

df[['a', 'b']].apply(pd.Series.value_counts)

where df is the name of your dataframe and 'a' and 'b' are the columns for which you want to count the values.

mOna

A solution that selects all categorical columns and builds a DataFrame with all their value counts at once:

df = pd.DataFrame({
    'fruits': ['apple', 'mango', 'apple', 'mango', 'mango', 'pear', 'mango'],
    'vegetables': ['cucumber', 'eggplant', 'tomato', 'tomato', 'tomato', 'tomato', 'pumpkin'],
    'sauces': ['chili', 'chili', 'ketchup', 'ketchup', 'chili', '1000 islands', 'chili']})

cat_cols = df.select_dtypes(include=object).columns.tolist()
(df[cat_cols]
 .melt(var_name='column', value_name='value')
 .value_counts()      # counts each (column, value) pair
 .rename('counts')    # name the Series explicitly; works across pandas versions
 .to_frame()
 .sort_values(by=['column', 'counts']))

                         counts
column     value
fruits     pear               1
           apple              2
           mango              4
sauces     1000 islands       1
           ketchup            2
           chili              4
vegetables pumpkin            1
           eggplant           1
           cucumber           1
           tomato             4
Serge Tochilov

You can also try this code:

# 'heart' is the DataFrame in question here
for i in heart.columns:
    x = heart[i].value_counts()
    print("Column name is:", i, "and its values are:", x)
Ajay Kumar

Your solution wrapped in one line looks even simpler than using groupby, stacking, etc.:

pd.concat([df[column].value_counts() for column in df], axis = 1)
Peter Csala

You can use a lambda function:

df.apply(lambda x: x.value_counts())
Mykola Zotko

Applying the value_counts function gave me unexpected / not the most readable results. But this approach seems super simple and easy to read. Note that it counts unique combinations of the selected columns rather than counts per individual column:

df[["col1", "col2", "col3"]].value_counts()

Here is an example of results if the cols have boolean values:

col1               col2         col3
False              False        False        1000
                   True         False        1000
True               False        False        1000
                                True         1000
                   True         False        1000
                                True         1000
dtype: int64
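
For comparison, per-column counts for the same columns can be obtained with the apply pattern from the accepted answer (a sketch; col1, col2 and col3 stand in for your own column names):

df[["col1", "col2", "col3"]].apply(pd.Series.value_counts)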
s2t2

This is what worked for me:

for column in df.columns:
    print("\n" + column)
    print(df[column].value_counts())

link to source

jcdevilleres
  • I would enhance this answer by choosing how many of the top values you'd like to see (5 in my example), and by including NaN values: `df[column].value_counts(dropna=False).nlargest(5)` – PJ_ Apr 28 '22 at 20:55

You can list the column names:

cols = ["a", "b", "c", "d"]  # avoid naming this 'list', which shadows the built-in

then run a for loop using the value_counts() function:

for col in cols:
    print(df[col].value_counts())
    print("\n")

You can also use the method given below:

for column in df.columns:
    print("\n" + column)
    print(df[column].value_counts())

Ran into this while checking whether there was a better way of doing what I was doing. It turns out that calling df.apply(pd.value_counts) on a DataFrame whose columns each have many distinct values results in a pretty substantial performance hit.

In this case, it is better to simply iterate over the non-numeric columns in a dictionary comprehension, and leave it as a dictionary:

types_to_count = ["object", "category", "string"]
result = {
    col: df[col].value_counts()
    # select_dtypes matches dtype names reliably; dtypes.isin against strings can miss
    for col in df.select_dtypes(include=types_to_count).columns
}

Filtering by types_to_count helps ensure you don't take the value_counts of continuous data.
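
Each entry of the resulting dict is an ordinary value_counts Series, so individual columns stay easy to inspect (usage sketch; "city" is a hypothetical column name):

result["city"]  # value_counts Series for the hypothetical "city" column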

PMende

Another solution:

df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
result = pd.DataFrame()  # start from an empty frame; an empty Series would add a spurious NaN column
for var in df.columns:
    counts = df[var].value_counts()
    result = pd.concat([result, counts], axis=1)
result
Mujeebur Rahman

Sometimes columns form a hierarchy; in that case I recommend grouping them and then making the counts:

# note: '_id' is whatever column you use to make the counts with len()
cat_cols = ['column_1', 'column_2']
df.groupby(cat_cols).agg(count=('_id', lambda x: len(x)))

                      count
column_1   column_2
category_1 Excelent      19
           Good          11
           Bad            1
category_2 Happy         48
           Good mood    158
           Serious       62
           Sad           10
           Depressed      8

Bonus: you can change len(x) to x.nunique() or any other lambda function you want.
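
For plain row counts, groupby(...).size() is an equivalent and arguably simpler spelling (a sketch, assuming the same cat_cols as above):

df.groupby(cat_cols).size().to_frame('count')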


I thought it would be nice to have an implementation that also works for columns with different sets of values.

This code generates a DataFrame with hierarchical columns, where the top level holds the original column name and the lower level holds two columns per original column: one for the values and one for the counts.

def val_cnts_df(df):
    val_cnts_dict = {}
    max_length = 0
    for col in df:
        val_cnts_dict[col] = df[col].value_counts()
        max_length = max(max_length, len(val_cnts_dict[col]))

    lists = [[col, prefix] for col in val_cnts_dict.keys() for prefix in ['values', 'counts']]
    columns = pd.MultiIndex.from_tuples(lists, names=['column', 'value_counts'])

    val_cnts_df = pd.DataFrame(data=np.zeros((max_length, len(columns))), columns=columns)

    for col in val_cnts_dict:
        # read values/counts off the Series itself; reset_index()['index'] breaks on
        # pandas >= 2.0, where the reset columns are labelled <col> and 'count' instead
        val_cnts_df[col, 'values'] = pd.Series(val_cnts_dict[col].index)
        val_cnts_df[col, 'counts'] = pd.Series(val_cnts_dict[col].values)

    return val_cnts_df

Example of results:

autos = pd.DataFrame({'brand': ['Audi', 'Audi', 'VW', 'BMW', 'VW', 'VW'],
                      'gearbox': ['automatic', 'automatic', 'manual', 'automatic',
                                  'manual', 'manual'],
                     'doors': [5, 5, 5, 2, 5, 5]})

print(val_cnts_df(autos))
column        brand            gearbox        doors       
value_counts  values counts    values counts  values counts
0                 VW     3  automatic    3.0    5.0    5.0
1               Audi     2     manual    3.0    2.0    1.0
2                BMW     1        NaN    NaN    NaN    NaN
Mi chael

A solution which can be done:

for col in df1.columns:
    print(df1[col].value_counts(dropna=False).nlargest(5))

Another approach:

for col in df1.columns:
    print(col, ': ', len(df1[col].unique()), 'labels')
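
For what it's worth, DataFrame.nunique collapses that second loop into a single call (a sketch; note that nunique skips NaN by default, unlike len(unique())):

df1.nunique()  # distinct labels per column, returned as a Series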
RusArtM