How to make a simple frequency table in Pandas

Question

I need to count integers 1 to 5 row-wise in a Pandas dataframe. For example, for

import pandas as pd

df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})

   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2
2   2   3  NaN   3

The following would be created:

   n1  n2  n3  n4  n5
0   1   1   2   0   0
1   1   1   1   0   1
2   0   1   2   0   0

I've come across .value_counts and crosstab, but I just haven't been able to set up either to get what I need. Any help would be much appreciated.

Thanks in advance!

Yoel Nisanov · Answer 1 · 2019-10-12T15:36:19.087

5

What you could do is use the .apply function on the data frame like this:

>>> df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})
>>> df
   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2
2   2   3  NaN   3
>>> df.apply(pd.Series.value_counts, axis=1)
   1.0  2.0  3.0  5.0
0  1.0  1.0  2.0  NaN
1  1.0  1.0  1.0  1.0
2  NaN  1.0  2.0  NaN

Enjoy (: and for reference- Count occurrences of items in Series in each row of a DataFrame

p.s- obviously if a number doesn't exist it wont create a table with alot of numbers that doesn't exist just to fill the table until the last number that do exist, the table will just contain numbers that at least have 1 appearance.

edited Oct 12 '19 at 15:36

answered Oct 12 '19 at 15:30

Yoel Nisanov

984
7
16

2

Great answer. .fillna(0) at the end too – Chris Oct 12 '19 at 15:33
Thanks! Any way to include a column for 0 counts (in the example it would be the integer 4)? – Dribbler Oct 12 '19 at 15:34
1

Just like the guy above me replied, you could include a column for the 0 counts with the .reindex function, just figure out how to get the smallest and largest number in your data frame insert them into range like -- range(min, max) and then reindex your data frame noting that you're reindexing axis 1. If you could accept my answer i'll be greatful! – Yoel Nisanov Oct 12 '19 at 15:44
1

use reindex. See my answer – ansev Oct 12 '19 at 15:46

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

4

`stack` + `get_dummies`

df1 = pd.get_dummies(df.stack().astype(int)).sum(level=0)

# Cleans up names, types, missing
idx = range(1, 6)
df1 = df1.reindex(idx, axis=1).add_prefix('n').fillna(0, downcast='infer')

   n1  n2  n3  n4  n5
0   1   1   2   0   0
1   1   1   1   0   1
2   0   1   2   0   0

edited Jun 20 '20 at 09:12

Community

1
1

answered Oct 12 '19 at 15:36

ALollz

57,915
7
66
89

ansev · Answer 3 · 2019-10-12T16:07:04.640

3

You need Series.value_counts + reindex:

new_df=( df.apply(lambda x: x.value_counts(),axis=1)
        .reindex(columns=np.arange(1,6).astype(int))
        .fillna(0).add_prefix('n') )
print(new_df)

    n1   n2   n3   n4   n5
0  1.0  1.0  2.0  0.0  0.0
1  1.0  1.0  1.0  0.0  1.0
2  0.0  1.0  2.0  0.0  0.0

edited Oct 12 '19 at 16:07

answered Oct 12 '19 at 15:42

ansev

30,322
5
17
31

How to make a simple frequency table in Pandas

3 Answers3

stack + get_dummies

`stack` + `get_dummies`