2

I need to count integers 1 to 5 row-wise in a Pandas dataframe. For example, for

import pandas as pd

df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})

   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2
2   2   3  NaN   3    

The following would be created:

   n1  n2  n3  n4  n5
0   1   1   2   0   0
1   1   1   1   0   1
2   0   1   2   0   0

I've come across .value_counts and crosstab, but I just haven't been able to set up either to get what I need. Any help would be much appreciated.

Thanks in advance!

Erfan
  • 40,971
  • 8
  • 66
  • 78
Dribbler
  • 4,343
  • 10
  • 33
  • 53

3 Answers3

5

What you could do is use the .apply function on the data frame like this:

>>> df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})
>>> df
   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2
2   2   3  NaN   3
>>> df.apply(pd.Series.value_counts, axis=1)
   1.0  2.0  3.0  5.0
0  1.0  1.0  2.0  NaN
1  1.0  1.0  1.0  1.0
2  NaN  1.0  2.0  NaN

Enjoy (: and for reference- Count occurrences of items in Series in each row of a DataFrame

p.s- obviously if a number doesn't exist it wont create a table with alot of numbers that doesn't exist just to fill the table until the last number that do exist, the table will just contain numbers that at least have 1 appearance.

Yoel Nisanov
  • 984
  • 7
  • 16
  • 2
    Great answer. .fillna(0) at the end too – Chris Oct 12 '19 at 15:33
  • Thanks! Any way to include a column for 0 counts (in the example it would be the integer 4)? – Dribbler Oct 12 '19 at 15:34
  • 1
    Just like the guy above me replied, you could include a column for the 0 counts with the .reindex function, just figure out how to get the smallest and largest number in your data frame insert them into range like -- range(min, max) and then reindex your data frame noting that you're reindexing axis 1. If you could accept my answer i'll be greatful! – Yoel Nisanov Oct 12 '19 at 15:44
  • 1
    use reindex. See my answer – ansev Oct 12 '19 at 15:46
4

stack + get_dummies

df1 = pd.get_dummies(df.stack().astype(int)).sum(level=0)

# Cleans up names, types, missing
idx = range(1, 6)
df1 = df1.reindex(idx, axis=1).add_prefix('n').fillna(0, downcast='infer')

   n1  n2  n3  n4  n5
0   1   1   2   0   0
1   1   1   1   0   1
2   0   1   2   0   0
Community
  • 1
  • 1
ALollz
  • 57,915
  • 7
  • 66
  • 89
3

You need Series.value_counts + reindex:

new_df=( df.apply(lambda x: x.value_counts(),axis=1)
        .reindex(columns=np.arange(1,6).astype(int))
        .fillna(0).add_prefix('n') )
print(new_df)

    n1   n2   n3   n4   n5
0  1.0  1.0  2.0  0.0  0.0
1  1.0  1.0  1.0  0.0  1.0
2  0.0  1.0  2.0  0.0  0.0

ansev
  • 30,322
  • 5
  • 17
  • 31