
I have a dataset that consists of hundreds of columns and thousands of rows

In [119]:
df.columns
Out[119]:
Index(['column 1', 'column2',
       ...
       'column 100'],
      dtype='object', name='var_name')

Usually I run value_counts() on every single column to see the distribution.

In [121]:
a = df['column1'].value_counts()
In [122]:
a
Out[122]:
1     77494
2      5389
0      2016
3       878
Name: column 1, dtype: int64

But for this dataframe, doing this for every column will make my notebook very messy. How can I automate this? Is there a function that helps?

For additional context: all my data is int64, but I hope the best answer gives a solution that works in every case. I want the result as a pandas DataFrame.

Based on @MaxU's suggestion, this is my simplified dataframe:

df

id  column1  column2 column3
1         3        1       7
2         3        2       8
3         2        3       7
4         2        1       8
5         1        2       7

and my expected output is:

column 1   count
1          1
2          2
3          2
column 2   count
1          2
2          2
3          1
column 3   count
7          3
8          2
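One hedged sketch of how to get that shape as a single tidy DataFrame (not from any answer below; it assumes the simplified frame above and stacks each column's value_counts with pd.concat):

```python
import pandas as pd

# Simplified frame from the question.
df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5],
    'column1': [3, 3, 2, 2, 1],
    'column2': [1, 2, 3, 1, 2],
    'column3': [7, 8, 7, 8, 7],
})

# Stack each column's value_counts into one DataFrame:
# one row per (column, value) pair with its count.
counts = (
    pd.concat(
        {col: df[col].value_counts() for col in df.columns.drop('id')},
        names=['column', 'value'],
    )
    .rename('count')
    .reset_index()
)
print(counts)
```

This keeps everything in one DataFrame (columns `column`, `value`, `count`) instead of printing one block per column.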
jpp
Nabih Bawazir

3 Answers


You can use a for loop over the column names. (Avoid naming the list `all`, since that shadows the Python builtin.)

cols = list(df)
for col in cols:
    print(df[col].value_counts())
Susmit

I'd do it this way:

In [83]: df.drop(columns='id').apply(lambda c: c.value_counts().to_dict())
Out[83]:
column1    {3: 2, 2: 2, 1: 1}
column2    {2: 2, 1: 2, 3: 1}
column3          {7: 3, 8: 2}
dtype: object

or:

In [84]: for c in df.drop(columns='id'):
    ...:     print(df[c].value_counts())
    ...:
3    2
2    2
1    1
Name: column1, dtype: int64   # <----- column name
2    2
1    2
3    1
Name: column2, dtype: int64
7    3
8    2
Name: column3, dtype: int64
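For completeness, a hedged alternative (an addition, not part of the answer above) that produces the same counts as one long DataFrame via melt + groupby, assuming the simplified frame from the question:

```python
import pandas as pd

# Simplified frame from the question.
df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5],
    'column1': [3, 3, 2, 2, 1],
    'column2': [1, 2, 3, 1, 2],
    'column3': [7, 8, 7, 8, 7],
})

# Melt to long form, then count occurrences of each (column, value) pair.
out = (
    df.drop(columns='id')
      .melt(var_name='column', value_name='value')
      .groupby(['column', 'value'])
      .size()
      .rename('count')
      .reset_index()
)
print(out)
```

The melt step turns every cell into a (column, value) row, so a single groupby covers all columns at once.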
MaxU - stand with Ukraine

You can produce the desired value_counts sequentially, convert each to a dataframe, and append it to a CSV:

import pandas as pd

with open('out.csv', 'w') as out:
    for col in df.columns[1:]:
        res = df[col].value_counts()\
                     .reset_index()\
                     .rename(columns={col: 'count', 'index': col})
        res.to_csv(out, index=False)
jpp