
I have a dataset that consists of hundreds of columns and thousands of rows

In [119]:
df.columns
Out[119]:
Index(['column 1', 'column2',
       ...
       'column 100'],
      dtype='object', name='var_name')

Usually I run value_counts() on every single column to see the distribution.

In [121]:
a = df['column1'].value_counts()
In [122]:
a
Out[122]:
1     77494
2      5389
0      2016
3       878
Name: column 1, dtype: int64

But for this dataframe, doing this for every column will make my notebook very messy. How can I automate this? Is there a function that helps?

For additional context: all my data is int64, but I hope the best answer gives a solution that works in every case. I want the result as a pandas DataFrame.

Based on @MaxU's suggestion, this is my simplified dataframe:

df

id  column1  column2 column3
1         3        1       7
2         3        2       8
3         2        3       7
4         2        1       8
5         1        2       7

and my expected output is:

column 1   count
1          1
2          2
3          2
column 2   count
1          2
2          2
3          1
column 3   count
7          3
8          2
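One hedged sketch of how to get that shape as a single tidy DataFrame (not from any answer below; it assumes the simplified frame above and stacks each column's value_counts with pd.concat):

```python
import pandas as pd

# Simplified frame from the question.
df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5],
    'column1': [3, 3, 2, 2, 1],
    'column2': [1, 2, 3, 1, 2],
    'column3': [7, 8, 7, 8, 7],
})

# Stack each column's value_counts into one DataFrame:
# one row per (column, value) pair with its count.
counts = (
    pd.concat(
        {col: df[col].value_counts() for col in df.columns.drop('id')},
        names=['column', 'value'],
    )
    .rename('count')
    .reset_index()
)
print(counts)
```

This keeps everything in one DataFrame (columns `column`, `value`, `count`) instead of printing one block per column.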
jpp
Nabih Bawazir

3 Answers


You can use a for loop over the column names. (Avoid naming the list `all`, since that shadows the Python builtin.)

cols = list(df)
for col in cols:
    print(df[col].value_counts())
Susmit

I'd do it this way:

In [83]: df.drop(columns='id').apply(lambda c: c.value_counts().to_dict())
Out[83]:
column1    {3: 2, 2: 2, 1: 1}
column2    {2: 2, 1: 2, 3: 1}
column3          {7: 3, 8: 2}
dtype: object

or:

In [84]: for c in df.drop(columns='id'):
    ...:     print(df[c].value_counts())
    ...:
3    2
2    2
1    1
Name: column1, dtype: int64   # <----- column name
2    2
1    2
3    1
Name: column2, dtype: int64
7    3
8    2
Name: column3, dtype: int64
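For completeness, a hedged alternative (an addition, not part of the answer above) that produces the same counts as one long DataFrame via melt + groupby, assuming the simplified frame from the question:

```python
import pandas as pd

# Simplified frame from the question.
df = pd.DataFrame({
    'id':      [1, 2, 3, 4, 5],
    'column1': [3, 3, 2, 2, 1],
    'column2': [1, 2, 3, 1, 2],
    'column3': [7, 8, 7, 8, 7],
})

# Melt to long form, then count occurrences of each (column, value) pair.
out = (
    df.drop(columns='id')
      .melt(var_name='column', value_name='value')
      .groupby(['column', 'value'])
      .size()
      .rename('count')
      .reset_index()
)
print(out)
```

The melt step turns every cell into a (column, value) row, so a single groupby covers all columns at once.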
MaxU - stand with Ukraine

You can produce the desired value_counts sequentially, convert each to a dataframe, and append it to a CSV:

import pandas as pd

with open('out.csv', 'w') as out:
    for col in df.columns[1:]:
        res = df[col].value_counts()\
                     .reset_index()\
                     .rename(columns={col: 'count', 'index': col})
        res.to_csv(out, index=False)
jpp