Word count of single column in pandas dataframe

Question

Here is my attempt at counting the words in a single column using groupby in pandas.

First, set up the data:

import numpy as np
import pandas as pd

columns = ['col1', 'col2', 'col3']
data = np.array([['word1', 'word2', 'word3'], ['word1', 'word5', 'word3'], ['word3', 'word7', 'word3']])
to_count = pd.DataFrame(data, columns=columns)

I'm trying to count the words in col1 of to_count.

to_count contains:

    col1   col2   col3
0  word1  word2  word3
1  word1  word5  word3
2  word3  word7  word3

To count the words, I then use:

print(to_count.groupby('col1').count())

which displays:

       col2  col3
col1
word1     2     2
word3     1     1

This seems partly correct, in that the word counts are returned, but they are repeated across every remaining column. How can I get the word count for a single column? I could just select one column from the resulting dataframe, but that does not seem correct.

This might also help - https://stackoverflow.com/q/46863602/4800652 — Bharath M Shetty, Dec 01 '17 at 17:04

score 2 · Accepted Answer · answered Dec 01 '17 at 16:50

If I understand you correctly, I think this is what you're looking for:

print(to_count.groupby('col1')['col1'].count())

Output:

col1
word1    2
word3    1
Name: col1, dtype: int64
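As a minimal sketch building on the line above (not part of the original answer): selecting the column after the groupby returns a Series, and size() gives the same per-group row counts without selecting a column at all. The column name 'count' passed to reset_index is a label chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
columns = ['col1', 'col2', 'col3']
data = np.array([['word1', 'word2', 'word3'],
                 ['word1', 'word5', 'word3'],
                 ['word3', 'word7', 'word3']])
to_count = pd.DataFrame(data, columns=columns)

# Selecting 'col1' after grouping returns a Series indexed by word.
counts = to_count.groupby('col1')['col1'].count()

# size() counts the rows in each group and needs no column selection.
sizes = to_count.groupby('col1').size()

# Turn the result into a regular two-column DataFrame.
# 'count' is an illustrative column name, not something pandas requires.
counts_df = sizes.reset_index(name='count')
print(counts_df)
```

Note that count() skips missing values in the selected column while size() counts every row, so the two only agree when the column has no NaN entries, as is the case here.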

answered Dec 01 '17 at 16:50

Joe T. Boka


score 1 · Answer 2 · answered Dec 01 '17 at 16:47

You can apply value_counts() to a single column of the dataframe. The following loop applies it to every column, one at a time:

for onecol in to_count:
    print(onecol, ":\n", to_count[onecol].value_counts())

Output:

col1 :
word1    2
word3    1
Name: col1, dtype: int64
col2 :
word5    1
word2    1
word7    1
Name: col2, dtype: int64
col3 :
word3    3
Name: col3, dtype: int64
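If the counts need to be used rather than just printed, one possible variation on the loop above is to collect them in a dict keyed by column name. This is only a sketch; counts_by_column is a name made up for this example.

```python
import pandas as pd

# Same sample data as in the question, built from a dict for brevity.
to_count = pd.DataFrame({
    'col1': ['word1', 'word1', 'word3'],
    'col2': ['word2', 'word5', 'word7'],
    'col3': ['word3', 'word3', 'word3'],
})

# Map each column name to the value_counts() Series for that column.
# counts_by_column is an illustrative name.
counts_by_column = {name: to_count[name].value_counts() for name in to_count.columns}

# Look up the counts for a single column, then for a single word in it.
col1_counts = counts_by_column['col1']
print(col1_counts['word1'])
```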

Stefan Falk · Answer 3 · 2017-12-01T16:56:19.083

How about this:

Single column:

df['col1'].value_counts()

will return:

word1    2
word3    1

All columns:

df.apply(lambda col: col.value_counts()).fillna(0).astype(int)

will return:

       col1  col2  col3
word1     2     0     0
word2     0     1     0
word3     1     0     3
word5     0     1     0
word7     0     1     0

Copy & paste example:

from io import StringIO
import pandas as pd

data = """
    col1   col2   col3
0  word1  word2  word3
1  word1  word5  word3
2  word3  word7  word3
"""

# Raw string so that \s is not treated as a string escape sequence.
df = pd.read_table(StringIO(data), sep=r'\s+')

print(df['col1'].value_counts())
print(df.apply(lambda col: col.value_counts()).fillna(0).astype(int))
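As a rough illustration of how the all-columns table relates to the single-column result, the sketch below builds both and reads individual counts out of the table. The variable names table and single are assumptions made for this example.

```python
import pandas as pd

# Same sample data as above, built from a dict instead of parsed text.
df = pd.DataFrame({
    'col1': ['word1', 'word1', 'word3'],
    'col2': ['word2', 'word5', 'word7'],
    'col3': ['word3', 'word3', 'word3'],
})

# Rows are words, columns are the original columns; words that never
# appear in a column come back as NaN and are filled with 0.
table = df.apply(lambda col: col.value_counts()).fillna(0).astype(int)

# Counts for col1 alone.
single = df['col1'].value_counts()

# Read individual counts from the combined table.
print(table.loc['word1', 'col1'])
print(table.loc['word3', 'col3'])
```

The col1 column of the table matches the single-column counts for every word that occurs in col1 and is 0 for all other words.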
