Count unique strings by column in pandas DataFrame

Question

I need to find how many times one of my four unique strings occurs in each column of my dataframe.

Does anyone know a formula that would work for this?

Please provide at least some kind of dataset along with your attempts, so we can have a minimal reproducible example. https://stackoverflow.com/help/minimal-reproducible-example - Celius Stingher, Sep 18 '19 at 18:59

Valdi_Bo · Answer 1 · 2019-09-18T19:37:53.830

Assume that the source DataFrame is as follows:

               Aaa                       Bbb               Ccc
0          Mad Max           Sleeping Beauty      Seven Dwarfs
1  Captain America     The Magnificent Seven         Absolvent
2        Toy Story  The Fast and the Furious         King Lion
3     The Fugitive                Robin Hood  The Seventh Seal

The list of words to look for is (I shortened it to 2):

words = ['the', 'seven']

Then, to generate your result, run:

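import re
import pandas as pd
# One row per word; each cell counts whole-word, case-insensitive matches in that column
pd.DataFrame([ [wrd] + [ df[col].str.extractall(f'(\\b{wrd}\\b)',
    flags=re.I).size for col in df.columns ] for wrd in words ],
    columns=['Word', *df])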

Note the \b (word boundary) anchors in the regex, both before and after the word to look for. They ensure that a search for the word the matches only the standalone word the and skips words such as there or Athena.

Note also the re.I flag, which makes the search case-insensitive (you have to import re).

The result, for my sample data, is:

    Word  Aaa  Bbb  Ccc
0    the    1    3    1
1  seven    0    1    1
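
As a cross-check, the same counts can be sketched with Series.str.count instead of extractall. This is an alternative to the one-liner above, not part of the original answer: the helper name count_words is hypothetical, and re.escape is added on the assumption that the search words may contain regex metacharacters.

```python
import re
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    'Aaa': ['Mad Max', 'Captain America', 'Toy Story', 'The Fugitive'],
    'Bbb': ['Sleeping Beauty', 'The Magnificent Seven',
            'The Fast and the Furious', 'Robin Hood'],
    'Ccc': ['Seven Dwarfs', 'Absolvent', 'King Lion', 'The Seventh Seal'],
})
words = ['the', 'seven']


def count_words(frame, targets):
    """Hypothetical helper: whole-word, case-insensitive counts per column."""
    rows = []
    for word in targets:
        # \b on both sides keeps 'seven' from matching inside 'Seventh'
        pattern = rf'\b{re.escape(word)}\b'
        counts = {col: int(frame[col].str.count(pattern, flags=re.I).sum())
                  for col in frame.columns}
        rows.append({'Word': word, **counts})
    return pd.DataFrame(rows)


result = count_words(df, words)
print(result)
```

Here str.count returns the number of matches per cell, so summing the column gives the per-column total without building the intermediate match table.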

Massifox · Accepted Answer · 2019-09-18T19:05:59.693

Given the following dataframe:

df = pd.DataFrame({
    'B': ['a', 'a', 'c', 'd', 'a'],
    'C': ['aa', 'bb', 'cb', 'dd', 'do'],
})

   B   C
0  a  aa
1  a  bb
2  c  cb
3  d  dd
4  a  do

The value_counts method counts the occurrences of each value in column 'B':

df.B.value_counts()

a    3
d    1
c    1
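
If only a fixed set of strings matters, as in the question, the counts can be restricted to that set. The following is a minimal sketch: the targets list is a hypothetical stand-in for the four strings, and the DataFrame uses the values shown in the table above.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['a', 'a', 'c', 'd', 'a'],
    'C': ['aa', 'bb', 'cb', 'dd', 'do'],
})

# Hypothetical list of the four strings of interest
targets = ['a', 'b', 'c', 'd']

# reindex keeps only the target strings; fill_value=0 covers strings
# that never occur in the column (here 'b')
counts_b = df['B'].value_counts().reindex(targets, fill_value=0)
print(counts_b)
```

Strings missing from the column show up with a count of 0 rather than being dropped.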

edited Sep 18 '19 at 19:05

answered Sep 18 '19 at 18:59

Massifox

score 0 · Answer 3 · edited Sep 18 '19 at 21:07

value_counts docs

However, that method works on a single Series, so you need to apply it to each column you want value counts for.

This example for value counts of entire df
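
One way to apply value_counts to every column is DataFrame.apply. The sketch below is an illustration under assumptions, not the linked example: the sample DataFrame is borrowed from the accepted answer, and the targets list is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['a', 'a', 'c', 'd', 'a'],
    'C': ['aa', 'bb', 'cb', 'dd', 'do'],
})

# Hypothetical list of strings to count in every column
targets = ['a', 'c', 'd', 'dd']

per_column = (
    df.apply(lambda col: col.value_counts())  # one value_counts per column
      .reindex(targets)                       # keep only the target strings
      .fillna(0)                              # absent in a column -> 0
      .astype(int)
)
print(per_column)
```

The result has one row per target string and one column per original column, which matches the shape the question asks for.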

edited Sep 18 '19 at 21:07

Massifox

answered Sep 18 '19 at 19:00

Riley Shea
