
I need to find how many times one of my four unique strings occurs in each column of my dataframe.

does anyone know a formula that would work for this?

3 Answers

Assume that the source DataFrame is as follows:

               Aaa                       Bbb               Ccc
0          Mad Max           Sleeping Beauty      Seven Dwarfs
1  Captain America     The Magnificent Seven         Absolvent
2        Toy Story  The Fast and the Furious         King Lion
3     The Fugitive                Robin Hood  The Seventh Seal

The list of words to look for is (I shortened it to 2):

words = ['the', 'seven']

Then, to generate your result, run:

import re

pd.DataFrame(
    [[wrd] + [df[col].str.extractall(f'(\\b{wrd}\\b)', flags=re.I).size
              for col in df.columns]
     for wrd in words],
    columns=['Word', *df])

Note the \b (word boundary anchor) in the regex, both before and after the word to look for. It ensures that when you search for the word the, only standalone occurrences of the are matched, excluding words such as there, Athena and so on.

Note also the re.I flag, which makes the search case-insensitive (you have to import re).

The result, for my sample data, is:

    Word  Aaa  Bbb  Ccc
0    the    1    3    1
1  seven    0    1    1
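Putting the pieces together, here is a self-contained sketch of the approach, recreating the sample frame from above (the column names and word list are just this answer's example data):

```python
import re

import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    'Aaa': ['Mad Max', 'Captain America', 'Toy Story', 'The Fugitive'],
    'Bbb': ['Sleeping Beauty', 'The Magnificent Seven',
            'The Fast and the Furious', 'Robin Hood'],
    'Ccc': ['Seven Dwarfs', 'Absolvent', 'King Lion', 'The Seventh Seal'],
})
words = ['the', 'seven']

# For each word, count whole-word, case-insensitive matches in every column;
# extractall returns one row per match, so .size is the match count
result = pd.DataFrame(
    [[wrd] + [df[col].str.extractall(f'(\\b{wrd}\\b)', flags=re.I).size
              for col in df.columns]
     for wrd in words],
    columns=['Word', *df])
print(result)
```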
Valdi_Bo

Given the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'B': ['a', 'a', 'c', 'd', 'a'],
    'C': ['aa', 'bb', 'cb', 'dd', 'do'],
})
   B   C
0  a  aa
1  a  bb
2  c  cb
3  d  dd
4  a  do

The value_counts method counts the occurrences of each value in column 'B':

df.B.value_counts()

a    3
d    1
c    1
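If you only care about a fixed set of values, like the four strings in the question, one option is to reindex the counts so that absent values show up as 0. A sketch, where the targets list is a hypothetical stand-in for your four strings:

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['a', 'a', 'c', 'd', 'a'],
    'C': ['aa', 'bb', 'cb', 'dd', 'do'],
})

targets = ['a', 'c']  # hypothetical stand-in for your four strings

# Count each target value in every column; values missing
# from a column get a count of 0 instead of being dropped
counts = df.apply(lambda col: col.value_counts().reindex(targets, fill_value=0))
print(counts)
```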
Massifox

value_counts docs

However, that method is only defined on a Series, so you would need to apply it across each of the columns you want unique value counts for.

This can also be done for the value counts of an entire DataFrame.
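One common way to extend value_counts to every column, as a sketch using the sample frame from the previous answer, is to apply it column-wise:

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['a', 'a', 'c', 'd', 'a'],
    'C': ['aa', 'bb', 'cb', 'dd', 'do'],
})

# Run value_counts on each column; values absent from a column become NaN
counts = df.apply(pd.Series.value_counts)
print(counts)

# Fill the gaps with zeros and cast back to integers
counts = counts.fillna(0).astype(int)
```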

Riley Shea