I need to find how many times one of my four unique strings occurs in each column of my dataframe.
Does anyone know a formula that would work for this?
Assume that the source DataFrame is as follows:
               Aaa                       Bbb               Ccc
0          Mad Max           Sleeping Beauty      Seven Dwarfs
1  Captain America     The Magnificent Seven         Absolvent
2        Toy Story  The Fast and the Furious         King Lion
3     The Fugitive                Robin Hood  The Seventh Seal
The list of words to look for is (I shortened it to 2):
words = ['the', 'seven']
Then, to generate your result, run:
import re
import pandas as pd

pd.DataFrame([[wrd] + [df[col].str.extractall(f'(\\b{wrd}\\b)', flags=re.I).size
                       for col in df.columns] for wrd in words],
             columns=['Word', *df])
Note the \b (word boundary anchor) in the regex, both before and after the word to look for.
This ensures that if you look for the word the, only standalone occurrences of the are counted, leaving out words such as there, Athena and so on.
Note also the re.I flag, which makes the search case-insensitive (you have to import re).
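To see what the word boundaries and the re.I flag change, here is a small illustration using only the standard re module on a made-up sentence (the sentence is an assumption, not part of the sample data):

```python
import re

text = "There was Athena, the owl, and THE cat."

# Without \b, "the" also matches inside "There" and "Athena"
loose = re.findall(r"the", text, flags=re.I)

# With \b on both sides, only the standalone word matches (in any case)
strict = re.findall(r"\bthe\b", text, flags=re.I)

print(len(loose), strict)
```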
The result, for my sample data, is:
    Word  Aaa  Bbb  Ccc
0    the    1    3    1
1  seven    0    1    1
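For reference, here is a self-contained sketch that rebuilds the sample DataFrame and produces the same table. It swaps extractall(...).size for Series.str.count(...).sum(), which counts non-overlapping regex matches per cell, and wraps the word in re.escape in case a search word contains regex metacharacters. The helper name count_words is hypothetical:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Aaa": ["Mad Max", "Captain America", "Toy Story", "The Fugitive"],
    "Bbb": ["Sleeping Beauty", "The Magnificent Seven",
            "The Fast and the Furious", "Robin Hood"],
    "Ccc": ["Seven Dwarfs", "Absolvent", "King Lion", "The Seventh Seal"],
})
words = ["the", "seven"]


def count_words(df, words):
    """Hypothetical helper: whole-word, case-insensitive match counts per column."""
    rows = []
    for wrd in words:
        # \b on both sides keeps "the" from matching inside "there" or "Athena"
        pattern = rf"\b{re.escape(wrd)}\b"
        # str.count gives the number of matches in each cell; sum over the column
        rows.append([wrd] + [int(df[col].str.count(pattern, flags=re.I).sum())
                             for col in df.columns])
    return pd.DataFrame(rows, columns=["Word", *df.columns])


result = count_words(df, words)
print(result)
```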
Given the following dataframe:
import pandas as pd

df = pd.DataFrame({
'B': ['a', 'a', 'c', 'd', 'a'],
'C': ['aa', 'bb', 'cb', 'dd', 'do'],
})
B C
0 a aa
1 a bb
2 c cb
3 d dd
4 a do
The value_counts method counts the occurrences of each distinct value in column 'B':
df.B.value_counts()
a 3
d 1
c 1
However, value_counts works on a single Series (one column), so to get counts for several columns you need to apply it to each of those columns in turn.
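One way to do that is df.apply(pd.Series.value_counts), which runs value_counts on every column and aligns the results into one table, with NaN where a value does not occur in a column. The sketch below then keeps only the strings you care about; the targets list is a hypothetical stand-in for the question's four strings. Note that this counts exact cell matches, not substrings or words inside longer text:

```python
import pandas as pd

df = pd.DataFrame({
    "B": ["a", "a", "c", "d", "a"],
    "C": ["aa", "bb", "cb", "dd", "do"],
})

# Hypothetical list of the exact strings to count (the question has four)
targets = ["a", "d", "dd", "zz"]

# value_counts per column, aligned on the union of all values;
# keep only the target rows and turn "not present" (NaN) into 0
counts = (df.apply(pd.Series.value_counts)
            .reindex(targets)
            .fillna(0)
            .astype(int))
print(counts)
```

An equivalent alternative is pd.DataFrame({t: df.eq(t).sum() for t in targets}).T, which compares every cell against each target directly.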