
I need to count the number of occurrences of a regex pattern in an entire dataframe.

Is the following the most efficient method?

df = df.applymap(str)  # cast every cell to str so the .str accessor works on every column
count = 0
for col in df:
    count += df[col].str.contains(regex_search, case=False).sum()

print('Count: ', count)
– stajah
  • https://stackoverflow.com/questions/26640129/search-for-string-in-all-pandas-dataframe-columns-and-filter This might help! – panktijk Sep 25 '18 at 17:45
  • Thank you @panktijk! If I use `result = df.applymap(lambda x: bool(re.match(regex_search, str(x)))).sum().sum()`, which gives me the output that I want, is my method more or less efficient than the iteration I was using? My question is more about efficiency, I guess. – stajah Sep 25 '18 at 18:14
  • There are already string-handling functions: `str.startswith()`, `str.contains('^f')`, `str.extract()`, or a compiled pattern such as `regex_pat = re.compile(r'^.a|dog', flags=re.IGNORECASE)`. The distinction between `match` and `contains` is strictness: `match` relies on strict `re.match`, while `contains` relies on `re.search`. Methods like `match`, `contains`, `startswith`, and `endswith` take an extra `na` argument so missing values can be considered True or False. – Karn Kumar Sep 25 '18 at 18:15
  • @stajah... a few readings: `startswith()` is equivalent to `str.startswith(pat)` for each element; `endswith()` is equivalent to `str.endswith(pat)` for each element; `findall()` computes a list of all occurrences of the pattern/regex for each string; `match()` calls `re.match` on each element, returning matched groups as a list; `extract()` calls `re.search` on each element, returning a DataFrame with one row for each element and one column for each regex capture group; `extractall()` calls `re.findall` on each element, returning a DataFrame with one row for each match and one column for each regex capture group. – Karn Kumar Sep 25 '18 at 18:16
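
To make the distinction raised in the comments concrete, here is a minimal sketch; the sample frame and pattern below are made up for illustration and are not from the question. It counts matching cells both with the vectorized `str.contains` route and with `applymap` + `re.match`: `contains` uses `re.search` semantics (pattern anywhere in the cell), while `match` only anchors at the start of the string, so the two counts can legitimately differ.

import re
import pandas as pd

# Illustrative pattern and data, not taken from the question
regex_search = r'^.a|dog'
df = pd.DataFrame({'a': ['cat', 'Dog', 3.5], 'b': ['hotdog', None, 'bar']})

# Column-wise vectorized count: str.contains uses re.search semantics;
# na=False treats missing values as non-matches.
contains_count = (
    df.astype(str)
      .apply(lambda col: col.str.contains(regex_search, case=False, na=False))
      .sum()
      .sum()
)

# applymap + re.match, as in the comment above: anchored at the start of each cell.
pattern = re.compile(regex_search, flags=re.IGNORECASE)
match_count = df.applymap(lambda x: bool(pattern.match(str(x)))).sum().sum()

print('contains:', contains_count, 'match:', match_count)

On the efficiency question itself: the column loop in the question, `apply` + `str.contains`, and `applymap` + `re.match` all end up running the regex once per cell, so the differences are largely constant factors; timing them on the real data with `timeit` is the reliable way to decide.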

0 Answers