Using regex and its flags with .eq() function in python to ignore case

Question

I've checked the docs (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eq.html)

I'm thinking of something like the line below, where I could pass re.I to ignore case, or use any other flag for that matter.

df.column.eq('Male').sum()

Use `df.column.str.match()`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.match.html — Jan Wilamowski, Feb 02 '22 at 07:51
You probably need `df['column'].str.contains('Male', case=False).sum()` — Wiktor Stribiżew, Feb 02 '22 at 19:16

score 1 · Accepted Answer · answered Feb 02 '22 at 19:18

You can use Series.str.contains with the case=False argument and ^Male$ as the regex pattern (regex=True is the default, so passing it is optional):

df['column'].str.contains('^Male$', case=False, regex=True).sum()

See the Series.str.contains documentation.

Also, see What do ^ and $ mean in a regular expression?
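To illustrate why the anchors matter, and how an re flag (as asked in the question) fits in, here is a minimal sketch. The sample Series `s` is made up for illustration and is not from the question; `flags=re.IGNORECASE` and `Series.str.fullmatch` (pandas 1.1 or later) are shown as alternatives, not as the only way to do it.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the question
s = pd.Series(['Male', 'male', 'MALE', 'Female', 'female'])

# Unanchored and case-insensitive: 'Female' also matches, since it contains 'male'
unanchored = s.str.contains('Male', case=False).sum()

# Anchored: only strings that are exactly 'male' (in any case) are counted
anchored = s.str.contains('^Male$', case=False).sum()

# Same result, passing an re flag instead of case=False
with_flag = s.str.contains('^Male$', flags=re.IGNORECASE).sum()

# Series.str.fullmatch requires the whole string to match, so no anchors are needed
full = s.str.fullmatch('Male', case=False).sum()

print(unanchored, anchored, with_flag, full)  # 5 3 3 3
```

Without the anchors, the count silently includes every 'Female' row, which is usually not what a replacement for `.eq('Male')` should do.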

Wiktor Stribiżew


So in `.str.contains`, `regex=True` is the default and not necessary; that's what I was trying to find out. Thanks for the documentation link – gseattle May 08 '22 at 00:14

kaiinge · Answer 2 · 2023-04-12T08:00:03.807

Note that, as an alternative to setting case=False, you can allow specific case variations in a word by using a character set in the regex (e.g. '^[Mm]ale$').

import pandas as pd

pupils = [1, 2, 3, 4, 5, 6, 7, 8]
test_outcomes = ['pass', 'fail', 'pass', 'fail', 'not passed', 'fail', 'fail', 'Pass']
# Build the frame from (pupil, outcome) pairs; the keyword is columns=[...]
test_results = pd.DataFrame(list(zip(pupils, test_outcomes)), columns=['pupil', 'outcome'])
# Keep rows whose outcome starts with 'pass' or 'Pass'
passes = test_results[test_results['outcome'].str.contains('^[Pp]ass', regex=True)]

pupil	outcome
1	pass
3	pass
8	Pass
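One caveat worth spelling out: a character set only relaxes the case of the positions it covers, whereas case=False relaxes every letter. The sketch below uses a made-up `outcomes` Series (an assumption for illustration, not the data above) and adds a `$` anchor so that 'passed' is excluded as well.

```python
import pandas as pd

# Hypothetical outcomes, chosen to show where the two approaches differ
outcomes = pd.Series(['pass', 'Pass', 'PASS', 'passed', 'not passed', 'fail'])

# Character set: only the first letter may be upper or lower case
by_charset = outcomes[outcomes.str.contains('^[Pp]ass$')].tolist()

# case=False: every letter may be upper or lower case
by_case_flag = outcomes[outcomes.str.contains('^pass$', case=False)].tolist()

print(by_charset)    # ['pass', 'Pass']
print(by_case_flag)  # ['pass', 'Pass', 'PASS']
```

If fully upper-case values such as 'PASS' can occur in the data, case=False is the safer choice.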
