Filter a dataframe's string values based on a smaller string they might contain

Question

I have a dataframe with log error messages. The column we need looks something like this:

message
 
"System error foo"  
"System error foo2"    
"System error foo"    
"System error foo"   
"System error foo3"

I need to count all the error messages, regardless of what kind of error they are.

Usually, if I knew a specific message, I'd filter a dataframe like this:

df2 = df[df['message'] == 'System error foo3']

But how can I do this for all the messages that contain "System error" plus whatever comes after it? I tried an asterisk, but of course it didn't work. Is there some sort of native wildcard operator in Python or pandas? Or do I need to use regex?

`df[df['message'].str.contains('System error')]`? – Dani Mesejo, Jan 04 '21 at 10:57

score 2 · Accepted Answer · edited Jan 04 '21 at 11:21

You can use `str.contains`:

>>> import pandas as pd

>>> df = pd.DataFrame(data=["System Error foo 1","System Error bar 2","System Error foo3","Error bar"],columns=["messages"])
>>> df
             messages
0  System Error foo 1
1  System Error bar 2
2   System Error foo3
3           Error bar
>>> df[df['messages'].str.contains('System Error')]
             messages
0  System Error foo 1
1  System Error bar 2
2   System Error foo3
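
Since the question asks for a count, here is a minimal sketch of how the same mask could be turned into a number. The sample data (including the `None` row and the "Other failure" row) is made up for illustration, and the column name `message` follows the question rather than the answer above. Passing `regex=False` treats the pattern as plain text, and `na=False` makes missing values count as non-matches; both are optional here but tend to avoid surprises.

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on the question.
df = pd.DataFrame(
    {"message": ["System error foo", "System error foo2", None,
                 "Other failure", "System error foo3"]}
)

# Literal substring match: regex=False disables regex interpretation,
# na=False treats missing values as "does not contain".
mask = df["message"].str.contains("System error", regex=False, na=False)

error_count = int(mask.sum())  # number of rows that matched
errors = df[mask]              # the matching rows themselves

# If the text must appear at the start of the message,
# str.startswith is a stricter alternative.
prefix_mask = df["message"].str.startswith("System error", na=False)
```

Note that `str.contains` is case-sensitive by default, so "System error" would not match "System Error" unless `case=False` is passed.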
