
I have a dataframe with log error messages. The column we need looks something like this:

message

"System error foo"
"System error foo2"
"System error foo"
"System error foo"
"System error foo3"

I need to count all error messages, no matter what kind of error they are.

When I know the exact message, I usually filter the dataframe like this:

df2 = df[df['message'] == 'System error foo3']

But how can I do this for all messages that contain "System error" followed by anything else? I tried using an asterisk (*), but of course that didn't work. Is there a native Python or pandas wildcard operator, or do I need to use a regex?

miatochai


You can use Series.str.contains, which returns a boolean mask you can use to filter the rows:

>>> import pandas as pd
>>> df = pd.DataFrame(data=["System Error foo 1","System Error bar 2","System Error foo3","Error bar"],columns=["messages"])
>>> df
             messages
0  System Error foo 1
1  System Error bar 2
2   System Error foo3
3           Error bar
>>> df[df['messages'].str.contains('System Error')]
             messages
0  System Error foo 1
1  System Error bar 2
2   System Error foo3
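Since the question asks for a count rather than just the filtered rows, here is a minimal sketch that sums the boolean mask (True counts as 1). It assumes the column is named "message" as in the question (the example above uses "messages"), and the sample rows, including the non-matching "Disk full" row, are made up for illustration. Passing regex=False treats the pattern as literal text, and na=False treats missing values as non-matches.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's "message" column
df = pd.DataFrame({"message": [
    "System error foo",
    "System error foo2",
    "System error foo",
    "System error foo",
    "System error foo3",
    "Disk full",  # hypothetical non-matching row for contrast
]})

# True where the message contains the literal text "System error";
# regex=False disables regex interpretation, na=False treats NaN as no match.
mask = df["message"].str.contains("System error", regex=False, na=False)

matching = df[mask]       # filtered DataFrame, like df2 in the question
count = int(mask.sum())   # number of matching rows

# Stricter alternative: only messages that *start* with the prefix.
starts = df["message"].str.startswith("System error")

print(count)  # 5
```

If the capitalisation varies ("System error" vs "System Error"), adding case=False to str.contains makes the match case-insensitive.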

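On the wildcard part of the question: pandas has no * wildcard for ==, but Python's standard-library fnmatch module understands shell-style patterns, and fnmatch.translate converts such a pattern into a regular expression that str.match can use. A sketch with hypothetical sample rows:

```python
import fnmatch

import pandas as pd

# Hypothetical sample rows for illustration only
df = pd.DataFrame({"message": [
    "System error foo",
    "System error foo2",
    "System error foo3",
    "Minor System error",
    "Disk full",
]})

# fnmatch.translate turns a shell-style wildcard into a regex that must
# match the whole string (the translated pattern ends with \Z).
prefix_pattern = fnmatch.translate("System error*")     # "starts with"
anywhere_pattern = fnmatch.translate("*System error*")  # "contains"

# str.match uses re.match, so each regex is anchored at the start of the value.
prefix_mask = df["message"].str.match(prefix_pattern)
anywhere_mask = df["message"].str.match(anywhere_pattern)

print(int(prefix_mask.sum()))    # 3
print(int(anywhere_mask.sum()))  # 4
```

For a plain prefix check, str.startswith is simpler; the fnmatch route mainly helps if you already think in terms of * and ? wildcards.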

Subbu VidyaSekar
Vaebhav