I need to filter strings that start with a word containing 3 or more characters, followed by exactly two words that have only one character. After these three words, anything can follow.

What I tried is this expression:

pattern = r'\w{3,}\s\w\s\w.*'

but it also matches the string "apple wrong a b c", which is incorrect because the second word, "wrong", has more than one character.

A complete example is here:

import pandas as pd

df = pd.DataFrame({'text': ['apple wrong', 'apple wrong b c', 'apple a b correct', 'apple a b c correct']})
pattern = r'\w{3,}\s\w\s\w.*'
matches = df['text'].str.contains(pattern, regex=True)
result = df[matches]
print(result)
EnesZ

1 Answer

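Adding a ^ at the beginning should solve the problem. str.contains looks for the pattern anywhere in the string, so without an anchor the match can start at "wrong" in 'apple wrong b c' (that is, "wrong b c" satisfies the pattern). The ^ forces the match to start at the beginning of the string.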

pattern = r'^\w{3,}\s\w\s\w.*'
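
To see the difference without pandas, here is a minimal sketch using the standard re module. It assumes str.contains behaves like re.search, which scans the whole string for a match. The names texts, unanchored, anchored, and strict are only for illustration, and the strict pattern is an optional extra, not part of the fix above: it assumes you also want to reject a longer third word such as "bc".

```python
import re

# Same sample strings as in the question.
texts = ['apple wrong', 'apple wrong b c', 'apple a b correct', 'apple a b c correct']

unanchored = r'\w{3,}\s\w\s\w.*'   # original pattern
anchored = r'^\w{3,}\s\w\s\w.*'    # pattern with ^ added

# re.search looks for a match starting at any position in the string.
unanchored_hits = [t for t in texts if re.search(unanchored, t)]
anchored_hits = [t for t in texts if re.search(anchored, t)]

print(unanchored_hits)  # includes 'apple wrong b c' (matched from "wrong b c")
print(anchored_hits)    # only the strings that start with the three words

# Optional, stricter variant (an assumption about the intent): \b after the
# third word requires that word to end there, so 'apple a bc' is rejected.
strict = r'^\w{3,}\s\w\s\w\b'
extra = texts + ['apple a bc']
anchored_extra_hits = [t for t in extra if re.search(anchored, t)]
strict_hits = [t for t in extra if re.search(strict, t)]
```

The anchored pattern still accepts 'apple a bc', because \w matches the "b" and .* consumes the rest. If that string should be rejected, the \b version may be closer to what you want.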
Abhyuday Vaish