I need to filter strings that start with a word containing 3 or more characters, followed by exactly two words that have only one character. After these three words, anything can follow.
I tried this expression:
pattern = r'\w{3,}\s\w\s\w.*'
but it also matches the string 'apple wrong a b c',
which is not correct (the second word, "wrong", has more than one character).
A complete example is here:
import pandas as pd
df = pd.DataFrame({'text': ['apple wrong', 'apple wrong b c','apple a b correct', 'apple a b c correct']})
pattern = r'\w{3,}\s\w\s\w.*'
matches = df['text'].str.contains(pattern, regex=True)
result = df[matches]
print(result)
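One possible direction, offered as a sketch rather than a confirmed fix: the pattern above is searched for anywhere in the string, so it can start matching at "wrong" instead of at the first word. The variant below assumes that anchoring with `^` and putting a word boundary `\b` after the second single-character word gives the intended behaviour. It uses only the standard `re` module and a plain list (the names `texts` and `kept` are illustrative, not from the example above).

```python
import re

# Assumption: the match must begin at the start of the string (^), and the
# second single-character word must end right after its one character (\b).
pattern = r'^\w{3,}\s\w\s\w\b.*'

texts = ['apple wrong', 'apple wrong b c', 'apple a b correct', 'apple a b c correct']

# re.search scans the whole string, but ^ pins the match to position 0.
kept = [t for t in texts if re.search(pattern, t)]
print(kept)  # ['apple a b correct', 'apple a b c correct']
```

The same pattern string can be passed to `df['text'].str.contains(pattern, regex=True)`. Without the `\b`, a string such as 'apple a bc' would still match, because `.*` would absorb the rest of the word.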