Consider this simple setup:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['stack-overflow',
                            'slack-overflow',
                            'smack-over']})
df
Out[9]:
   id            text
0   1  stack-overflow
1   2  slack-overflow
2   3      smack-over
I have a given regex, and for each row I would like to extract its longest match. I know I can use str.extractall
(or str.findall) to get all the matches, but how can I efficiently keep only the longest one as a new column,
df['mylongest'], in the dataframe?
In this example, with the regex (\w+), the longest matches are overflow, overflow and smack.
df.text.str.findall(r'(\w+)')
Out[10]:
0    [stack, overflow]
1    [slack, overflow]
2        [smack, over]
Name: text, dtype: object
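For reference, here is a minimal sketch of the result I am after, written the naive way: compile the regex once, collect the matches per row, and keep the longest with max(..., key=len). The helper name longest_match is just something I made up for illustration, and I am assuming the regex has at most one capturing group (with several groups, findall returns tuples rather than strings). I do not know whether this is the most efficient way.

```python
import re
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['stack-overflow',
                            'slack-overflow',
                            'smack-over']})

# Compile once so the pattern is not re-parsed for every row.
pattern = re.compile(r'(\w+)')

def longest_match(s):
    # Hypothetical helper: returns the longest match in s, or None if
    # there is no match. On ties, max() keeps the first longest match.
    matches = pattern.findall(s)
    return max(matches, key=len) if matches else None

df['mylongest'] = df['text'].map(longest_match)
```

Rows with no match end up as None in df['mylongest'], which may or may not be what you want for missing values.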