Check which elements of a Series are substrings of a given text - Python, Pandas

Question

What I am trying to do is:

Given a series of strings, find all the indexes of the strings that are substrings of another main string, in a vectorized manner.

The Input:

import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# The series:
0      ab
1    abcd
2     bcc
3     abc
Name: text, dtype: object

The desired output:

0      ab
3     abc
Name: text, dtype: object

What I tried:

df_test = pd.DataFrame(series)
df_test['text2'] = main_text
df_test['text'].isin(df_test)

# This of course won't work either, since it checks whether the main string
# is a substring of each string in the series:
series.str.contains(main_text, regex=True)

Thanks!

score 0 · Answer 1 · answered Mar 04 '22 at 09:00


You don't need a regex; simply use Python's in operator on each element:

series[[e in main_text for e in series]]

output:

0     ab
3    abc
Name: text, dtype: object
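For reference, here is a self-contained sketch of the same approach using the data from the question. The names mask and result are introduced here for illustration only. It assumes every element of the series is a string; a missing value such as NaN would make the in test raise a TypeError.

```python
import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# Plain Python loop: True where the element occurs somewhere inside main_text.
# (Assumes all elements are strings; "mask" is an illustrative name.)
mask = [e in main_text for e in series]

# Boolean indexing keeps the original index labels of the matching rows.
result = series[mask]
print(result)
```

If the series may contain non-string values, one option is to guard the test, for example with isinstance(e, str) and e in main_text.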

answered Mar 04 '22 at 09:00

mozway


Yes, but this is not efficient; I want to do it in a vectorized manner – Ilan12 Mar 04 '22 at 09:01
@Ilan12 you won't be able to do better with pandas, you can't vectorize here – mozway Mar 04 '22 at 09:02
Maybe apply would be faster than a regular loop? – Ilan12 Mar 04 '22 at 09:04
No, regular loops are most often faster than apply ;) see [here](https://stackoverflow.com/questions/38938318/why-apply-sometimes-isnt-faster-than-for-loop-in-pandas-dataframe) or [here](https://stackoverflow.com/questions/47749018/why-is-pandas-apply-lambda-slower-than-loop-here) for example. – mozway Mar 04 '22 at 09:06
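
To check the loop-versus-apply claim on your own data, a rough timing sketch along these lines may help. The helper names with_loop and with_apply, the repetition factor of 1000, and the number of timing runs are all assumptions made for illustration. No timings are reported here because they depend on the machine, the pandas version, and the data.

```python
import timeit

import pandas as pd

# Enlarged copy of the question's data so the timings are measurable
# (the factor 1000 is arbitrary).
series = pd.Series(['ab', 'abcd', 'bcc', 'abc'] * 1000, name='text')
main_text = 'abcX'


def with_loop():
    # List comprehension producing a boolean mask.
    return series[[e in main_text for e in series]]


def with_apply():
    # Same test expressed through Series.apply.
    return series[series.apply(lambda e: e in main_text)]


# Both approaches select the same rows.
assert with_loop().equals(with_apply())

loop_time = timeit.timeit(with_loop, number=20)
apply_time = timeit.timeit(with_apply, number=20)
print(f"list comprehension: {loop_time:.4f}s")
print(f"apply:              {apply_time:.4f}s")
```

Either way, both variants call Python's in once per element, so neither is vectorized in the NumPy sense.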
