What I am trying to do is:

Given a Series of strings, I want to find the indexes of all the strings that are substrings of another main string, in a vectorized manner.

The Input:

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# The series:
0      ab
1    abcd
2     bcc
3     abc
Name: text, dtype: object

The desired output:

0      ab
3     abc
Name: text, dtype: object

What I tried:

df_test = pd.DataFrame(series)
df_test['text2'] = main_text
df_test['text'].isin(df_test)

# And this of course won't work, since it checks whether the main string is a
# substring of each string in the series:
series.str.contains(main_text, regex=True)

Thanks!

Wiktor Stribiżew
Ilan12

1 Answer

You don't need a regex; simply use Python's `in` operator in a list comprehension to build a boolean mask:

series[[e in main_text for e in series]]

output:

0     ab
3    abc
Name: text, dtype: object
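
Since the question asks for the indexes, here is a minimal self-contained sketch of the same approach that also extracts them. The names `mask`, `matches`, and `match_index` are illustrative and not from the original post, and the sketch assumes every element of the series is a string (a missing value such as NaN would make `in` raise a `TypeError`).

```python
import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# Plain Python loop: True where the element is a substring of main_text.
# Assumption: all elements are strings (NaN would raise a TypeError here).
mask = [e in main_text for e in series]

# Boolean-list indexing keeps the original index labels.
matches = series[mask]
match_index = matches.index.tolist()

print(match_index)  # [0, 3]
```

The filtered Series keeps its original index, so `matches.index` gives the positions of the matching strings directly.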
mozway
  • Yes, but this is not efficient; I want to do it in a vectorized manner – Ilan12 Mar 04 '22 at 09:01
  • @Ilan12 you won't be able to do better with pandas, you can't vectorize here – mozway Mar 04 '22 at 09:02
  • Maybe with apply it would be faster than with a regular loop – Ilan12 Mar 04 '22 at 09:04
  • No, regular loops are most often faster than apply ;) see [here](https://stackoverflow.com/questions/38938318/why-apply-sometimes-isnt-faster-than-for-loop-in-pandas-dataframe) or [here](https://stackoverflow.com/questions/47749018/why-is-pandas-apply-lambda-slower-than-loop-here) for example. – mozway Mar 04 '22 at 09:06
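
To check the loop-versus-apply point on your own data, a rough timing sketch like the following may help. The helper names `with_loop` and `with_apply`, the repeated test series, and the repetition count are all assumptions made for illustration; actual timings depend on the pandas version and the data, so no result is claimed here.

```python
import timeit

import pandas as pd

# Hypothetical larger input: the question's four strings repeated 1000 times.
series = pd.Series(['ab', 'abcd', 'bcc', 'abc'] * 1000, name='text')
main_text = 'abcX'


def with_loop():
    # List comprehension over the Series values.
    return [e in main_text for e in series]


def with_apply():
    # Same test, routed through Series.apply.
    return series.apply(lambda e: e in main_text)


# Both approaches should produce the same boolean mask.
assert with_loop() == with_apply().tolist()

print('loop :', timeit.timeit(with_loop, number=20))
print('apply:', timeit.timeit(with_apply, number=20))
```

Both versions call the Python-level `in` test once per element, so neither is vectorized in the NumPy sense.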