Pandas find super string in one Series from another Series

Question

This does not necessarily need to be done in pandas, but it would be nice if it could be.

Say I have a list or Series of strings:

['XXY8779','0060-19','McChicken','456728']

And I have another list or Series which contains sub-strings of the original like so:

['60-19','Chicken','8779','1124231','92871','johnson']

And this would return something like:

[True, True, True, False]

I'm looking for a match that is something like:

^[a-zA-Z0-9.,$;]+ < matching string in other list >

So in other words, something that starts with 1 or more of any character but the rest matches exactly with one of the strings in my other list.

Does anyone have any ideas on the best way to accomplish this?

Thanks!

Will the matches always be with the end of the strings in your first list? — ALollz, Jun 28 '18 at 14:14
Related: the standard Pandas algorithm is not particularly efficient. If you need performance, consider a trie-based method, e.g. [this Aho-Corasick solution](https://stackoverflow.com/a/48600345/9209546). — jpp, Jun 28 '18 at 14:19

piRSquared · Accepted Answer · 2018-06-28T14:15:21.403

score 6

Use `str.contains`

`'|'.join(s2)` builds a single regex pattern in which `|` means "or", so `str.contains` returns True for every element of `s1` that contains at least one of the strings in `s2`.

import pandas as pd

s1 = pd.Series(['XXY8779', '0060-19', 'McChicken', '456728'])

s2 = ['60-19', 'Chicken', '8779', '1124231', '92871', 'johnson']

s1.str.contains('|'.join(s2))

0     True
1     True
2     True
3    False
dtype: bool
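Note that `str.contains` matches anywhere in the string and treats the joined text as a regex. If you want the stricter behaviour described in the question (one or more leading characters, then an exact match through to the end), a rough sketch is below. It uses the plain `re` module so it runs without pandas, and the names `alternatives`, `pattern` and `matches` are just illustrative. The same pattern string should also be usable with `s1.str.contains(...)`.

```python
import re

s1 = ['XXY8779', '0060-19', 'McChicken', '456728']
s2 = ['60-19', 'Chicken', '8779', '1124231', '92871', 'johnson']

# Escape each substring so characters such as '.' or '$' are matched literally.
alternatives = '|'.join(map(re.escape, s2))

# '^.+' requires at least one leading character; '$' pins the match to the end.
pattern = re.compile(rf'^.+(?:{alternatives})$')

matches = [bool(pattern.search(s)) for s in s1]
print(matches)  # [True, True, True, False]
```

With this pattern a string that is exactly equal to one of the substrings (for example `'Chicken'`) does not match, because `.+` needs at least one character in front of it.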

edited Jun 28 '18 at 14:15

answered Jun 28 '18 at 14:13

piRSquared


`not (not (s1.str.contains('|'.join(s2))))` also works – ℕʘʘḆḽḘ Jun 28 '18 at 14:15
And if you do have characters that need escaping, `'|'.join(map(re.escape, s2))` – jpp Jun 28 '18 at 14:16
@ℕʘʘḆḽḘ That's just the double negation of what they wrote? – Graipher Jun 28 '18 at 14:37
@Graipher Noobs has a rare sense of humor that I appreciate (-: – piRSquared Jun 28 '18 at 14:38
:) hahaha indeed – ℕʘʘḆḽḘ Jun 28 '18 at 14:41

score 1 · Answer 2 · answered Jun 28 '18 at 14:27

Since the match is always at the end, you can use Python's built-in `str.endswith` together with `any` to short-circuit the logic. `s1` and `s2` are just your lists above (but it also works if they are `pd.Series`).

[any(i.endswith(j) for j in s2) for i in s1]
#[True, True, True, False]

You can then convert it to a series with pd.Series or just use that list as a mask as-is.
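As a possible variation, Python's built-in `str.endswith` also accepts a tuple of suffixes, which avoids the inner loop. The sketch below assumes the same `s1` and `s2` as above. `suffixes`, `mask` and `matched` are illustrative names, and the filtering is done in plain Python so the snippet has no dependencies.

```python
s1 = ['XXY8779', '0060-19', 'McChicken', '456728']
s2 = ['60-19', 'Chicken', '8779', '1124231', '92871', 'johnson']

# str.endswith accepts a tuple and returns True if any suffix matches.
suffixes = tuple(s2)
mask = [s.endswith(suffixes) for s in s1]
print(mask)  # [True, True, True, False]

# Use the boolean list as a mask to keep only the matching strings.
matched = [s for s, keep in zip(s1, mask) if keep]
print(matched)  # ['XXY8779', '0060-19', 'McChicken']
```

Unlike a regex that demands leading characters, `endswith` also returns True when a string is exactly equal to one of the suffixes.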
