1-mismatch regular expression

Question

Possible Duplicate:
Search for string allowing for one mismatch in any location of the string

I am given a string s and a string t. Is there a regular expression to find all occurrences of t inside s with at most one mismatched character. (At most one character from t is allowed to be substituted by another.)

Please add some example data to play with, it makes things easier — Robjong, Feb 21 '12 at 01:24
@Robjong: [This](http://stackoverflow.com/questions/2420412/search-for-string-allowing-for-one-mismatch-in-any-location-of-the-string) is what I'm trying to do. — Randomblue, Feb 21 '12 at 01:27
i'm no expert on this subject, but you may be able to use a variant of this that allows up to 1 mismatch: http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm — mpen, Feb 21 '12 at 01:46
@Randomblue - if you can cite another SO question that covers this exact same topic, shouldn't this be closed as a duplicate? I posted a couple of possible solutions in the other question, including a pyparsing solution that handles not just single mismatch, but also n-mismatch. Did you try any of those? — PaulMcG, Feb 21 '12 at 02:03
@PaulMcGuire: Yeah, it should. I only noticed after posting... — Randomblue, Feb 21 '12 at 02:06

score 1 · Answer 1 · answered Feb 21 '12 at 01:36

Yes, absolutely. For example, if t is "abcde", then one such regex is

.bcde|a.cde|ab.de|abc.e|abcd.

That said, this is almost certainly not the best or most efficient way to do this, especially if t is at all big. (If it is big, then you can improve its performance somewhat by reformulating it as

.bcde|a(?:.cde|b(?:.de|c(?:.e|d.)))

or perhaps as

a(?:b(?:c(?:d.|.e)|.de)|.cde)|.bcde

but it's still not the best approach.)

score 1 · Accepted Answer · answered Feb 21 '12 at 01:41

1

I wouldn't necessarily do this with regex. You can use Levenshtein distance.

>>> import Levenshtein
>>> s = "spam ham and eggs"
>>> t = "ram"
>>> for i,_ in enumerate(s): 
...   s_ = s[i:i+len(t)]
...   if Levenshtein.distance(s_, t) == 1:
...     print s_
... 
pam
ham

answered Feb 21 '12 at 01:41

wim

338,267
99
616
750

1

sounds suboptimal. Levenshtein will keep counting even after 1 error – mpen Feb 21 '12 at 01:44
yeah it will, but Levenshtein distance is pretty fast and it may not be performance critical code anyway. premature optimisation and all that. – wim Feb 21 '12 at 01:51
except that in his comments he said he was trying to do this: http://stackoverflow.com/questions/2420412/search-for-string-allowing-for-one-mismatch-in-any-location-of-the-string which is a rather large data set. i don't think this is premature. – mpen Feb 21 '12 at 02:26

1-mismatch regular expression

2 Answers2