I'm trying to find a way to determine whether a string contains at least n characters of a word in a specific order.
I am processing an enormous amount of data that was typed by hand, and the number of typos is pretty crazy.
I need to find text parts in a large string looking something like:
irrelevant text MONKEY, CHIMP: more irrelevant text
I need to find MONKEY, CHIMP:
The ways this gets mistyped are pretty crazy. Here is an extra weird example:
MonKEY , CHIMp :
I've got to a point in my regex where I'm able to find all of these occurrences. Probably not the nicest solution, but here it is:
(m|M)(o|O)(n|N)(k|K)(e|E)(y|Y),?\s+(c|C)(h|H)(i|I)(m|M)(p|P)(\s+)?:
Looks a bit weird but it works.
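For reference, the same idea can be written more compactly with a case-insensitive flag instead of spelling out both cases of every letter. This is only a minimal sketch, assuming Python's `re` module (my regex is not tied to a particular language); the names `PATTERN` and `find_labels` are made up for illustration. The separator part is my own assumption: it accepts either a comma with optional whitespace on both sides, as in the `MonKEY , CHIMp :` example, or plain whitespace.

```python
import re

# Case-insensitive version of the pattern above.
# Assumption: the two words are separated either by a comma with optional
# whitespace around it, or by whitespace alone; whitespace before the
# colon is optional.
PATTERN = re.compile(r"monkey(?:\s*,\s*|\s+)chimp\s*:", re.IGNORECASE)


def find_labels(text):
    """Return every substring of text that matches PATTERN."""
    return [m.group(0) for m in PATTERN.finditer(text)]
```

Calling `find_labels` on the sample line should pick out just the `MONKEY, CHIMP:` part and leave the irrelevant text alone.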
Unfortunately the weirdness does not stop here. I need to amend this regex so that it also allows for one missing letter in each word.
It would then also have to match something like:
MonKEY , CIMp :
onKEY , ChIMp :
onKEY , CIMp :
I would think that there should be a way to tell the regex that at least wordlength-1 characters of each word have to match, in the right order.
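To make the requirement concrete, here is the brute-force version of what I mean, as a rough sketch in Python (the language and the helper names `one_missing` and `build_pattern` are assumptions for illustration only). It lists the word itself plus every version with exactly one letter removed and joins them into one alternation. The separator is again my own guess at what should be allowed between the words.

```python
import re


def one_missing(word):
    """Regex fragment matching word, or word with exactly one letter dropped."""
    variants = {word}
    for i in range(len(word)):
        # Drop the letter at position i.
        variants.add(word[:i] + word[i + 1:])
    # Longest first, so the complete word is tried before shortened ones.
    ordered = sorted(variants, key=lambda v: (-len(v), v))
    return "(?:" + "|".join(re.escape(v) for v in ordered) + ")"


def build_pattern(first, second):
    """Compile a case-insensitive pattern for 'first , second :'."""
    # Assumption: comma with optional whitespace around it, or whitespace only.
    sep = r"(?:\s*,\s*|\s+)"
    return re.compile(
        one_missing(first) + sep + one_missing(second) + r"\s*:",
        re.IGNORECASE,
    )


FUZZY = build_pattern("monkey", "chimp")
```

With this, `FUZZY.search(...)` should accept the three mistyped examples above, while a word with two letters missing (for example `nKEY`) is rejected. The alternation grows with the word length, which is why I am hoping for something more compact.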
Is there a simple way to do this?
I've been looking into {4, } but I'm not sure this is the right direction or if it could be applied here.
Thanks in advance, Peter
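The reason for my doubt, as far as I understand it: a quantifier only repeats the token directly in front of it, so the closest I can get is a character class with a minimum count, and that ignores the order of the letters. A small check, assuming Python's `re` module (the names `LOOSE` and `loose_match` are just for illustration):

```python
import re

# A quantifier repeats the preceding token. Applied to a character class,
# it counts letters taken from the set but does not enforce their order.
LOOSE = re.compile(r"[monkey]{5,}", re.IGNORECASE)


def loose_match(word):
    """True if word consists of 5 or more letters from 'monkey', in any order."""
    return LOOSE.fullmatch(word) is not None
```

Here `loose_match("onKEY")` is true, which is what I want, but so is `loose_match("yekno")`, which is not.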