Matching
From your question, I believe you are looking for either this:
M.*?y.*?M.*?o.*?m.*?s.*?h.*?o.*?u.*?s.*?e
or
M[^a-zA-Z]*?y[^a-zA-Z]*?M[^a-zA-Z]*?o[^a-zA-Z]*?m[^a-zA-Z]*?s[^a-zA-Z]*?h[^a-zA-Z]*?o[^a-zA-Z]*?u[^a-zA-Z]*?s[^a-zA-Z]*?e
The first one matches the search string with any characters in between its characters (as stated in your question body, see regex101). The second one does the same, but only allows non-alphabetic characters in between (as your question title suggests, see regex101).
Each pattern is built from the characters of the search string, joined by a subpattern that lazily matches either any character (case 1) or any non-alphabetic character (case 2).
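To see why the lazy `*?` quantifiers matter, here is a small sketch that compares a lazy and a greedy version of the first pattern on a made-up string. It uses Python's built-in `re` module (an assumption for illustration; the Python example further down uses the third-party `regex` module), which is enough for these ASCII-only patterns.

```python
import re

search_string = "MyMomshouse"
lazy = ".*?".join(search_string)   # M.*?y.*?M ... .*?e
greedy = ".*".join(search_string)  # M.*y.*M ... .*e

# Hypothetical input: the target phrase followed by more text containing "s" and "e"
text = "->My Mom's house<- and some more text"

lazy_match = re.search(lazy, text)
greedy_match = re.search(greedy, text)

print(lazy_match.group())    # My Mom's house
print(greedy_match.group())  # My Mom's house<- and some more te
```

The lazy version stops at the first `e` that completes the pattern, while the greedy version keeps extending each gap as far as it can and runs into the trailing text.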
Note: If you want the second one to also treat "special" letters such as é, ü or ô as letters (so that they are not allowed in between), you need to handle them in your pattern, e.g. by using the Unicode category `\P{L}`:
M\P{L}*?y\P{L}*?M\P{L}*?o\P{L}*?m\P{L}*?s\P{L}*?h\P{L}*?o\P{L}*?u\P{L}*?s\P{L}*?e
`\p{L}` matches a single code point in the category "letter", and `\P{L}` matches any code point that is not a letter (see regex101).
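If your regex engine has no `\p{L}` support (Python's built-in `re` module, for example), the following sketch may help. It uses the standard library's `unicodedata.category` to show which characters count as letters (category names starting with `L`), and it assumes `[\W\d_]` as an approximation of `\P{L}` in `re`: non-word characters plus digits and the underscore. This is not an exact equivalent, since `\w` also accepts non-letter numeric characters such as `½`.

```python
import re
import unicodedata

# Which characters does Unicode consider letters (general category L*)?
for ch in "Mô8#'é_½":
    cat = unicodedata.category(ch)
    print(repr(ch), cat, "letter" if cat.startswith("L") else "not a letter")

# Approximation of \P{L} for the standard re module (assumption, see above):
# anything that is not a word character, plus digits and the underscore.
NOT_LETTER = r"[\W\d_]*?"

pattern = NOT_LETTER.join(re.escape(c) for c in "MyMomshouse")

match_digits = re.search(pattern, "bla bla ->My8Mo2ms231#43house<- bla bla")
match_accents = re.search(pattern, "thisIs3MôyMäoméshouseEFSAcasw!")

print(match_digits)   # matches 'My8Mo2ms231#43house'
print(match_accents)  # None: ô and ä are letters, so they may not appear in between

# Where the approximation differs from \P{L}: ½ is not a letter (category No),
# but re treats it as a word character, so the class does not match it.
print(re.fullmatch(r"[\W\d_]", "½"))  # None
```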
Building the expression
Whatever your exact expression, you can easily build your final regex string by joining each character of your search string with the expression you choose to match content in between.
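As a minimal sketch of that idea, the hypothetical helper `build_pattern` below (a name chosen only for illustration) joins the escaped characters of the search string with whatever "in between" expression you pass in. It uses the built-in `re` module, which is sufficient as long as the gap expression does not need `\p{L}`.

```python
import re


def build_pattern(search_string: str, gap: str) -> str:
    """Join the (escaped) characters of search_string with the gap expression.

    build_pattern is a hypothetical helper used for illustration only.
    """
    return gap.join(re.escape(ch) for ch in search_string)


any_gap = build_pattern("MyMomshouse", ".*?")
non_alpha_gap = build_pattern("MyMomshouse", "[^a-zA-Z]*?")

print(any_gap)        # M.*?y.*?M.*?o.*?m.*?s.*?h.*?o.*?u.*?s.*?e
print(non_alpha_gap)  # M[^a-zA-Z]*?y[^a-zA-Z]*?M ... [^a-zA-Z]*?e

sample = "Test string ->My Mom's' house<- further text"
print(re.search(any_gap, sample).group())        # My Mom's' house
print(re.search(non_alpha_gap, sample).group())  # My Mom's' house
```

Only the characters of the search string are escaped; the gap expression is inserted verbatim because it is meant to be interpreted as regex syntax.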
Python example
Here is a Python example (since your question was not tagged with a programming language):
```python
import regex  # third-party module (pip install regex); supports \p{L} and \P{L}

text = ["text 123 ->My Mom's house<- jidjio",
        "bla bla ->My8Mo2ms231#43house<- bla bla",
        "Test string ->My Mom's' house<- further text",
        "wkashhasMdykMomLsfheoousssswQseBswenksd",
        "textMy?M?om*s?*hou?*seorsomethingelse",
        "thisIs3MôyMäoméshouseEFSAcasw!"]

search_string = "MyMomshouse"

# Join the characters of the search string with the "in between" expression
regex_string = r'.*?'.join(str(c) for c in search_string)           # any character
regex_string2 = r'[^a-zA-Z]*?'.join(str(c) for c in search_string)  # anything but ASCII letters
regex_string3 = r'\P{L}*?'.join(str(c) for c in search_string)      # anything but Unicode letters

print('\n--- regex 1 ---')
for t in text:
    print(regex.search(regex_string, t))

print('\n--- regex 2 ---')
for t in text:
    print(regex.search(regex_string2, t))

print('\n--- regex 3 ---')
for t in text:
    print(regex.search(regex_string3, t))
```
Output:
```
--- regex 1 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
<regex.Match object; span=(8, 31), match='MdykMomLsfheoousssswQse'>
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
<regex.Match object; span=(7, 21), match='MôyMäoméshouse'>

--- regex 2 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
None
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
<regex.Match object; span=(7, 21), match='MôyMäoméshouse'>

--- regex 3 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
None
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
None
```
Note:
- I used the Python `regex` module instead of the `re` module because it supports the `\p{L}` pattern.
- If your search string includes characters that have a special meaning in regex, you need to escape them when building the pattern, e.g. `'.*?'.join(regex.escape(str(c)) for c in search_string)`.
- I used the search string `MyMomshouse` (no spaces) instead of the one you specified, since yours would not match the second of your example strings.
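To illustrate why escaping matters, here is a small sketch with a hypothetical search string `a.c` that contains the metacharacter `.`. It uses the built-in `re` module and its `re.escape`; the `regex` module's `escape` behaves the same way for this purpose.

```python
import re

search_string = "a.c"  # hypothetical search string containing a regex metacharacter

unescaped = ".*?".join(search_string)                      # a.*?..*?c  ('.' matches any char)
escaped = ".*?".join(re.escape(c) for c in search_string)  # a.*?\..*?c (needs a literal '.')

print(bool(re.search(unescaped, "abc")))  # True: the unescaped '.' matched 'b'
print(bool(re.search(escaped, "abc")))    # False: there is no literal '.' in the text
print(bool(re.search(escaped, "a-.-c")))  # True
```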
JavaScript example
The same is possible in JavaScript or, in principle, any other language. See also this JS fiddle:
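```javascript
const text = ["text 123 ->My Mom's house<- jidjio",
              "bla bla ->My8Mo2ms231#43house<- bla bla",
              "Test string ->My Mom's' house<- further text",
              "wkashhasMdykMomLsfheoousssswQseBswenksd",
              "textMy?M?om*s?*hou?*seorsomethingelse",
              "thisIs3MôyMäoméshouseEFSAcasw!"];

const search_string = "MyMomshouse";

// Join every character of the search string with a lazy "any character" pattern
const regex_string = Array.from(search_string).join('.*?');
console.log(regex_string);

text.forEach((entry) => {
  // String.prototype.search converts the string into a RegExp and returns
  // the index of the first match, or -1 if there is none
  console.log(entry.search(regex_string));
});
```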
However, Unicode property escapes such as `\p{L}` are not available in every JavaScript environment (they were added in ES2018 and require the `u` flag); see this SO question and its answers for possible solutions.