I'm trying to find the location of a substring within a string that contains wildcards. For example:
substring = 'ABCDEF'
large_string = 'QQQQQABC.EFQQQQQ'
start = string.find(substring, large_string)
print(start)
5
thank you in advance
The idea is to convert what you are looking for, ABCDEF in this case, into the following regular expression:
([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
Each character is placed in [] in case it turns out to be a regex special character. The only complication is if one of the search characters is ^, as in ABCDEF^. The ^ character should just be escaped and is therefore handled specially.
Then you search the string for that pattern using re.search:
import re
substring = 'ABCDEF'
large_string = 'QQQQQABC.EF^QQQQQ'
new_substring = re.sub(r'([^^])', r'([\1]|\\.)', substring)
new_substring = re.sub(r'\^', r'(\\^|\\.)', new_substring)
print(new_substring)
regex = re.compile(new_substring)
m = regex.search(large_string)
if m:
    print(m.span())
Prints:
([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
(5, 11)
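The same idea can be sketched with re.escape doing the escaping for every character, so that ^ (or any other metacharacter) needs no special case. This is only a minimal sketch: the function name find_with_wildcards and its wildcard parameter are made up for illustration and are not part of the answer above.

```python
import re

def find_with_wildcards(substring, large_string, wildcard="."):
    """Return the (start, end) span of the first match, or None."""
    # Each position may match either the expected character or a literal
    # wildcard character; re.escape neutralises regex metacharacters.
    pattern = "".join(
        "(?:{}|{})".format(re.escape(ch), re.escape(wildcard))
        for ch in substring
    )
    match = re.search(pattern, large_string)
    return match.span() if match else None

print(find_with_wildcards("ABCDEF", "QQQQQABC.EFQQQQQ"))    # (5, 11)
print(find_with_wildcards("ABCDEF^", "QQQQQABC.EF^QQQQQ"))  # (5, 12)
```

Non-capturing groups (?:...) are used here because only the overall span is needed, not the individual characters.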
Not sure if there is a regex operation for this, but you can generate a list of regex patterns that will work.
substring = "ABCDE"
patterns = []
for i in range(len(substring)):
    # insert an optional "any character" before position i
    patterns.append(substring[:i] + '.?' + substring[i:])
This gives you the following output in our example:
.?ABCDE
A.?BCDE
AB.?CDE
ABC.?DE
ABCD.?E
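With this list you can now find the index in large_string (the string being searched):
import re
for pattern in patterns:
    try:
        print("Index is " + str(re.search(pattern, large_string).start()))
        break
    except AttributeError:
        # re.search returned None, so this pattern did not match
        pass
else:
    # the loop finished without a break: no pattern matched
    print("Not found")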
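The loop over the patterns list can also be collapsed into a single alternation, so the text is scanned once. The following is a rough sketch under that assumption; the helper name find_with_optional_gap and the sample text are hypothetical, and re.escape is added in case the search text contains regex metacharacters.

```python
import re

def find_with_optional_gap(substring, large_string):
    """Return the start index of the first match, or None."""
    # One alternative per insertion point, mirroring the patterns list above.
    alternatives = [
        re.escape(substring[:i]) + ".?" + re.escape(substring[i:])
        for i in range(len(substring))
    ]
    match = re.search("|".join(alternatives), large_string)
    return match.start() if match else None

print(find_with_optional_gap("ABCDE", "QQABCxDEQQ"))  # 2
```

Note that the first alternative, .?ABCDE, can absorb one character in front of the substring, so the reported start may be one position earlier than where ABCDE itself begins.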
My try:
from itertools import combinations
def gen_wild_cards(string):
    list_ = []
    start_indexes = [i for i in range(len(string))]
    for i in range(1, len(string)):
        combs = [v for v in combinations(start_indexes, i)]
        for c in combs:
            new_string = list(string)
            for index in c:
                new_string[index] = "."
            list_.append("".join(new_string))
    return list_
large_string = 'QQQQQABC.EFQQQQQ'
basic_string = "ABCDEF"
list_ = gen_wild_cards(basic_string)
for wildcard in list_:
    print(large_string.find(wildcard))
Basically, I am generating all the wildcard variants and searching for each of them in large_string. The wildcards generated:
.BCDEF
A.CDEF
AB.DEF
ABC.EF
ABCD.F
ABCDE.
..CDEF
.B.DEF
.BC.EF
.BCD.F
.BCDE.
A..DEF
A.C.EF
A.CD.F
A.CDE.
AB..EF
AB.D.F
AB.DE.
ABC..F
ABC.E.
ABCD..
...DEF
..C.EF
..CD.F
..CDE.
.B..EF
.B.D.F
.B.DE.
.BC..F
.BC.E.
.BCD..
A...EF
A..D.F
A..DE.
A.C..F
A.C.E.
A.CD..
AB...F
AB..E.
AB.D..
ABC...
....EF
...D.F
...DE.
..C..F
..C.E.
..CD..
.B...F
.B..E.
.B.D..
.BC...
A....F
A...E.
A..D..
A.C...
AB....
.....F
....E.
...D..
..C...
.B....
A.....
If you are interested only in the first match, you could use a lazy approach with a generator instead of generating all the wildcards in one shot.
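A lazy version along those lines might look like the sketch below. The names iter_wild_cards and find_first are invented for this example; the generation order is the same as in gen_wild_cards, but each variant is produced only when the search asks for it.

```python
from itertools import combinations

def iter_wild_cards(string):
    """Yield wildcard variants of `string` one at a time, fewest dots first."""
    for count in range(1, len(string)):
        for positions in combinations(range(len(string)), count):
            chars = list(string)
            for index in positions:
                chars[index] = "."
            yield "".join(chars)

def find_first(basic_string, large_string):
    """Return (index, wildcard) for the first variant found, or None."""
    for wildcard in iter_wild_cards(basic_string):
        index = large_string.find(wildcard)
        if index != -1:
            return index, wildcard  # stop here; later variants are never built
    return None

print(find_first("ABCDEF", "QQQQQABC.EFQQQQQ"))  # (5, 'ABC.EF')
```

For this input the search stops at the fourth variant, ABC.EF, instead of building all 62.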