String Matching with wildcard in Python

Question

I'm trying to find the location of a substring within a string that contains wildcards. For example:

substring = 'ABCDEF'
large_string = 'QQQQQABC.EFQQQQQ'

start = string.find(substring, large_string)
print(start)

5

Thank you in advance.

Possible duplicate of [Python - Locating the position of a regex match in a string?](https://stackoverflow.com/questions/2674391/python-locating-the-position-of-a-regex-match-in-a-string) — Rahim, Sep 14 '19 at 14:22

Booboo · Accepted Answer · 2019-09-14T15:24:26.960

The idea is to convert what you are looking for, ABCDEF in this case, into the following regular expression:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)

Each character is placed in [] so that it is matched literally even if it is a regex special character. The only complication is if one of the search characters is ^, as in ABCDEF^. A leading ^ inside [] negates the character class, so ^ is escaped with a backslash instead and handled separately.

Then you search the string for that pattern using re.search:

import re

substring = 'ABCDEF'
large_string = 'QQQQQABC.EF^QQQQQ'

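# Wrap every character except ^ in a character class, with a literal '.' as the alternative
new_substring = re.sub(r'([^^])', r'([\1]|\\.)', substring)
# A leading ^ would negate a character class, so escape it with a backslash instead
new_substring = re.sub(r'\^', r'(\\^|\\.)', new_substring)
print(new_substring)
regex = re.compile(new_substring)
m = regex.search(large_string)
if m:
    print(m.span())  # (start, end) of the first match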

Prints:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
(5, 11)
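
As a hedged variation on the pattern construction above, re.escape can do the escaping for every character, which would also cover ^ and backslashes without a second pass. This is a minimal sketch, not the answer's original code: the function name find_with_wildcards is made up for illustration, and it assumes, as above, that a literal '.' in large_string stands for any single character.

```python
import re

def find_with_wildcards(substring, large_string):
    """Return the (start, end) span of the first match, or None.

    Hypothetical helper: each wanted character may be matched either by
    itself or by a literal '.' found in large_string.
    """
    # re.escape makes regex metacharacters (including ^ and \) literal.
    pattern = "".join("(?:{}|\\.)".format(re.escape(ch)) for ch in substring)
    m = re.search(pattern, large_string)
    return m.span() if m else None

print(find_with_wildcards("ABCDEF", "QQQQQABC.EFQQQQQ"))    # (5, 11)
print(find_with_wildcards("ABCDEF^", "QQQQQABC.EF^QQQQQ"))  # (5, 12)
```

Non-capturing groups (?:...) are used here only because the individual groups are never read back; the matching behaviour is otherwise the same idea as the pattern shown above.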

score 0 · Answer 2 · answered Sep 14 '19 at 14:19

Not sure if there is a regex operation for this, but you can generate a list of regex patterns that will work.

substring = "ABCDE"
patterns = []
for i in range(len(substring)):
    # insert an optional wildcard ('.?') before position i
    patterns.append(substring[:i] + '.?' + substring[i:])

This gives you the following patterns in our example:

.?ABCDE
A.?BCDE
AB.?CDE
ABC.?DE
ABCD.?E

With this list you can now search for the index:

import re

for pattern in patterns:
    m = re.search(pattern, large_string)
    if m:  # re.search returns None when there is no match
        print("Index is " + str(m.start()))
        break
else:
    print("Not found")

Nikaido · Answer 3 · 2019-09-14T14:48:12.850

My try:

from itertools import combinations

def gen_wild_cards(string):
    list_ = []
    indexes = range(len(string))
    # choose i positions to replace with '.', for i = 1 .. len(string) - 1
    for i in range(1, len(string)):
        for c in combinations(indexes, i):
            new_string = list(string)
            for index in c:
                new_string[index] = "."
            list_.append("".join(new_string))
    return list_

large_string = 'QQQQQABC.EFQQQQQ'
basic_string = "ABCDEF"
list_ = gen_wild_cards(basic_string)
for wildcard in list_:
    print(large_string.find(wildcard))

Basically, I generate all the wildcard variants and search for each of them in large_string. The wildcards generated:

.BCDEF
A.CDEF
AB.DEF
ABC.EF
ABCD.F
ABCDE.
..CDEF
.B.DEF
.BC.EF
.BCD.F
.BCDE.
A..DEF
A.C.EF
A.CD.F
A.CDE.
AB..EF
AB.D.F
AB.DE.
ABC..F
ABC.E.
ABCD..
...DEF
..C.EF
..CD.F
..CDE.
.B..EF
.B.D.F
.B.DE.
.BC..F
.BC.E.
.BCD..
A...EF
A..D.F
A..DE.
A.C..F
A.C.E.
A.CD..
AB...F
AB..E.
AB.D..
ABC...
....EF
...D.F
...DE.
..C..F
..C.E.
..CD..
.B...F
.B..E.
.B.D..
.BC...
A....F
A...E.
A..D..
A.C...
AB....
.....F
....E.
...D..
..C...
.B....
A.....

If you are only interested in the first match, you could use a lazy approach with a generator instead of building the whole list of wildcards up front.
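
A minimal sketch of that lazy approach, assuming the same wildcard generation as gen_wild_cards above. The names iter_wild_cards and first_match are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def iter_wild_cards(string):
    """Yield wildcard variants of `string` one at a time (hypothetical helper)."""
    for i in range(1, len(string)):
        for positions in combinations(range(len(string)), i):
            chars = list(string)
            for index in positions:
                chars[index] = "."
            yield "".join(chars)

def first_match(basic_string, large_string):
    """Return (index, wildcard) for the first variant found, or None."""
    for wildcard in iter_wild_cards(basic_string):
        index = large_string.find(wildcard)  # '.' is matched literally by str.find
        if index != -1:
            return index, wildcard
    return None

print(first_match("ABCDEF", "QQQQQABC.EFQQQQQ"))  # (5, 'ABC.EF')
```

Because the generator is consumed lazily, the search stops at the first variant that occurs in large_string and the remaining combinations are never built.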

Rahim · Answer 4 · 2019-09-14T14:21:19.530 · score -1

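You can use str.index() or the .start() method of the match object returned by re.search():

import re

index = large_string.index(substring)  # raises ValueError if substring is not found
print(index)

index = re.search(substring, large_string).start()  # re.search returns None if there is no match
print(index)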
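
For reference, a small sketch of how these exact-match lookups behave on the question's strings. The helper exact_index is hypothetical: it uses re.escape so that a '.' in the search text is matched literally, and it returns -1 instead of raising when nothing is found. Neither call treats the '.' in large_string as a wildcard.

```python
import re

def exact_index(substring, large_string):
    """Index of the first exact occurrence, or -1 if absent (hypothetical helper)."""
    m = re.search(re.escape(substring), large_string)
    return m.start() if m else -1

large_string = "QQQQQABC.EFQQQQQ"
print(exact_index("ABC.EF", large_string))  # 5
print(exact_index("ABCDEF", large_string))  # -1, the '.' is not treated as a wildcard
print(large_string.find("ABC.EF"))          # 5, str.find also returns -1 when absent
```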

edited Sep 14 '19 at 14:21

answered Sep 14 '19 at 14:13
