6

I'm trying to find the location of a substring within a string that contains wildcards. For example:

substring = 'ABCDEF'
large_string = 'QQQQQABC.EFQQQQQ'

start = large_string.find(substring)  # wanted: '.' in large_string should match any character
print(start)

5

Thank you in advance.

xygonyx
  • Possible duplicate of [Python - Locating the position of a regex match in a string?](https://stackoverflow.com/questions/2674391/python-locating-the-position-of-a-regex-match-in-a-string) – Rahim Sep 14 '19 at 14:22

4 Answers

2

The idea is to convert what you are looking for, ABCDEF in this case, into the following regular expression:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)

Each character is placed in [] in case it is a regex special character. The one complication is a search character of ^, as in ABCDEF^: a leading ^ inside [] negates the character class, so ^ is escaped with a backslash instead and handled in a separate step.

Then you search the string for that pattern using re.search:

import re

substring = 'ABCDEF'
large_string = 'QQQQQABC.EF^QQQQQ'

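# Wrap every character except ^ as ([c]|\.): the character itself or a literal dot
new_substring = re.sub(r'([^^])', r'([\1]|\\.)', substring)
# ^ cannot stand alone in a character class, so escape it instead
new_substring = re.sub(r'\^', r'(\\^|\\.)', new_substring)
print(new_substring)
regex = re.compile(new_substring)
m = regex.search(large_string)
if m:
    print(m.span())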

Prints:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
(5, 11)
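
As a variation on the same idea, here is a minimal sketch that builds the pattern with re.escape instead of character classes, which should cover every special character without a separate step for ^. The function name find_with_wildcards and its wildcard parameter are hypothetical, chosen only for illustration.

```python
import re

def find_with_wildcards(needle, haystack, wildcard="."):
    # Each needle character may match itself or a literal wildcard in the haystack.
    pattern = "".join(
        "(?:{}|{})".format(re.escape(ch), re.escape(wildcard)) for ch in needle
    )
    m = re.search(pattern, haystack)
    # Mirror str.find: return -1 when nothing matches.
    return m.start() if m else -1

print(find_with_wildcards("ABCDEF", "QQQQQABC.EFQQQQQ"))  # 5
```

Non-capturing groups (?:...) are used because only the match position is needed, not the individual characters.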
Booboo
0

Not sure if there is a regex operation for this, but you can generate a list of regex patterns that will work.

substring = "ABCDE"
patterns = []
for i in range(len(substring)):
    # Insert an optional wildcard before position i
    patterns.append(substring[:i] + '.?' + substring[i:])

This gives you the following output in our example:

.?ABCDE
A.?BCDE
AB.?CDE
ABC.?DE
ABCD.?E

With this list you can now find the index:

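import re

for pattern in patterns:
    try:
        # re.search returns None when nothing matches, so .start() raises AttributeError
        print("Index is " + str(re.search(pattern, large_string).start()))
        break
    except AttributeError:
        pass
else:
    # Runs only if the loop finished without a break
    print("Not found")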

Ron Serruya
0

My try:

from itertools import combinations

def gen_wild_cards(string):
    wildcards = []
    indexes = range(len(string))
    # Replace every non-empty proper subset of positions with "."
    for size in range(1, len(string)):
        for positions in combinations(indexes, size):
            chars = list(string)
            for index in positions:
                chars[index] = "."
            wildcards.append("".join(chars))
    return wildcards

large_string = 'QQQQQABC.EFQQQQQ'
basic_string = "ABCDEF"
list_ = gen_wild_cards(basic_string)
for wildcard in list_:
    print(large_string.find(wildcard))

Basically, I generate all the wildcard variants and search for each of them in large_string. The wildcards generated:

.BCDEF
A.CDEF
AB.DEF
ABC.EF
ABCD.F
ABCDE.
..CDEF
.B.DEF
.BC.EF
.BCD.F
.BCDE.
A..DEF
A.C.EF
A.CD.F
A.CDE.
AB..EF
AB.D.F
AB.DE.
ABC..F
ABC.E.
ABCD..
...DEF
..C.EF
..CD.F
..CDE.
.B..EF
.B.D.F
.B.DE.
.BC..F
.BC.E.
.BCD..
A...EF
A..D.F
A..DE.
A.C..F
A.C.E.
A.CD..
AB...F
AB..E.
AB.D..
ABC...
....EF
...D.F
...DE.
..C..F
..C.E.
..CD..
.B...F
.B..E.
.B.D..
.BC...
A....F
A...E.
A..D..
A.C...
AB....
.....F
....E.
...D..
..C...
.B....
A.....

If you are interested only in the first match, you could use a lazy approach with a generator instead of generating all the wildcards in one shot.
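
A minimal sketch of that lazy approach might look like the following. The names iter_wild_cards and find_first are hypothetical, and like the code above it only tries variants with at least one wildcard, never the plain string itself.

```python
from itertools import combinations

def iter_wild_cards(string):
    # Lazily yield each variant of `string` with some positions replaced by "."
    for size in range(1, len(string)):
        for positions in combinations(range(len(string)), size):
            chars = list(string)
            for index in positions:
                chars[index] = "."
            yield "".join(chars)

def find_first(basic_string, large_string):
    # Stop at the first wildcard variant that occurs in large_string
    for wildcard in iter_wild_cards(basic_string):
        position = large_string.find(wildcard)
        if position != -1:
            return position
    return -1

print(find_first("ABCDEF", "QQQQQABC.EFQQQQQ"))  # 5
```

Because the generator yields one variant at a time, the search stops as soon as a variant is found and the remaining combinations are never built.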

Nikaido
-1

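You can use str.index(), or re.search() together with the match's .start():

import re

index = large_string.index(substring)  # raises ValueError if substring is not found
print(index)
index = re.search(substring, large_string).start()  # re.search returns None if there is no match
print(index)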
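
Note that re.search treats . as a wildcard only when it appears in the pattern, not in the string being searched. A minimal sketch of the case where this works directly, using example strings that are assumptions and not taken from the question:

```python
import re

pattern = "ABC.EF"            # the wildcard lives in the pattern
text = "QQQQQABCDEFQQQQQ"     # hypothetical input without wildcards

match = re.search(pattern, text)
# Guard against None so a failed search gives -1 instead of an AttributeError
index = match.start() if match else -1
print(index)  # 5
```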
Rahim