
I just learned how to match a string against a wildcard pattern from the (very helpful) question "Python wildcard search in string".

Now I'm trying to match two strings that both have wildcards.

string1 = "spotify.us.*.uk"
string2 = "spotify.*.co.uk"

These two strings should be a match, with * used as the wildcard. My research online turned up no solution. What I have so far (not working):

import re

string1 = "spotify.us.*.uk"
string2 = "spotify.*.co.uk"
r1 = string1.replace("*", ".*")
r2 = string2.replace("*", ".*")
regex1 = re.compile('.*'+r1)
regex2 = re.compile('.*'+r2)

matches = re.search(regex1, regex2)

I used the same approach to match a string against a regex, and that worked. But it doesn't work in this case, where both strings contain wildcards. Any help would be much appreciated.

misterbear

1 Answer


In fact, those two strings will not match each other, because a regular expression always compares a pattern against a string. There is no operation in re for evaluating whether one pattern matches another pattern; the closest you can get is checking whether both patterns match some common string. Comparing a pattern to a pattern is simply outside what the language expresses.
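If what you actually need is to know whether *some* string matches both wildcard patterns, you can decide that by walking the two patterns directly instead of going through re. Below is a minimal sketch, assuming `*` is the only special character (matching any run of characters, including none) and every other character is a literal; the function name `wildcards_overlap` is hypothetical, not a library API.

```python
from functools import lru_cache


def wildcards_overlap(p: str, q: str) -> bool:
    """Hypothetical helper: True if at least one string matches both p and q.

    Assumption: '*' is the only wildcard (any run of characters, possibly
    empty); every other character, including '.', is a literal.
    """

    @lru_cache(maxsize=None)
    def overlap(i: int, j: int) -> bool:
        # Both patterns fully consumed: the empty remainder is a common string.
        if i == len(p) and j == len(q):
            return True
        if i < len(p) and p[i] == "*":
            # p's star matches nothing more...
            if overlap(i + 1, j):
                return True
            # ...or it absorbs the next token of q.
            return j < len(q) and overlap(i, j + 1)
        if j < len(q) and q[j] == "*":
            # Same two choices for a star in q.
            if overlap(i, j + 1):
                return True
            return i < len(p) and overlap(i + 1, j)
        # Two literals must be identical; an exhausted side cannot match a literal.
        if i < len(p) and j < len(q) and p[i] == q[j]:
            return overlap(i + 1, j + 1)
        return False

    return overlap(0, 0)


print(wildcards_overlap("spotify.us.*.uk", "spotify.*.co.uk"))  # True
print(wildcards_overlap("spotify.*.uk", "spotify.*.de"))        # False
```

For the question's patterns it reports True because, for example, spotify.us.co.uk matches both. Each recursive step advances at least one pattern position and results are memoized, so the search stays polynomial in the pattern lengths.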

re.search() takes a pattern (either a compiled pattern or a pattern string) as its first argument and a string as its second. It returns a match object if the pattern is found anywhere in the string, and None otherwise. Passing two compiled patterns raises a TypeError.
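A short sketch of both behaviours, using a hand-written pattern for illustration: a pattern plus a string gives a match object, while a pattern plus another compiled pattern raises TypeError.

```python
import re

pattern = re.compile(r"spotify\.us\..*\.uk")

# Pattern + string: the result is a Match object (truthy) or None.
m = re.search(pattern, "spotify.us.foo.uk")
print(m is not None)  # True

# Pattern + pattern: the second argument is not a string, so re raises.
error = None
try:
    re.search(pattern, re.compile(r"spotify\..*\.co\.uk"))
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```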

Now, if you instead call re.search(regex1, r2), it will not raise an error, but it will not find a match either. Why? regex1 is compiled from the pattern '.*spotify.us..*.uk', which means "any number of non-newline characters, followed by spotify, followed by any single non-newline character, followed by us, followed by two or more non-newline characters, followed by uk". The literal string r2, spotify..*.co.uk, does not fit that description: after spotify and one more character it continues with .* rather than us.
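Reproducing that step from the question's own code makes the point concrete: r2 is treated as plain text, so it fails to match, while an ordinary hostname that fits both wildcards does match regex1. The hostname spotify.us.co.uk is just an illustrative choice.

```python
import re

string1 = "spotify.us.*.uk"
string2 = "spotify.*.co.uk"
r1 = string1.replace("*", ".*")
r2 = string2.replace("*", ".*")
regex1 = re.compile(".*" + r1)

print(regex1.pattern)                          # .*spotify.us..*.uk
print(re.search(regex1, r2))                   # None: r2 is just text here
print(bool(regex1.search("spotify.us.co.uk")))  # True
```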

Aside:

Considering that . means match any non-newline character and \. means match a literal dot, you probably wanted something like:

regex1 = r"spotify\.us\..*\.uk"
regex2 = r"spotify\..*\.co\.uk"
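Rather than escaping by hand, you can build such a regex from the wildcard string. A minimal sketch, assuming `*` is the only wildcard: split on `*`, escape each literal piece with re.escape(), and join the pieces with `.*`. The helper name `wildcard_to_regex` is hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    # Hypothetical helper: escape every literal piece, then join with ".*".
    parts = pattern.split("*")
    return re.compile(".*".join(re.escape(part) for part in parts))


regex1 = wildcard_to_regex("spotify.us.*.uk")
regex2 = wildcard_to_regex("spotify.*.co.uk")
print(regex1.pattern)                               # spotify\.us\..*\.uk
print(bool(regex1.fullmatch("spotify.us.co.uk")))   # True
print(bool(regex1.fullmatch("spotifyXusXcoXuk")))   # False
```

Using fullmatch() instead of search() with a leading '.*' anchors the whole string, so a pattern cannot accidentally match just part of a longer hostname.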

Aside #2:

If you are only using wildcards, fnmatch (i.e. glob-style matching) is sufficient to express the same patterns and looks a lot cleaner in this case:

import fnmatch

pattern1 = "spotify.us.*.uk"
pattern2 = "spotify.*.co.uk"
fnmatch.fnmatch('spotify.us.foo.uk', pattern1)
# Output: True
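fnmatch still compares a pattern with a string, so with it the nearest thing to "do these two patterns match?" is checking whether a given string matches both. A minimal sketch over a hand-picked, hypothetical list of candidate hostnames:

```python
import fnmatch

pattern1 = "spotify.us.*.uk"
pattern2 = "spotify.*.co.uk"

# fnmatchcase never normalizes case, unlike fnmatch on case-insensitive OSes.
candidates = ["spotify.us.co.uk", "spotify.us.foo.uk", "spotify.de.co.uk"]
common = [c for c in candidates
          if fnmatch.fnmatchcase(c, pattern1) and fnmatch.fnmatchcase(c, pattern2)]
print(common)  # ['spotify.us.co.uk']
```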
lemonhead