This regex will do what you want:
r'http://download\d+\.mysite\.com/\w+/\w+/upload\.rar'
\d matches digits, \w matches alphanumerics (including underscore), and + says to match one or more repetitions of the preceding item. We put a \ in front of each . (in .mysite, .com and .rar) so that the . is interpreted literally and not as a regex wildcard.
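To see why the escaping matters, here is a small Python sketch. Without the backslash, . matches any single character, so a lookalike hostname would slip through. The second URL is a made-up example, not a real host.

```python
import re

# Escaped dots: each "\." matches only a literal "."
escaped = re.compile(r'http://download\d+\.mysite\.com/')
# Unescaped dots: each "." matches any single character
unescaped = re.compile(r'http://download\d+.mysite.com/')

url_ok = 'http://download42.mysite.com/'
url_bad = 'http://download42XmysiteYcom/'  # hypothetical lookalike host

for pattern in (escaped, unescaped):
    print(pattern.pattern, bool(pattern.match(url_ok)), bool(pattern.match(url_bad)))
# http://download\d+\.mysite\.com/ True False
# http://download\d+.mysite.com/ True True
```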
Test:
import re

p = re.compile(r'http://download\d+\.mysite\.com/\w+/\w+/upload\.rar')

table = [
    'http://download2142.mysite.com/d0kz4p5p3uog/api60w0g1o1jil1/upload.rar',
    'http://download2142.mysite.com/d0kz4p5p3uog/api60w0g1o1jil1/upload.raw',
    'http://download123.mysite.com/456/789/upload.rar',
    'http://downloadabc.mysite.com/def/ghi/upload.rar',
    'http://download1234.mysite.com/def/ghi/upload.rar',
    'http://download1234.mysite.org/def/ghi/upload.rar',
]

# match() checks whether the pattern matches at the start of each string
for s in table:
    m = p.match(s)
    print(s, m is not None)
Output:
http://download2142.mysite.com/d0kz4p5p3uog/api60w0g1o1jil1/upload.rar True
http://download2142.mysite.com/d0kz4p5p3uog/api60w0g1o1jil1/upload.raw False
http://download123.mysite.com/456/789/upload.rar True
http://downloadabc.mysite.com/def/ghi/upload.rar False
http://download1234.mysite.com/def/ghi/upload.rar True
http://download1234.mysite.org/def/ghi/upload.rar False
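One caveat with the test above: re.match only anchors the pattern at the start of the string, so a URL with extra characters after upload.rar would still be reported as a match. If the whole string must match, a sketch using re.fullmatch (available since Python 3.4) looks like this; the second URL is a made-up example.

```python
import re

p = re.compile(r'http://download\d+\.mysite\.com/\w+/\w+/upload\.rar')

urls = [
    'http://download123.mysite.com/456/789/upload.rar',
    'http://download123.mysite.com/456/789/upload.rar.exe',  # hypothetical trailing junk
]

for s in urls:
    # match() only needs the pattern at the start; fullmatch() needs it to cover the whole string
    print(s, p.match(s) is not None, p.fullmatch(s) is not None)
```

Ending the pattern with $ has a similar effect, except that $ also matches just before a trailing newline.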
If the actual file name varies, you can use
r'http://download\d+\.mysite\.com/\w+/\w+/\w+\.rar'
or, if the name will always be lowercase letters,
r'http://download\d+\.mysite\.com/\w+/\w+/[a-z]+\.rar'
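Here is a quick comparison of the two file-name variants. The file names are hypothetical, chosen only to show where the patterns differ. Note that \w does not include - or ., so a name like my-file is rejected by both; a wider class such as [\w.-]+ would be needed for names like that.

```python
import re

any_word = re.compile(r'http://download\d+\.mysite\.com/\w+/\w+/\w+\.rar')
lowercase = re.compile(r'http://download\d+\.mysite\.com/\w+/\w+/[a-z]+\.rar')

# Hypothetical file names used only for illustration
names = ['upload', 'Backup_2', 'files', 'my-file']

results = {}
for name in names:
    url = 'http://download1234.mysite.com/def/ghi/' + name + '.rar'
    # fullmatch() requires the pattern to cover the whole URL
    results[name] = (any_word.fullmatch(url) is not None,
                     lowercase.fullmatch(url) is not None)
    print(name, *results[name])
```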
BTW, it's generally not a good idea to parse HTML with regex, but if the page format is fixed and fairly simple, you may be able to get away with it.
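If the links live in ordinary <a href="..."> tags, one alternative is to let the standard-library html.parser find the attributes and use the regex only to filter the URLs. This is a sketch under that assumption. The LinkCollector class and the sample HTML are invented for illustration.

```python
import re
from html.parser import HTMLParser

URL_RE = re.compile(r'http://download\d+\.mysite\.com/\w+/\w+/upload\.rar')


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


# Hypothetical page fragment, for illustration only
page = '''
<p><a href="http://download123.mysite.com/456/789/upload.rar">Download</a></p>
<p><a class="mirror" href='http://example.com/other.zip'>Other</a></p>
'''

collector = LinkCollector()
collector.feed(page)
downloads = [h for h in collector.hrefs if URL_RE.fullmatch(h)]
print(downloads)
```

The parser deals with quoting and attribute order, so the regex only has to describe the URL itself.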