This sounds like homework... but so be it.
From what I understood, you need a parser for your range definition.
There you go:

    def parseRange(rangeStr, i=0):
        # Recursion anchor, return empty set if we're out of bounds
        if i >= len(rangeStr):
            return set()
        # charSet will tell us later if we actually have a range here
        charSet = None
        # There can only be a range if we have more than 2 characters left in the
        # string and if the next character is a dash
        if i+2 < len(rangeStr) and rangeStr[i+1] == '-':
            # We might have a range. Valid ranges are between the following pairs of
            # characters
            pairs = [('a', 'z'), ('A', 'Z'), ('0', '9')]
            for lo, hi in pairs:
                # We now make use of the fact that characters are comparable.
                # Also the second character should come after the first, or be
                # the same, which means e.g. 'a-a' -> 'a'
                if (lo <= rangeStr[i] <= hi) and \
                   (rangeStr[i] <= rangeStr[i+2] <= hi):
                    # Retrieve the set with all chars from the rest of the string
                    charSet = parseRange(rangeStr, i+3)
                    # Extend the chars from the rest of the string with the
                    # ones in this range.
                    # `range` needs integers, so we transform the chars to ints
                    # using ord and make use of the fact that their ASCII
                    # representation is ascending
                    charSet.update(chr(k) for k in
                                   range(ord(rangeStr[i]), 1+ord(rangeStr[i+2])))
                    break
        # If charSet is not yet defined this means that at the current position
        # there is not a valid range definition. So we just get all chars for the
        # rest of the string and add the current char
        if charSet is None:
            charSet = parseRange(rangeStr, i+1)
            charSet.add(rangeStr[i])
        # Return the char set with all characters defined within rangeStr[i:]
        return charSet

It might not be the most elegant parser, but it works.
Note that you have to strip the square brackets before calling it, which is easy to do with slicing, e.g. [1:-1].
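
To show the bracket stripping in action, here is a rough sketch of the same parsing idea written as a loop instead of a recursion, so it does not use one stack frame per character. The names parse_range_iter and spec are made up for this example, and the set of valid ranges is assumed to be the same three pairs as above.

```python
def parse_range_iter(range_str):
    """Iterative sketch of the range parser (hypothetical name)."""
    pairs = [('a', 'z'), ('A', 'Z'), ('0', '9')]
    chars = set()
    i = 0
    while i < len(range_str):
        first = range_str[i]
        # A range needs a dash next and one more character after the dash
        if i + 2 < len(range_str) and range_str[i + 1] == '-':
            last = range_str[i + 2]
            # Both ends must lie in the same pair, and first must not exceed last
            if any(lo <= first <= last <= hi for lo, hi in pairs):
                chars.update(chr(k) for k in range(ord(first), ord(last) + 1))
                i += 3
                continue
        # No valid range at this position: take the character literally
        chars.add(first)
        i += 1
    return chars


spec = '[a-c0-2_]'
# Strip the surrounding square brackets before parsing
print(sorted(parse_range_iter(spec[1:-1])))
# ['0', '1', '2', '_', 'a', 'b', 'c']
```

An invalid range such as 'z-a' is not expanded; its three characters are simply added one by one, which matches what the recursive version does.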
Another very short, dumb, and easy solution is to reuse the parser from re:

    import re

    def parseRangeRe(rangeStr):
        # All characters that can possibly be returned
        master_pattern = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"
        # rangeStr must keep its square brackets here, e.g. '[a-z0-9]'
        matcher = re.compile(rangeStr)
        return set(matcher.findall(master_pattern))
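
A quick way to try the re-based idea is sketched below. It is a variant rather than the exact function above: the candidate string is built from the string module, and parse_range_re and candidates are names chosen only for this example. Unlike the hand-written parser, the square brackets stay in place, because re needs them to treat the input as a character class.

```python
import re
import string


def parse_range_re(range_str):
    """Sketch: let re expand a bracketed character class (hypothetical name)."""
    # Only characters listed here can ever show up in the result
    candidates = string.digits + string.ascii_letters + "_-"
    # Each match of a character class is a single character
    return set(re.findall(range_str, candidates))


print(sorted(parse_range_re('[a-c0-2_]')))
# ['0', '1', '2', '_', 'a', 'b', 'c']
```

The obvious limitation is that any character missing from the candidate string is silently dropped, even if the range definition includes it.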