23

I made a little generator function for character ranges:

>>> def crange(start, end):
...     for i in range(ord(start), ord(end)+1):
...             yield chr(i)
...

And then I can do this:

>>> print(*crange('a','e'))
a b c d e

Yay! But this doesn't work:

>>> crange('a','e')[::2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

And this works, but is O(n), unlike range's O(1):

>>> 'y' in crange('a','z')
True

That means it takes about 0.35 seconds to search for character number 109,999 out of the maximum of 110,000. 109999 in range(110000) is, of course, fast.

At that point, my first thought was to simply subclass range. Unfortunately:

>>> class A(range):
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'range' is not an acceptable base type

So I guess I would have to mimic it in some way that allows me to pass characters as arguments, works like range internally, and produces characters. Unfortunately, I'm not sure how to proceed. I tried a dir():

>>> print(*dir(range), sep='\n')
__class__
__contains__
__delattr__
__dir__
__doc__
__eq__
__format__
__ge__
__getattribute__
__getitem__
__gt__
__hash__
__init__
__iter__
__le__
__len__
__lt__
__ne__
__new__
__reduce__
__reduce_ex__
__repr__
__reversed__
__setattr__
__sizeof__
__str__
__subclasshook__
count
index
start
step
stop

which lets me see what functions are in there, but I'm not sure what they're doing, or how range uses them. I looked for the source for range, but it's in C, and I don't know where to find its Python wrapper (it does have one, right?).

Where do I go from here, and should I even go there?

TigerhawkT3
  • 48,464
  • 6
  • 60
  • 97
  • 1
    "but it's in C, and I don't know where to find its Python wrapper (it does have one, right?)" - nope. Like `list` or `dict`, no part of `range` is written in Python. – user2357112 May 21 '15 at 01:00
  • About the class, perhaps `class A(object, range):` – Zizouz212 May 21 '15 at 01:04
  • 3
    While this might seem like an interesting question, I'm voting to close it as too broad. There's no good way to do this except by going through `range`'s entire API and replicating it, so answers would have to describe everything `range` does, all the hooks used to customize `len` and slicing and everything, and they'd take like 10 pages. I recommend googling the method names and looking through the [Python data model](https://docs.python.org/3/reference/datamodel.html). – user2357112 May 21 '15 at 01:09
  • 1
    alternative to existing version: `def crange(start, end): return map(chr, range(ord(start), ord(end)))` – Ryan Haining May 21 '15 at 01:14
  • 1
    I would have to disagree that it's too broad. The OP already has a core method, what he needs to do is implement a lazy list, which is essentially how `range()` functions. Once he has done that, he can work to implement the remainder of the range API and ask specific questions regarding them. The OP may want to take a look at the [`lazyarray`](https://pypi.python.org/pypi/lazyarray) module on PyPi to start with. I would strongly suggest that the OP narrow his question down to lazy list implementation as quickly as possible to avoid being closed. – Deacon May 21 '15 at 01:35

2 Answers2

18

At that point, my first thought was to simply subclass range.

range was a function in Python2 and a "final" class in Python3 (more info here) - in both cases not something you can sub-class. You will need to create a class crange that extends from an object as the base type.

class crange(object):

And this works, but is O(n), unlike range's O(1)

In Python 3, there is a __contains__ method that you will define for your object.

For objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__(), see this section in the language reference.

This allows Python to determine if the value is in your range without actually enumerating the range.

For a simple example, if your range is 1 to 1,000,000, it is trivial to determine whether 23546 is in that range (1 < 23546 < 1000000). Of course the actual implementation is a bit more complex and adds ability to handle step increments etc.

Regarding:

Yay! But this doesn't work: >>> crange('a','e')[::2]

In this case you need to define __getitem__ on your object. Here's an example of some of the methods required:

class crange(object):
    def __init__(self, start, end, step=1):
        # initialize your range object
        self.start = start
        self.end = end
        self.step = step

    def __iter__(self):
        # enable iteration over your object
        # (assume step size is 1)
        for i in range(ord(self.start), ord(self.end)+1):
            yield chr(i)

    def __getitem__(self, i):
        # enable accessing items in your range by index
        # also enable crange('a','e')[::2]
        # (assuming step size of 1)
        if isinstance( i, slice ):
            # implement slicing 
        else:
            return chr(ord(self.start) + i)

    def __contains__(self, char):
        # enable O(1) determination of whether a value is in your range
        # (assume step size is 1)
        return ord(self.start) <= ord(char) < ord(self.end)

    def __len__(self):
        # return length (assuming step size of 1)
        return ord(self.end) - ord(self.start)
Community
  • 1
  • 1
Martin Konecny
  • 57,827
  • 19
  • 139
  • 159
  • If I remember, correctly, `__contains__()` is implemented in `range()` as: `if step == 1: return start <= num < end` / `else: return start <= num < end and not (num - start) % step` (not sure why I know that). The analogous function for `crange()` would be: `if step == 1: return ord(start) <= ord(c) < ord(end)` / `else: return ord(start) <= ord(c) < ord(end) and not (ord(c) - ord(start)) % step`. – Deacon May 21 '15 at 01:58
  • 1
    @DougR.thx I filled in some of the methods, and used your `__contains__()` method – Martin Konecny May 21 '15 at 02:02
  • Glad to be of help. Sorry it took me three times to get the whole thing right. This is the kind of problem I like to see here on SO. – Deacon May 21 '15 at 02:03
  • Your `__getitem__()` function appears to ignore the step. I.e., if we did `my_crange = crange('a', 'm', 3)`, `my_crange[1] == 'd'`. Your `__getitem__()` would return 'b'. I think it should be `return chr(ord(self.start) + i * step)` – Deacon May 21 '15 at 02:08
  • ...and I can't believe I made such a rookie mistake. Looking back on my return value for `__contains__()`, no need to do an `if/else...`. `n % 1 == 0` for any integer. – Deacon May 21 '15 at 02:14
  • Minor quibble: `range` in Python 3 is a class, not a function, although that doesn't change the fact that it is unsubclassable. – chepner May 21 '15 at 03:14
  • @chepner - thx for the correction, found it's an immutable sequence type. It was a function in Python 2. – Martin Konecny May 21 '15 at 03:16
  • "range was a function in Python2 and an immutable sequence-type in Python3 - in both cases not something you can sub-class." - the fact that it can't be subclassed in Python 3 has nothing to do with the fact that it's an immutable sequence type. After all, you can subclass `tuple` just fine. You can't subclass `range` because it specifically disallows subclassing (by not setting the Py_TPFLAGS_BASETYPE flag). – user2357112 May 21 '15 at 04:55
11

To add to Martin Konecny's answer. You probably want to use an internal range for everything and convert between chr and ord.

class crange:
    def __init__(self, *args, **kwargs):
        args = [ord(arg) for arg in args]
        kwargs = {key: ord(val) for key, val in kwargs.items()}
        self.range = range(*args, **kwargs)

    def __iter__(self):
        for n in self.range:
            yield chr(n)

    def __contains__(self, c):
        return ord(c) in self.range

    def __getitem__(self, i):
        if isinstance(i, slice):
            ret = crange('\x00')
            ret.range = self.range[i]
            return ret
        else:
            return chr(self.range[i])

    def __repr__(self):
        return  "crange({}, {})".format(
            repr(chr(self.range.start)), repr(chr(self.range.stop)))

r = crange('a', 'f')
print(list(r))
print('b' in r)
print('f' in r)
print(r[:2])

In other words: if we can't subclass it we can use object composition.

Ariakenom
  • 204
  • 1
  • 5
  • I was about to ask if it would be a good idea to use a `range` object internally... :) – TigerhawkT3 May 21 '15 at 18:13
  • 1
    `self.range.__contains__(ord(c))` → `ord(c) in self.range`; `self.range.__getitem__(i)`→ `self.range[i]`, `('crange('+repr(chr(self.range.start))+', '+ repr(chr(self.range.stop))+')')` → `"crange({}, {})".format(self.range.start, self.range.stop)`. – Veedrac May 21 '15 at 19:10
  • Suggested changes made, thanks. Got myself stuck on using the method name for symmetry. repr was a mess, no excuse :(. – Ariakenom May 21 '15 at 21:44