10

I'm writing a set of test cases for users new to Python. One of the problems I've noticed with my tests is that its possible to get false positives. They may have gotten lucky and happened to give every element in the correct order, however they really should be using a structure that is ordered.

So far this is the best solution I could come up with.

self.assertTrue(isinstance(result, Sequence) or
                isinstance(result, GeneratorType) or
                callable(getattr(result, '__reversed__', False)))

However, I don't feel confident GeneratorType is really ordered, or that this test is comprehensive. I feel like there should be a better way to test for this. How do I test that a structure has order?

K Engle
  • 1,322
  • 2
  • 11
  • 23
  • 1
    It is quite hard to understand what you are after. Provide examples, please. By structure you mean `data type`? – Pynchia Oct 28 '15 at 21:43
  • I call a function and it returns some form of data structure. They could give me a list, they could give me a set, or they could implement their own iterable class. I want to make sure that this structure has order, and will return the same way regardless of when I decide to iterate. – K Engle Oct 28 '15 at 21:45
  • As an example, a set is not ordered. A list is ordered. A set should inherently fail my test cases since it is not ordered. A list should pass my test cases. – K Engle Oct 28 '15 at 21:47
  • OK, if they can give you any type of sequence, I believe it would be hard to order it in case it's a set or a dictionary. It would not make any sense to order them afterwards. The original order is lost forever. It would make sense to alert them of the fact that dicts and sets are not allowed. – Pynchia Oct 28 '15 at 21:47
  • 2
    @RobertB OrderedDict does pass the test because it implements `__reversed__` – K Engle Oct 28 '15 at 21:50
  • You know they're just going to end up calling `list` on their dicts or something like that, right? – user2357112 Oct 28 '15 at 22:10
  • @user2357112 That doesn't invalidate the question. Its an example of a use case. – K Engle Oct 28 '15 at 22:16
  • Your solution actually seems okay to me; if this is truly targeted at new users, they probably shouldn't be using data types that are so arcane that they're ordered but don't fit any of the heuristic criteria you listed. One interesting approach, though (assuming it's possible) would be to change the hash function for the elements of the container, then re-run the user's code to see if the order changes. – Kyle Strand Oct 29 '15 at 00:23
  • A `set` has order. If it's never mutated, it always iterates the same way during a given run of Python. The distinction between `iter(someset)` and `iter(somelist)` (or generator that effectively does the same thing) cannot be meaningfully handled. – ShadowRanger Oct 29 '15 at 00:45
  • @ShadowRanger https://docs.python.org/2/library/sets.html " Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior." – K Engle Oct 29 '15 at 17:18
  • @KEngle: You're missing my point. If generators are "ordered", then everything can be ordered; the iterator over a `set` (or a generator based on iterating a `set`) is going to appear ordered by that definition, even if your heuristics successfully exclude the `set` itself. The point is, if your definition of "ordered" is "produces values one by one", then a `set` is "ordered", which as you note, is self-evidently ridiculous. – ShadowRanger Oct 29 '15 at 19:43
  • @ShadowRanger That was part of my hesitance with including Generator type. The main reason it was included was I was told .sort() can return a generator on a structure. I should probably test that claim. – K Engle Oct 29 '15 at 21:52
  • `list.sort()` mutates a `list` in place, `sorted` receives an arbitrary iterable and returns a sorted `list`. Python intentionally avoids providing _any_ other built-in sort (you could potentially make a generator that performs only `O(n)` up front work with `heapq.heapify` then repeatedly `heapq.heappop`s to roll your own `itersorted` for the rare cases where it's an improvement, usually only when you'll usually stop iterating after a small but not initially known number of elements are processed, otherwise a complete `sorted` wins due to heavy optimization). – ShadowRanger Oct 29 '15 at 22:15
  • But even so, if the result of such a sort is wrapped in `iter` (directly or indirectly) or used to produce a generator in some way, it's still just as ordered, but omitting Generators would mean you didn't treat it as such. This is not a generally solvable problem; it's always going to be heuristic. If it needs to be sorted, call `sorted` on it (TimSort is optimized to do very little work when the data is already mostly ordered). If the data should have logical order, fall back on "We're all adults here"; document the requirement, and if people violate it, so be it. – ShadowRanger Oct 29 '15 at 22:18
  • Related: https://stackoverflow.com/questions/12590494/in-python-how-do-i-determine-if-an-iterable-has-stable-iteration-order – augurar Feb 14 '16 at 03:09

2 Answers2

3

I think, it is very interesting guestion. There is no way in pure python (without utility code) to check if collection is ordered.

Let's go sequentially =)

To check Sequence or GeneratorType you can use collections.Iterable type.

>>>
>>> import collections
>>>
>>> result = [1,2,3,4,-1]
>>> isinstance(result, collections.Iterable)
True
>>>
>>> def generator_func(arg=10):
...     for i in xrange(arg):
...         yield i
...
>>>
>>> generator_func()
<generator object generator_func at 0x7f667c50f190>
>>>
>>> result = generator_func()
>>> isinstance(result, collections.Iterable)
True

But:

>>>
>>> result = {1,2,3,4,-1}
>>> isinstance(result, collections.Iterable)
True
>>>

It is bad case for you. Because:

>>> x = {1,2,3,-1}
>>> x
set([1, 2, 3, -1])
>>> [_ for _ in x]
[1, 2, 3, -1]
>>> x = {1,2,3,0}
>>> x
set([0, 1, 2, 3])
>>> [_ for _ in x]
[0, 1, 2, 3]
>>> import collections
>>> isinstance(x, collections.Iterable)
True
>>>

Of course for this case you should use a collections.Sequence only.

>>> result = {1,2,3,4,-1}
>>> isinstance(result, collections.Sequence)
False
>>> isinstance({1:2, 3:3}, collections.Sequence)
False
>>>

But:

>>> result = generator_func()
>>> isinstance(result, collections.Sequence)
False
>>> 

Thus, I think that an idea to check Sequence or GeneratorType is nice. Check this link:

So:

>>> result = generator_func()
>>> isinstance(result, (collections.Sequence, collections.Iterator))
True
>>> result = [1,2,3,4,5]
>>> isinstance(result, (collections.Sequence, collections.Iterator))
True
>>> result = (1,2,3,4,5)
>>> isinstance(result, (collections.Sequence, collections.Iterator))
True
>>> result = {1,2,3,4,5}
>>> isinstance(result, (collections.Sequence, collections.Iterator))
False
>>> result = {1:1,2:2,3:3,4:4,5:5}
>>> isinstance(result, (collections.Sequence, collections.Iterator))
False
>>> 

Аbout order.

If you are not sure about order of items, I think you should check them explicitly.

«Explicit is better than implicit.»

>>>
>>> def order_check(result, order_rule = cmp_rule):
...     for item, next_item in zip(result, result[1:]):
...         if not order_rule(item, next_item):
...             return False
...     return True
...
>>> def cmp_rule(item, next_item):
...     if item < next_item:
...         return True
...     return False
...
>>>
>>> result = [1,2,3,4,5]
>>> order_check(result)
True
>>> result = [1,2,3,4,5,-1]
>>> order_check(result)
False
>>>

But, honestly, generators guarantee that the order would be the same as you generate in it.

-2

You haven't entirely described what you mean by the property of having "order"; your "callable" is a good idea. I suggest that you read through the "dunder" (double underscore) methods to see whether one of them matches the definition you want. From what you've given us, I strongly suspect that __getitem__ is what you want. There is also the "ask forgiveness" method of setting a try-except block on the structure:

try:
    dummy = (_ for _ in result)
    has_order = True
except:
    has_order = False
return has_order
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
Prune
  • 76,765
  • 14
  • 60
  • 81
  • 1
    From the first paragraph, it's clear that OP does not merely mean "iterable", which is all your example code checks for. – Kyle Strand Oct 29 '15 at 00:17
  • I agree: this answer is quite insufficient. @Kyle Strand makes this a valuable point, so I'll leave the answer here and take the 2-point deduction. :-) – Prune May 04 '16 at 21:27