
I'm using the itertools.chain() function in Python to chain several Django querysets together. This way I don't touch the database, which is the efficient behaviour I need. However, I'm using a third-party library to paginate these results, and it only accepts list and queryset objects. When I call it with the chain object I get the following error:

Exception Value: 'itertools.chain' object has no attribute '__getitem__'

The line in the library (django-pagemore) that is actually driving me crazy is:

objects = self.objects[page0*self.per_page:1+page*self.per_page]

The problem is that a chain object can't be sliced.
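For illustration, here is a minimal standalone reproduction of the error (the lists stand in for the chained querysets):

```python
from itertools import chain

combined = chain([1, 2, 3], [4, 5, 6])
try:
    combined[0:2]        # chain has no __getitem__, so this fails
    sliced_ok = True
except TypeError:
    sliced_ok = False    # this branch is taken: chain objects can't be sliced
```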

I know that I could easily convert the chain object into a list with list(), but that would evaluate the ENTIRE queryset, which can contain thousands of items.

After some research on how to calculate the size of a Python object, I did some testing: sys.getsizeof(cPickle.dumps(content)) (where content is one of the objects inside the chain) gives me a value of 15,915 bytes, so a chain containing 3,000 of these objects would need approx. 45.53 MB!

Caumons

1 Answer


itertools.chain() returns an iterator, not a sequence. You cannot index or slice an iterator.

Use itertools.islice() to take a subset; when you loop over the islice() result, the underlying iterator is advanced to the start index, then yields items until the stop index:

from itertools import islice

objects = islice(self.objects, page0 * self.per_page, 1 + page * self.per_page)

This consumes the chained iterator up to the slice, so you can no longer access the items before the start index.
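As a sketch of the approach, with stand-in iterables instead of real querysets and hypothetical page bounds:

```python
from itertools import chain, islice

# Stand-in for chained querysets; any iterables behave the same way.
objects = chain(range(0, 5), range(5, 10))

# Take the "page" at positions 3..6 without materialising a full list.
page_items = list(islice(objects, 3, 7))   # [3, 4, 5, 6]

# Everything up to the stop index has now been consumed:
remaining = list(objects)                  # [7, 8, 9]
```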

Martijn Pieters
    Note that this also consumes part or all of the iterable as well. – Ignacio Vazquez-Abrams Aug 14 '13 at 13:00
  • @Caumons: Then you are slicing a chain that doesn't return enough objects, I suspect. It works just fine for me. – Martijn Pieters Aug 14 '13 at 13:21
  • @MartijnPieters OK, the problem was that I was iterating over the chain object before calling `islice()` on it, so the pointer was at the end of the iterable. However, I'm wondering if modifying the library will have side effects, because `islice()` will actually shorten the chain each time, and once a chain has been iterated, trying to iterate it again will do nothing. I've tried to do a `deepcopy()` of the chain but it doesn't seem to work. – Caumons Aug 14 '13 at 13:40
  • @Caumons: That is the nature of an iterator. You **cannot** rewind them or iterate more than once. If you need random access to the items, then you are *forced* to use a list. That, or *recreate* the iterator (call `chain()` on the queries again). – Martijn Pieters Aug 14 '13 at 13:44
  • @Caumons: TLDR: you cannot iterate over an iterator more than once. – Martijn Pieters Aug 14 '13 at 13:44
  • So, if I'm forced to use a list... Which is a better way to calculate the memory used than the method I used? And another thing: working with a 50 MB list would cause problems to a webserver, I mean, is it a terribly huge object in memory that MUST be avoided? – Caumons Aug 14 '13 at 13:47
  • 1
    You are trading memory vs database bandwidth here, btw. The chain is based on a database query, so recreating the chain for a second run-through results in a new series of database queries plus data transfer from database to web server process. loading everything into memory *may* be faster, and memory is cheap. – Martijn Pieters Aug 14 '13 at 13:53
  • You'll need to recursively call `sys.getsizeof()` on the content object and its attributes to calculate the memory size. Calling `sys.getsizeof()` on a pickle string only tells you how much memory that *string* takes, and a pickle is not necessarily a good way of representing how much memory the original object requires, because a pickle needs to be importable cross-platform. – Martijn Pieters Aug 14 '13 at 13:55
  • A Python integer on a 64-bit machine takes more memory than on a 32-bit machine, but the pickle for that integer is *the same size* on either. And if pickle *were* a reasonable representation of a memory footprint, then just using the `len()` of the string would be a much better way to measure the size. – Martijn Pieters Aug 14 '13 at 13:56
  • Sounds like you could benefit from using a sparse list filled only with the objects of the page, fooling the third-party library into retrieving from an almost-empty list. One example is the blist [http://stutzbachenterprises.com/blist/blist.html]. – augustomen Aug 14 '13 at 18:41
  • @MartijnPieters thanks for your comments! I'll take into account what you told me about measuring an object's memory size. (Maybe you want to add a better response to the linked question.) Finally, to solve the problem I simply limited the query and used `list()` in conjunction with `chain()`, as it's the simplest approach I can think of. It may be considered quick & dirty, but at least it will work (I hope). I've accepted your answer because you explained how to slice a chain! Thanks :) – Caumons Aug 16 '13 at 20:10
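To illustrate the recursive `sys.getsizeof()` approach mentioned in the comments, here is a rough sketch; the function name is made up, and the result is an approximation, not exact memory accounting:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size estimate: object plus reachable containers/attributes."""
    if seen is None:
        seen = set()
    if id(obj) in seen:             # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    elif hasattr(obj, '__dict__'):  # recurse into instance attributes
        size += deep_getsizeof(vars(obj), seen)
    return size
```

Unlike measuring the pickle, this counts the in-memory footprint of the container and every item it holds, though it still misses things like slots-only objects and interpreter overhead.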