
I have a Python list of lists with around 10,000,000 rows, each containing 8 elements. I have noticed that iterating through this list and processing the rows is painfully slow. Somewhere in the program I also need to sort this list of lists by a specific key. I am running this on a system with 2 GB of RAM. What is the best way to process such large lists?

EDIT

Let's assume data holds around 10,000,000 lists. I need to sort data by each list's 0th element.

I am iterating through data as follows:

for m in data:
    pass  # per-row processing of m goes here

And for sorting I am using:

from operator import itemgetter
data = sorted(data, key=itemgetter(0))
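With 2 GB of RAM, it helps to estimate how much memory the list itself needs before looking at speed. The sketch below is a rough estimate only: it calls sys.getsizeof on a hypothetical sample row (one string followed by seven floats, matching the mix of string and float values mentioned in the comments) and assumes a 64-bit CPython build, where each slot in the outer list is an 8-byte pointer.

```python
import sys

ROWS = 10000000  # row count from the question
POINTER_BYTES = 8  # assumption: 64-bit CPython, one pointer per slot in the outer list

def approx_row_bytes(row):
    """Rough footprint of one row: the inner list object plus each element object.

    Objects shared between rows are counted every time, so this can overestimate.
    """
    return sys.getsizeof(row) + sum(sys.getsizeof(x) for x in row)

# Hypothetical sample row: one string key followed by seven floats.
sample_row = ["key0001", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
per_row = approx_row_bytes(sample_row) + POINTER_BYTES
total_gib = per_row * ROWS / 1024.0 ** 3
print("about %d bytes per row, about %.1f GiB in total" % (per_row, total_gib))
```

On a typical 64-bit build this comes out to a few hundred bytes per row, i.e. several gigabytes for 10,000,000 rows, which suggests the data does not comfortably fit in 2 GB as plain Python lists.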
Read Q
  • How are you iterating through your list? Also, what is the criterion by which you want to sort? – inspectorG4dget Jan 25 '13 at 05:57
  • Do you load this list into memory at once, or piece-by-piece? – Tadeck Jan 25 '13 at 05:59
  • Related: http://stackoverflow.com/questions/1989251/alternatives-to-keeping-large-lists-in-memory-python – Ashwini Chaudhary Jan 25 '13 at 06:01
    What are the elements' data types? What operations do you need to perform on them? Have you tried numpy arrays? To sort in place, use `a.sort()` instead of `a = sorted(a)` – jfs Jan 25 '13 at 06:05
  • Some elements are string and some are float values. – Read Q Jan 25 '13 at 06:07
    Have a look at [Blist](http://pypi.python.org/pypi/blist/); these are similar to lists but perform better than lists as the size increases: http://stutzbachenterprises.com/performance-blist/sort-random-list – Ashwini Chaudhary Jan 25 '13 at 06:12
  • Honestly the "best" way may be to not have that monster list at all. Can your problem be reworked to be solved with chaining iterators? – cmd Jan 25 '13 at 16:30
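To illustrate cmd's suggestion of chaining iterators: if the rows can be produced one at a time (for example, read from a file) instead of being stored in one big list, generators let each row be parsed, processed and discarded in turn. This is a minimal Python 3 sketch that assumes CSV input where column 0 is a string and the remaining columns are floats; parse_rows and column_total are hypothetical helpers, and io.StringIO stands in for a real file on disk.

```python
import csv
import io

def parse_rows(lines):
    """Lazily turn CSV lines into rows; only the current row is held in memory."""
    for fields in csv.reader(lines):
        # Assumption: column 0 is a string key, all other columns are floats.
        yield [fields[0]] + [float(x) for x in fields[1:]]

def column_total(rows, col):
    """Consume rows one at a time and add up a single column."""
    return sum(row[col] for row in rows)

# Hypothetical stand-in for a large file opened with open(path, newline="").
sample = io.StringIO("a,1,2,3\nb,4,5,6\n")
print(column_total(parse_rows(sample), 1))  # 5.0
```

Sorting is the one step that genuinely needs all rows at once, so this approach helps most with the processing passes.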

1 Answer


Without knowing what your sorting criterion is, I can't say much.

The most memory-efficient way of iterating that I can think of is to use itertools.chain.from_iterable:

import itertools

for element in itertools.chain.from_iterable(myLongList):
    print(element)
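A small self-contained illustration of the loop above, using a hypothetical two-row stand-in for myLongList. chain.from_iterable returns a lazy iterator that walks each inner list in turn, so it behaves like a nested for loop over rows and elements and never builds a flattened copy of the data.

```python
import itertools

rows = [["a", 1.0, 2.0], ["b", 3.0, 4.0]]  # small stand-in for myLongList

flat = itertools.chain.from_iterable(rows)  # lazy: nothing is copied yet
print(next(flat))  # a

# The remaining elements are produced on demand, one at a time.
total = sum(x for x in flat if isinstance(x, float))
print(total)  # 10.0
```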

EDIT:

sorted() creates a new list out of the old one. Use list.sort() to sort in place instead:

import operator
myLongList.sort(key=operator.itemgetter(0))
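The difference can be checked on a tiny hypothetical dataset. sorted() returns a new outer list (the inner rows themselves are shared, not copied), while list.sort() reorders the existing list and returns None. Both sorts are stable, and with key= the key function is called once per element, so sorting 10,000,000 rows still needs temporary storage for one key per row.

```python
from operator import itemgetter

data = [["c", 3.0], ["a", 1.0], ["b", 2.0], ["a", 0.5]]

# sorted() builds and returns a brand-new outer list; the row objects are shared.
new_list = sorted(data, key=itemgetter(0))
assert new_list is not data
assert new_list[0] is data[1]  # same row object, referenced from two lists

# list.sort() reorders the existing outer list in place and returns None,
# so no second outer list of references is allocated.
result = data.sort(key=itemgetter(0))
assert result is None

# The sort is stable: rows with equal keys keep their original relative order.
print(data)  # [['a', 1.0], ['a', 0.5], ['b', 2.0], ['c', 3.0]]
```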

If you want to iterate over the elements in each row and still be able to access the items on either side of the current one, keep track of the indices with enumerate:

for rowInd, row in enumerate(myLongList):
    for colInd, element in enumerate(row):
        print("myLongList[%d][%d] is %s" % (rowInd, colInd, element))
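With the indices available, neighboring items can be read directly from row. One detail to watch: row[colInd - 1] at colInd == 0 is row[-1], which silently wraps around to the last element in Python, so the ends need an explicit check. A minimal sketch, where neighbors is a hypothetical helper that returns None past either end and rows is a small stand-in for myLongList:

```python
def neighbors(row, col):
    """Return (previous, next) elements around row[col], with None past either end."""
    prev_item = row[col - 1] if col > 0 else None  # avoid row[-1] wrapping around
    next_item = row[col + 1] if col + 1 < len(row) else None
    return prev_item, next_item

rows = [["a", 1.0, 2.0], ["b", 3.0, 4.0]]  # small stand-in for myLongList

for row_ind, row in enumerate(rows):
    for col_ind, element in enumerate(row):
        prev_item, next_item = neighbors(row, col_ind)
        print("rows[%d][%d] is %r (prev=%r, next=%r)"
              % (row_ind, col_ind, element, prev_item, next_item))
```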
Thorsten Kranz
inspectorG4dget
    Yes, this is probably the best way to iterate through _items of inner lists_. This is an important distinction, I think, as the OP may also want access to the list itself during the iteration (although I hope he does not). – Tadeck Jan 25 '13 at 06:02
  • Actually, I need to access some previous and next values in the lists while processing a single element in the list. – Read Q Jan 25 '13 at 06:05
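If "previous and next values" means the neighboring rows rather than neighboring elements, a lazy sliding window avoids indexing altogether and also works when rows are streamed instead of held in a list. A minimal sketch for Python 3, where zip is lazy (on Python 2, itertools.izip would be the lazy equivalent); with_neighbors is a hypothetical helper:

```python
import itertools

def with_neighbors(iterable):
    """Lazily yield (previous, current, next) triples; None fills the missing ends."""
    prevs, currs, nexts = itertools.tee(iterable, 3)
    prevs = itertools.chain([None], prevs)  # shifted right by one position
    nexts = itertools.chain(itertools.islice(nexts, 1, None), [None])  # shifted left by one
    # zip stops when currs runs out; tee only buffers the few items between the copies.
    return zip(prevs, currs, nexts)

rows = [["a", 1.0], ["b", 2.0], ["c", 3.0]]  # small stand-in for the real data
for prev_row, row, next_row in with_neighbors(rows):
    print(prev_row, row, next_row)
```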