
My problem is about managing insert/append methods within loops.

I have two lists of length N: the first one (let's call it s) indicates the subset each element belongs to, while the second one holds a quantity x that I want to evaluate. For the sake of simplicity, let's say that every subset contains T elements.

cont = 0;
for i in range(NSUBSETS):
    subcont = 0;
    for j in range(T):
        if (x[(i*T)+j+cont] < 100):
            s.insert(((i+1)*T)+cont+subcont, s[(i*T)+j+cont]);
            x.insert(((i+1)*T)+cont+subcont, x[(i*T)+j+cont]);
            subcont += 1;
    cont += subcont;

While cycling over all the elements of the two lists, I'd like that, whenever a certain condition is fulfilled (e.g. x[i] < 100), a copy of that element is placed at the end of the subset it belongs to, and then the loop goes on until all the original members of the subset have been analysed. It is important to maintain the "order", i.e. each copy should be inserted right after the last original element of the subset it comes from.

I thought one way could be to store in two counter variables the number of copies made within the current subset and globally, respectively (see the code above): this way, I can shift the indices I am looking at accordingly. I wonder whether there exists some simpler way to do that, maybe using some Python magic.
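
For instance, with T = 3 and two subsets, the desired behaviour would be something like this (made-up numbers):

x = [10, 200, 20,   300, 30, 400]              # two subsets of T = 3 elements
# desired result: each copy appended at the end of its own subset
x = [10, 200, 20, 10, 20,   300, 30, 400, 30]

and s would be rearranged with exactly the same pattern of copies.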

Orso
  • Are you coming from a C-type language background? The only time you should put semicolons at the end of a statement is when you have another statement on the same line. (Though you could go the other way and use `from __future__ import braces` to really get that C-style feel if you wanted to.) – JAB May 29 '13 at 18:32
  • You're right, I should fall out of the habit... :-) I didn't know about multiple statements on the same line, thanks for the tip! – Orso May 29 '13 at 21:11
  • You're welcome. (Though note that I was joking with `from __future__ import braces`; it doesn't actually work.) – JAB May 30 '13 at 12:07
  • Ahah, I didn't get it! :-D Anyway, I swear that I didn't try to! – Orso May 30 '13 at 14:14

5 Answers

2

If the idea is to interpolate your extra copies into the lists without making a complete copy of the whole list, you can try this with a generator expression. As you loop through your lists, collect the matches you want to append. Yield each item as you process it, then yield each collected item too.

This is a simplified example with only one list, but hopefully it illustrates the idea. You only end up with a copy if, as I've done here, you expand the generator with a list comprehension. If you just wanted to stream the processed data somewhere else (e.g. write it to disk) you would never need to hold it all in memory at once.

def append_matches(input_list, start, end, predicate):
    # predicate is a filter function or lambda
    for item in input_list[start:end]:
        yield item
    for item in filter(predicate, input_list[start:end]):
        yield item

example = lambda p: p < 100
data = [1, 2, 3, 101, 102, 103, 4, 5, 6, 104, 105, 106]

print [k for k in append_matches(data, 0, 6, example)]
print [k for k in append_matches(data, 5, 11, example)]

[1, 2, 3, 101, 102, 103, 1, 2, 3]
[103, 4, 5, 6, 104, 105, 4, 5, 6]
theodox
  • Technically, generator expressions for those would be `(item for item in input_list[start:end])` and `(item for item in input_list[start:end] if predicate(item))`, you're using a full-fledged generator function. Also, for large subsets it may be more efficient to use `itertools.islice()` and `itertools.ifilter()`, as normal Python slicing and filter() copy the data into a new list rather than merely iterating over the original data in place. – JAB May 30 '13 at 12:23
  • I avoided itertools to make it simpler since it looks like the OP is new to Python. But you're right, it's not an expression, it's a function :) – theodox May 30 '13 at 17:51
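
For reference, a rough sketch of the itertools variant described in the comment above (Python 2 names; in Python 3, ifilter is simply the built-in filter, and append_matches_lazy is just an illustrative name):

from itertools import islice, ifilter   # Python 2; use the built-in filter() in Python 3

def append_matches_lazy(input_list, start, end, predicate):
    # same behaviour as append_matches, but the slice is never materialised as a new list
    for item in islice(input_list, start, end):
        yield item
    for item in ifilter(predicate, islice(input_list, start, end)):
        yield item
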
2

I'm guessing that your desire not to copy the lists comes from your C background - an assumption that copying would be more expensive. In Python, though, lists are not linked lists: they behave more like vectors (dynamic arrays), so insert() is O(n) and each of those insert calls is already shifting a chunk of the list around.

Building a new copy with the extra elements would be more efficient than trying to update in-place. If you really want to go that way you would need to write a LinkedList class that held prev/next references so that your Python code really was a copy of the C approach.

The most Pythonic approach would not try to do an in-place update, as it is simpler to express what you want using values rather than references:

def expand(origLs):
    # split the original list into NSUBSETS chunks of length T
    subsets = [origLs[i*T:(i+1)*T] for i in range(NSUBSETS)]
    result = []
    for s in subsets:
        copies = [e for e in s if e < 100]   # elements that satisfy the condition
        result += s + copies                 # subset followed by its copies
    return result

The main thing to keep in mind is that the underlying cost model of an interpreted, garbage-collected language is very different from C's. Not every copy operation actually causes data movement, and there is no guarantee that trying to reuse the same memory will be successful or more efficient. The only real answer is to try both techniques on your real problem and profile the results.
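
For instance, one rough way to do that comparison (the sizes and data here are invented, and expand_inplace is just a hypothetical in-place variant that iterates backwards so the indices stay valid):

import random
import timeit

T, NSUBSETS = 50, 200
data = [random.randint(0, 200) for _ in range(T * NSUBSETS)]

def expand_inplace(origLs):
    ls = list(origLs)                        # work on a fresh copy each run
    for i in range(NSUBSETS - 1, -1, -1):    # backwards, so earlier indices stay valid
        for j in range(T - 1, -1, -1):
            if ls[(i * T) + j] < 100:
                ls.insert((i + 1) * T, ls[(i * T) + j])
    return ls

print timeit.timeit(lambda: expand(data), number=100)
print timeit.timeit(lambda: expand_inplace(data), number=100)

Which one wins will depend on N, T and on how many elements match, which is exactly why profiling on the real data matters.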

Andrew
0

I'd be inclined to make a copy of your lists and then, while looping over the originals, whenever you come across an element that meets the insertion criterion, insert it into the copy at the position you need it to be. You can then output the copied, updated lists, as in the sketch below.
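
A minimal sketch of that idea, assuming the same layout as in the question (NSUBSETS subsets of T elements each; s_new, x_new and offset are just illustrative names):

s_new, x_new = list(s), list(x)               # working copies
offset = 0                                    # copies inserted so far
for i in range(NSUBSETS):
    for j in range(T):
        if x[(i * T) + j] < 100:
            # insert right after the (shifted) end of subset i in the copies
            pos = ((i + 1) * T) + offset
            s_new.insert(pos, s[(i * T) + j])
            x_new.insert(pos, x[(i * T) + j])
            offset += 1

Because the originals are never modified, the read indices never shift; only the insertion position needs the running offset.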

ydaetskcoR
  • Thanks for your answer, I see your point. Anyway, I hope somebody will suggest another way to do that without copying the lists. – Orso May 29 '13 at 18:27
0

I think I have found a simple solution. I cycle over the subsets backwards, starting from the last one, and put the copies at the end of each subset. This way, I avoid running into the "new" elements and I get rid of counters and the like.

for i in range(NSUBSETS-1, -1, -1):      # walk the subsets backwards...
    for j in range(T-1, -1, -1):         # ...and each subset backwards as well
        if (x[(i*T)+j] < 100):
            # insert at the original end of subset i; only the already-processed
            # subsets to the right get shifted, so all remaining indices stay valid
            s.insert(((i+1)*T), s[(i*T)+j])
            x.insert(((i+1)*T), x[(i*T)+j])
Orso
  • I found all your answers very interesting and I'm reading them carefully in order to improve. I decided to accept my own answer just because, in my opinion, it is the simplest and most general one, even if maybe not the most pythonic. – Orso May 31 '13 at 16:22
-1

One possibility would be using numpy's advanced indexing to provide the illusion of copying elements to the ends of the subsets: build a list of "copy" indices into the original list, add it to an index/slice list representing each subset, then combine all the index/slice lists at the end and use the final index list to access all your items (I believe there's support for doing this generator-style too, which you may find useful, as advanced indexing/slicing returns a copy rather than a view). Depending on how many elements meet the copy criterion, this should be decently efficient, as each subset keeps its own indices as a slice object, reducing the number of indices you need to keep track of.
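
A rough sketch of that idea (the data is made up, and for simplicity each subset's indices are kept as a plain index array rather than a mix of slices and copy lists):

import numpy as np

T, NSUBSETS = 6, 2
x = np.array([1, 2, 3, 101, 102, 103, 4, 5, 6, 104, 105, 106])

index_parts = []
for i in range(NSUBSETS):
    subset = np.arange(i * T, (i + 1) * T)     # indices of the original subset
    copies = subset[x[subset] < 100]           # indices of the elements to "copy"
    index_parts.append(np.concatenate((subset, copies)))

final_index = np.concatenate(index_parts)
print x[final_index]   # advanced indexing returns a copy, not a view
# -> 1 2 3 101 102 103 1 2 3 4 5 6 104 105 106 4 5 6

The same final_index can be applied to both s and x, so the two lists stay aligned.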

JAB