0

This part of my code does not scale if dimension gets bigger.

I loop over my data and accumulate them every dt time window. To do this I compare lower and upper time value. When I reach upper bound, I break the for loop for efficiency. The next time I run for loop I want to start not from its beginning but from the element I stopped previously, for efficiency. How can I do that?

I tried to remove/pop elements of the list but indexes get messed up. I read that I cannot modify the list I loop over, but my goal seems to be not uncommon so there has to be solution. I don't care about original data list later in my code, I only want optimization of my accumulation.

# Here I generate data for you to show my problem
from random import randint
import numpy as np

dimension = 200
times = [randint(0, 1000) for p in range(0, dimension)]
times.sort()
values = [randint(0, dimension) for p in range(0, dimension)]
data = [(values[k], times[k]) for k in range(dimension)]
dt = 50.0
t = min(times)
pixels = []
timestamps = []

# this is my problem
while (t <= max(times)):
    accumulator = np.zeros(dimension)
    for idx, content in enumerate(data):
        # comparing lower bound of the 'time' window
        if content[1] >= t:
            # comparing upper bound of the 'time' window
            if (content[1] < t + dt):
                accumulator[content[0]] += 1
                # if I pop the first element from the list after accumulating, indexes are screwed when looping further
                # data.pop(0)
            else:
                # all further entries are bigger because they are sorted
                break

    pixels.append(accumulator)
    timestamps.append(t)
    t += dt
Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
beginh
  • 1,133
  • 3
  • 26
  • 36
  • If you break the for loop into its own function, you can pass the starting index of the loop as a parameter(use [range()](https://docs.python.org/2/library/functions.html#range) in the loop instead). Then when you start the loop again you can call it from the index you finished on. The first time you call the function pass zero. The parameter will also be the first parameter that you call range with. – Albert Rothman Sep 29 '16 at 16:32
  • 4
    If you want to remove elements you can loop backwards or create a copy of the list or use a list comprehension. See [this](http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python) – Albert Rothman Sep 29 '16 at 16:35
  • thank you! i need more python practice, because range() did not come to my mind. I was trying to find counterpart of c++ iterators. – beginh Sep 29 '16 at 18:43
  • yeah very useful to know that one! Since you are new to python once again encourage you to look at list comprehensions; very useful for quickly modifying list with succinct readable syntax – Albert Rothman Sep 29 '16 at 20:22

1 Answers1

0

In a simpler form, I think you are trying to do:

In [158]: times=[0, 4, 6, 10]
In [159]: data=np.arange(12)
In [160]: cnt=[0 for _ in times]
In [161]: for i in range(len(times)-1):
     ...:     for d in data:
     ...:         if d>=times[i] and d<times[i+1]:
     ...:             cnt[i]+=1
     ...:             
In [162]: cnt
Out[162]: [4, 2, 4, 0]

And you are trying to make this data loop more efficient by breaking form the loop when d gets too large, and by starting the next loop after items which have already been counted.

Adding the break is easy as you've done:

In [163]: cnt=[0 for _ in times]
In [164]: for i in range(len(times)-1):
     ...:     for d in data:
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 break

In [165]: cnt
Out[165]: [4, 2, 4, 0]

One way to skip the counted stuff is to replace the for d in data with a index loop; and keep track of where we stopped last time around:

In [166]: cnt=[0 for _ in times]
In [167]: start=0
     ...: for i in range(len(times)-1):
     ...:     for j in range(start,len(data)):
     ...:         d = data[j]
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 start = j
     ...:                 break
     ...:                 
In [168]: cnt
Out[168]: [4, 2, 4, 0]

A pop based version requires that I work with a list (my data is an array), a requires inserting the value back at the break

In [186]: datal=data.tolist()
In [187]: cnt=[0 for _ in times]
In [188]: for i in range(len(times)-1):
     ...:     while True:
     ...:         d = datal.pop(0)
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 datal.insert(0,d)
     ...:                 break
     ...:             
In [189]: cnt
Out[189]: [4, 2, 4, 0]
In [190]: datal
Out[190]: [10, 11]

This isn't perfect, since I still have items on the list at the end (my times don't cover the whole data range). But it tests the idea.

Here's something closer to your attempt:

In [203]: for i in range(len(times)-1):
     ...:     for d in datal[:]:
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:                 datal.pop(0)
     ...:             else:
     ...:                 break
     ...:       

The key difference is that I iterate on a copy of datal. That way the pop affects datal, but doesn't affect the current iteration. Admittedly there's a cost to the copy, so the speed up might be significant.

A different approach would be to loop on data, and step time as the t and t+dt boundaries are crossed.

In [222]: times=[0, 4, 6, 10,100]
In [223]: cnt=[0 for _ in times]; i=0
In [224]: for d in data:
     ...:     if d>=times[i]:
     ...:         if d<times[i+1]:
     ...:             cnt[i]+=1
     ...:         else:
     ...:             i += 1
     ...:             cnt[i]+=1
     ...:             
In [225]: cnt
Out[225]: [4, 2, 4, 2, 0]
hpaulj
  • 221,503
  • 14
  • 230
  • 353