5

I have a list for example

l = [10, 20, 30, 40, 50, 60]

I need to increment the first n elements of the list given a condition. The condition is independent of the list. For example if n = 3, the list l should become :

l = [11, 21, 31, 40, 50, 60]

I understand that I can do it with a for loop on each element of the list. But I need to do such operation around 150 million times. So, I am looking for a faster method to do this. Any help is highly appreciated. Thanks in advance

Dreams
  • 5,854
  • 9
  • 48
  • 71
  • 2
    Maybe something like `l[:n] = [x + 1 for x in l[:n]]`. It's probably as "slow" as regular loop. – vaultah Dec 08 '15 at 17:33
  • Thanks! since I need to do this operation 150 million times, i wanted a faster method if possible. I am looking for something which can optimise this if possible. – Dreams Dec 08 '15 at 17:37
  • shorter isn't always faster. – Ahsanul Haque Dec 08 '15 at 17:38
  • 1
    Are you using numpy? – shuttle87 Dec 08 '15 at 17:39
  • no, but i can use numpy if there is a faster way to do this. – Dreams Dec 08 '15 at 17:39
  • in numpy it would simply look like: `my_list[test]++` ...lower case L is a terrible name for a variable (where `test` is a bool array of length `n`, even if that's shorter than `len(my_list)`) – dan-man Dec 08 '15 at 17:40
  • okay.. thanks. will try this. – Dreams Dec 08 '15 at 17:43
  • 1
    @dan-man: It would look like `my_list[:n] += 1`, There's no `++` in Python, and building and indexing with a bool array for this would be unnecessary overhead. – user2357112 Dec 08 '15 at 17:44
  • @VincentSavard - am writing a long program where i need to do this. Since am doing it in python, i need to do this operation in python. – Dreams Dec 08 '15 at 17:45
  • Consider benchmarking with numpy and also try running on pypy. If you are still finding this to be too slow you might need to write an extension in C. – shuttle87 Dec 08 '15 at 17:46
  • @user2357112 - the question is ambiguous is there a per-element condition up to index n or *is* the up-to-index-n the only "condition"...I use numpy everyday and I forget that you cant do `++`! – dan-man Dec 08 '15 at 17:47
  • @shuttle87 - will try the other methods, if not, probably i will have to write an extension. Thanks for the suggestions. – Dreams Dec 08 '15 at 17:55
  • Please see my answer below, it shows how to construct a simple range based data structure to do what you want. – paul-g Dec 08 '15 at 17:57
  • 1
    When you say "*I need to do such operation around 150 million times.*" - do you have a list of 150 million items to increment, or do you have a much shorter list which you need to partially increment, 150 million times, or 150 short lists you need to increment a million times each? Is it the same range which gets incremented each time, in certain conditions, or a different range? – TessellatingHeckler Dec 08 '15 at 18:36
  • its a different range given a condition. There can be 7 different ranges i.e. the list is of 7 elements. So, i might increment 1st element, or 1st and 2nd element or so on. There are 160000 such lists. I am basically using the same list initialised to [0]*7 each time. And the number of increments depend upon the conditions – Dreams Dec 08 '15 at 19:25

4 Answers4

3

Here's an operation-aggregating implementation in NumPy:

initial_array = # whatever your l is, but as a NumPy array
increments = numpy.zeros_like(initial_array)
...
# every time you want to increment the first n elements
if n:
    increments[n-1] += 1
...
# to apply the increments
initial_array += increments[::-1].cumsum()[::-1]

This is O(ops + len(initial_array)), where ops is the number of increment operations. Unless you're only doing a small number of increments over a very small portion of the list, this should be much faster. Unlike the naive implementation, it doesn't let you retrieve element values until the increments are applied; if you need to do that, you might need a solution based on a BST or BST-like structure to track increments.

user2357112
  • 260,549
  • 28
  • 431
  • 505
2

You can create a simple data structure on top of your list which stores the start and end range of each increment operation. The start would be 0 in your case so you can just store the end.

This way you don't have to actually traverse the list to increment the elements, but you only retain that you performed increments on ranges for example {0 to 2} and {0 to 3}. Furthermore, you can also collate some operations, so that if multiple operations increment until the same index, you only need to store one entry.

The worst case complexity of this solution is O(q + g x qlogq + n) where g is the number of get operations, q is the number of updates and n is the length of the list. Since we can have at most n distinct endings for the intervals this reduces to O(q + nlogn + n) = O(q + nlogn). A naive solution using an update for each query would be O(q * l) where l (the length of a query) could be up to the size of n giving O(q * n). So we can expect this solution to be better when q > log n.

Working python example below:

def RangeStructure(object):

  def __init__(self, l):
    self.ranges = collections.defaultdict(int)
    self.l = l

  def incToPosition(self, k):
    self.ranges[k] += 1

  def get(self):
    res = self.l
    sorted_keys = sorted(self.ranges)
    last = len(sorted_keys) - 1                                                                                                                                                                                                                
    to_add = 0
    while last >= 0:
        start = 0 if last < 1 else sorted_keys[last - 1]
        end = sorted_keys[last]
        to_add += self.ranges[end]
        for i in range(start, end):
            res[i] += to_add
        last -= 1
    return res

rs = RangeStructure([10, 20, 30, 40, 50, 60])
rs.incToPosition(2)
rs.incToPosition(2)
rs.incToPosition(3)
rs.incToPosition(4)
print rs.get()

And an explanation:

  1. after the inc operations ranges will contain (start, end, inc) tuples of the form (0, 2, 2), (0, 3, 1), (0, 4, 1); these will be represented in the dict as { 2:2, 3:1, 4:1} since the start is always 1 and can be omitted

  2. during the get operation, we ensure that we only operate on any list element once; we sort the ranges in increasing order of their end point, and traverse them in reverse order updating the contained list elements and the sum (to_add) to be added to subsequent ranges

This prints, as expected:

[14, 24, 32, 41, 50, 60]
paul-g
  • 3,797
  • 2
  • 21
  • 36
  • 1
    For much better efficiency, you'd probably want to go over the increment positions in reverse and keep track of how much you need to add to each element, rather than incrementing each `res` entry once for each range endpoint past it. Also, you don't need to call `keys`. – user2357112 Dec 08 '15 at 17:56
  • The way you're doing it now, it's only any faster if you call `incToPosition` multiple times with the same argument. – user2357112 Dec 08 '15 at 17:58
2

m - queries count, n - list to increment length, O(n + m) algorithm idea:
since you only have to increment from start to some k-th element you will get ranges of increments. Let our increment be pair (up to position, increment by). Example:
(1, 2) - increment positions 0 and 1 by 2
If we are trying to calculate value at position k then we should add increments that have positions greater or equal than k to current value at position k. How we can quickly calculate sum of increments that have positions greater or equal than k? We can start calculating values from the back of the list and then remember sum of increments.
Proof of concept:

# list to increment
a = [1, 2, 5, 1, 6]
# (up to and including k-th index, increment by value)
queries = [(1, 2), (0, 10), (3, 11), (4, 3)]

# decribed algorithm implementation
increments = [0]*len(a)
for position, inc in queries:
    increments[position] += inc

got = list(a)
increments_sum = 0
for i in xrange(len(increments) -1, -1, -1):
    increments_sum += increments[i]
    got[i] += increments_sum


# verify that solution is correct using slow but correct algorithm
expected = list(a)
for position, inc in queries:
    for i in xrange(position + 1):
        expected[i] += inc

print 'Expected: ', expected
print 'Got:      ', got

output:

Expected:  [27, 18, 19, 15, 9]
Got:       [27, 18, 19, 15, 9]
Žilvinas Rudžionis
  • 1,954
  • 20
  • 28
-2

You can use list comprehension and add the remaining list

[x + 1 for x in a[:n]]+a[n:]
Tarun Behal
  • 908
  • 6
  • 11