I am not familiar with the pd object you have used, but as I understand your question, you have a list of labels (id in your code) that corresponds to several lists of equal length (var1, var2, and var3 in your code), and you want to sum the items that share the same label, for each label, and return the results.
The following code solves the general problem (assuming your array of labels is sorted):
    from functools import reduce  # built in on Python 2, must be imported on Python 3
    from operator import add

    def cumsum(A):
        return reduce(add, A)  # sum of all items in A

    def cumsumlbl(A, lbl):
        # begin index of each lbl subsequence; sorted, because set order is not guaranteed
        idx = sorted(lbl.index(item) for item in set(lbl))
        idx.append(len(lbl))  # end index of the last subsequence
        return [cumsum(A[i:j]) for (i, j) in zip(idx[:-1], idx[1:])]
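To make the index bookkeeping concrete, here is a small trace of the slice boundaries for a sample label list. The sample data and the name bounds are only for illustration, and equal labels are assumed to be adjacent.

```python
lbl = [1, 1, 2, 2, 2, 3]  # illustrative labels, equal labels adjacent

# First position of each distinct label, in ascending order.
idx = sorted(lbl.index(item) for item in set(lbl))
idx.append(len(lbl))  # closing boundary for the last group

# Each successive pair (i, j) delimits one group, i.e. the slice A[i:j].
bounds = list(zip(idx[:-1], idx[1:]))
print(idx)     # [0, 2, 5, 6]
print(bounds)  # [(0, 2), (2, 5), (5, 6)]
```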
Or to use a modified version of Markus Jarderot's code that appears here:
    from functools import reduce  # built in on Python 2, must be imported on Python 3
    from operator import add

    def cumsum(A):
        return reduce(add, A)

    def doublet(iterable):
        iterator = iter(iterable)
        item = next(iterator)
        for nxt in iterator:
            yield (item, nxt)
            item = nxt

    def cumsumlbl(A, lbl):
        # sorted, because iteration order over a set is not guaranteed
        idx = sorted(lbl.index(item) for item in set(lbl))
        idx.append(len(lbl))
        dbl = doublet(idx)  # generator for successive, overlapping pairs of indices
        return [cumsum(A[i:j]) for (i, j) in dbl]
And to test:
    if __name__ == '__main__':
        A = [1, 2, 3, 4, 5, 6]
        lbl = [1, 1, 2, 2, 2, 3]
        print(cumsumlbl(A, lbl))
Output:
[3, 12, 6]
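Since the question mentions several value lists sharing one label list, a possible variation is to group with itertools.groupby from the standard library, which also avoids the repeated lbl.index lookups. This is only a sketch of a different technique, not the approach above: the function name sum_by_label and the sample values for var1 and var2 are made up for illustration, and it likewise assumes equal labels are adjacent.

```python
from itertools import groupby
from operator import itemgetter

def sum_by_label(values, lbl):
    """Sum the items of `values` that share a label (equal labels must be adjacent)."""
    pairs = zip(lbl, values)
    # groupby yields one (label, group) pair per run of equal labels
    return [sum(v for _, v in group)
            for _, group in groupby(pairs, key=itemgetter(0))]

lbl = [1, 1, 2, 2, 2, 3]
var1 = [1, 2, 3, 4, 5, 6]        # made-up sample data
var2 = [10, 20, 30, 40, 50, 60]  # made-up sample data

for var in (var1, var2):
    print(sum_by_label(var, lbl))
# prints [3, 12, 6] and then [30, 120, 60]
```

The same label list is reused for every value list, so each additional list costs only one more call.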