
I have a large list, an excerpt of which looks like:

power = [
    ['1234-43211', [5, 6, -4, 11, 22]], 
    ['1234-783411', [43, -5, 0, 0, -1]], 
    ['1234-537611', [3, 0, -5, -6, 0]], 
    ['1567-345411', [4, 6, 8, 3, 3]], 
    ['1567-998711', [1, 2, 1, -4, 5]]
]

The first number in the string (the part before the '-') identifies the station, and it is what I want to split my additions on, i.e. I only want to cumulatively add the values within each station (and return each intermediate cumulative sum), never add values from two different stations.

My goal is to iterate over this list, cumulatively add the int values for a station, return each partial sum, then start again when the next station is detected in the list.

Desired result:

new = [
    [48, 1, -4, 11, 21],
    [51, 1, -9, 5, 21], '### End of 1234 ###',
    [5, 8, 9, -1, 8], '### End of 1567 ###'
]

or something similar to this.

I have tried the following:

for i in range(len(power)-1):
    front_num_1 = power[i][0].split('-')[0]
    front_num_2 = power[i+1][0].split('-')[0]
    station = '%s' % (front_num_1)
    j = power[i][1]
    k = power[i+1][1]

    if front_num_1 == front_num_2:
        # adds only this row and the next one, not a running total
        print [a + b for a, b in zip(j, k)]

    elif front_num_1 != front_num_2:
        print '#####################################'

    else:
        # never reached: the two branches above cover every case
        print 'END'

However, this only adds each row to the next one; the addition is not cumulative, so it is no use.
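The attempt above can be made cumulative by keeping a running total and resetting it whenever the station prefix changes. A minimal Python 3 sketch (the question's code is Python 2), assuming, like the attempt, that rows for the same station are adjacent; `running_sums_by_station` is a hypothetical helper name.

```python
def running_sums_by_station(rows):
    """Return {station: [total after 2nd row, total after 3rd row, ...]}.

    Assumes rows with the same station prefix are adjacent in `rows`.
    """
    result = {}
    prev_station = None
    total = None
    for name, values in rows:
        station = name.split('-')[0]
        if station != prev_station:
            # New station: start a fresh running total from this row.
            total = list(values)
            result[station] = []
            prev_station = station
        else:
            # Same station: add this row element-wise and record the new total.
            total = [t + v for t, v in zip(total, values)]
            result[station].append(total)
    return result


power = [
    ['1234-43211', [5, 6, -4, 11, 22]],
    ['1234-783411', [43, -5, 0, 0, -1]],
    ['1234-537611', [3, 0, -5, -6, 0]],
    ['1567-345411', [4, 6, 8, 3, 3]],
    ['1567-998711', [1, 2, 1, -4, 5]],
]

for station, sums in running_sums_by_station(power).items():
    print(station, sums)
# 1234 [[48, 1, -4, 11, 21], [51, 1, -9, 5, 21]]
# 1567 [[5, 8, 9, -1, 8]]
```

Returning a dict keyed by station sidesteps the need for '### End of ... ###' marker strings in the result.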

user1532369
  • please use pprint.pprint, or manually format your code, in the future. Also please add the `python` tag in the future. Thank you! – ninjagecko Aug 09 '12 at 08:30
  • In my humble opinion, it is unclear what you are trying to do based on "desired result". *edit*: Ah I see, you wish to split the list then do a cumulative sum. – ninjagecko Aug 09 '12 at 08:33
  • I don't understand your goal either, and I'm unable to deduce how `new` should be generated from `power`. Please describe in more detail *what* you want to accomplish. –  Aug 09 '12 at 08:35
  • @user1532369 Will there always be at least two lists with the same station? – jamylak Aug 09 '12 at 08:45
  • @jamylak yes there will always be more than two. – user1532369 Aug 09 '12 at 09:06
  • @ninjagecko The list doesn't necessarily need to be split; it just needs a space or something to distinguish where one station ends and another starts. – user1532369 Aug 09 '12 at 09:07
  • @user1532369 I see you've got three answers telling you to use `groupby`. However, they all take for granted that items belonging to the same station are clustered together in the `power` list. If this is not always the case, their solutions break. Fix it by first sorting `power` with the same key as groupby uses. – Lauritz V. Thaulow Aug 09 '12 at 09:19
  • @user1532369: sentinel/dummy values, as you suggest, are considered poor programming practice because they make modularity especially difficult. In fact, the main question you posed stems from insufficient modularity, thus I would caution against such practices. – ninjagecko Aug 09 '12 at 09:40
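To illustrate the point above about sorting before grouping, here is a small Python 3 sketch; the out-of-order `rows` data and the `station_of` helper are made up for illustration.

```python
from itertools import groupby

def station_of(row):
    """Key function: the part of the ID before the first '-'."""
    return row[0].split('-')[0]

# Rows deliberately out of order: the two '1234' rows are not adjacent.
rows = [
    ['1234-43211', [5, 6, -4, 11, 22]],
    ['1567-345411', [4, 6, 8, 3, 3]],
    ['1234-783411', [43, -5, 0, 0, -1]],
]

# Without sorting, groupby only merges *consecutive* rows, so '1234' appears twice.
unsorted_keys = [k for k, _ in groupby(rows, key=station_of)]
print(unsorted_keys)  # ['1234', '1567', '1234']

# Sorting with the same key first makes each station a single group.
# sorted() is stable, so rows within a station keep their original order.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=station_of), key=station_of)]
print(sorted_keys)  # ['1234', '1567']
```

Stability matters here: a cumulative sum depends on row order, and a stable sort leaves the order inside each station untouched.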

2 Answers

from itertools import groupby, islice

def accumulate(iterable): # in py 3 use itertools.accumulate
    ''' Simplified version of accumulate from python 3'''
    it = iter(iterable)
    total = next(it)
    yield total
    for element in it:
        total += element
        yield total

power = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]]
]

groups = ((k, (nums for station, nums in g))
          for k, g in
          groupby(power, lambda x: x[0].partition('-')[0]))

new = [(station, zip(*(islice(accumulate(col), 1, None) for col in zip(*nums))))
        for station, nums in groups]

print new    

print dict(new) # or as a dictionary which is unordered

Output

[('1234', [(48, 1, -4, 11, 21), (51, 1, -9, 5, 21)]), ('1567', [(5, 8, 9, -1, 8)])]
{'1234': [(48, 1, -4, 11, 21), (51, 1, -9, 5, 21)], '1567': [(5, 8, 9, -1, 8)]}

How this works:

First the lists are grouped based on the station using itertools.groupby.

E.g.

nums = [[5, 6, -4, 11, 22], 
        [43, -5, 0, 0, -1], 
        [3, 0, -5, -6, 0]]

is the first group. As you can see, it is in the form of a matrix.
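To see the grouping step on its own, a short Python 3 sketch that collects each group's numeric rows into a dict. The dict is only for display; the answer's code streams the groups instead of storing them.

```python
from itertools import groupby

power = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]],
]

# Group consecutive rows by station prefix and keep only the numeric lists.
groups = {station: [values for _, values in rows]
          for station, rows in groupby(power, key=lambda row: row[0].partition('-')[0])}

print(groups['1234'])
# [[5, 6, -4, 11, 22], [43, -5, 0, 0, -1], [3, 0, -5, -6, 0]]
```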

zip(*nums) transposes a matrix using argument unpacking. It calls

zip([5, 6, -4, 11, 22], [43, -5, 0, 0, -1], [3, 0, -5, -6, 0])

which creates the list:

cols = [(5, 43, 3), (6, -5, 0), (-4, 0, -5), (11, 0, -6), (22, -1, 0)]
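The transpose behaves the same in Python 3, with one difference: `zip()` there returns a lazy iterator rather than a list, so it has to be wrapped in `list()` to inspect the columns. A minimal sketch:

```python
nums = [[5, 6, -4, 11, 22],
        [43, -5, 0, 0, -1],
        [3, 0, -5, -6, 0]]

# In Python 3, zip() is lazy, so list() materialises the columns.
cols = list(zip(*nums))
print(cols)
# [(5, 43, 3), (6, -5, 0), (-4, 0, -5), (11, 0, -6), (22, -1, 0)]

# Transposing twice gives back the original rows (as tuples).
assert [list(row) for row in zip(*cols)] == nums
```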

Then accumulate is called on each column. Here's what that looks like:

>>> [list(accumulate(col)) for col in cols]
[[5, 48, 51], [6, 1, 1], [-4, -4, -9], [11, 11, 5], [22, 21, 21]]

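As you can see, the first element in each list is just the value from the first row (nothing has been added to it yet), so it is not required. islice is used to take the elements from index 1 until the end (a stop of None means "no limit"). Here's what that looks like: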

>>> [list(islice(accumulate(col), 1, None)) for col in cols]
[[48, 51], [1, 1], [-4, -9], [11, 5], [21, 21]]
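In Python 3, `itertools.accumulate` is built in and yields the same running sums as the simplified version above. A small sketch of both steps, also showing that `islice(..., 1, None)` gives the same values as slicing a list with `[1:]`, without building the intermediate list:

```python
from itertools import accumulate, islice

cols = [(5, 43, 3), (6, -5, 0), (-4, 0, -5), (11, 0, -6), (22, -1, 0)]

running = [list(accumulate(col)) for col in cols]
print(running)
# [[5, 48, 51], [6, 1, 1], [-4, -4, -9], [11, 11, 5], [22, 21, 21]]

# Skip the first running total (it is just the first row, nothing added yet).
without_first = [list(islice(accumulate(col), 1, None)) for col in cols]
print(without_first)
# [[48, 51], [1, 1], [-4, -9], [11, 5], [21, 21]]

# Plain list slicing produces the same values.
assert without_first == [r[1:] for r in running]
```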

Now we just need to transpose this back.

>>> zip(*(islice(accumulate(col), 1, None) for col in cols))
[(48, 1, -4, 11, 21), (51, 1, -9, 5, 21)]
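The code above targets Python 2 (`print new`, list-returning `zip`). A minimal Python 3 sketch of the same pipeline, written as an explicit loop and using `itertools.accumulate` as the comment in the code suggests; `station_of` is an illustrative helper name.

```python
from itertools import accumulate, groupby, islice

power = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]],
]

def station_of(row):
    """Station ID: the text before the first '-'."""
    return row[0].partition('-')[0]

new = []
for station, rows in groupby(power, key=station_of):
    nums = [values for _, values in rows]  # numeric rows of this station
    cols = zip(*nums)                      # transpose: one tuple per column
    # Running total down each column, skipping the first (un-summed) value.
    sums = (islice(accumulate(col), 1, None) for col in cols)
    # Transpose back; list() is needed because zip() is lazy in Python 3.
    new.append((station, list(zip(*sums))))

print(new)
# [('1234', [(48, 1, -4, 11, 21), (51, 1, -9, 5, 21)]), ('1567', [(5, 8, 9, -1, 8)])]
```

Each group is consumed inside the loop body before `groupby` advances, which is required because a `groupby` group becomes invalid once the next group is requested.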
jamylak
  • This output is perfect, except I need to have a marker, or even the station number, inserted so I know which cumulative additions belong to which station, e.g. '1234' at the start of your output list (or 2nd from the end) and '1567' as the 2nd-last element in the list (or the last), as an indicator. I put these in as '#### end of station number 1234 ###' in my desired result, but I realize now this may have been interpreted as a comment, my bad. – user1532369 Aug 09 '12 at 09:25
  • @user1532369 I gave two structures there: a list of (station, sums) pairs and a dict, both keyed by station. – jamylak Aug 09 '12 at 09:44
  • @user1532369 no problem :) I added an explanation. – jamylak Aug 09 '12 at 11:17
  • woahhhh that is comprehensive. I cannot thank you enough for taking the time, this is a quality answer and explanation! One last thing, if I wanted to leave out the last summation of each station, what would I put in instead of 'None'?? – user1532369 Aug 09 '12 at 13:54
  • @user1532369 in that case I would just convert it into a list and not use `islice` so `list(accumulate(col))[1:-1]`. I assume that's what you mean by instead of `None`. – jamylak Aug 09 '12 at 13:58
  • As in, I just discovered that the very last addition in each station, e.g. (51, 1, -9, 5, 21) for '1234', is utter nonsense and can be discarded. Instead, I wish to return the first set of flows in each station as the first sum, i.e. the final result for '1234' needs to look like: ([5, 6, -4, 11, 22], (48, 1, -4, 11, 21)). So I figured if I changed the index 1 to 0, that would produce the first set of ints from each station in the new list, but I am wondering how to cut off the last summation in each? – user1532369 Aug 09 '12 at 14:03
  • @user1532369 `list(accumulate(col))[:-1]` ? Not fully sure what you mean but that produces the correct result for '1234' – jamylak Aug 09 '12 at 14:12
  • @jamylak Any idea how I would export this final list to an Excel file? Would I have to alter it first? – user1532369 Aug 10 '12 at 08:20
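A sketch of the variant discussed in these comments (keep each station's first row as-is and drop the final running total, i.e. `list(accumulate(col))[:-1]`), written for Python 3 where `itertools.accumulate` is built in; `station_of` and `result` are illustrative names.

```python
from itertools import accumulate, groupby

power = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]],
]

def station_of(row):
    """Station ID: the text before the first '-'."""
    return row[0].partition('-')[0]

result = {}
for station, rows in groupby(power, key=station_of):
    cols = zip(*(values for _, values in rows))
    # Keep the first row unchanged and drop the last running total.
    trimmed = [list(accumulate(col))[:-1] for col in cols]
    result[station] = [list(row) for row in zip(*trimmed)]

print(result)
# {'1234': [[5, 6, -4, 11, 22], [48, 1, -4, 11, 21]], '1567': [[4, 6, 8, 3, 3]]}
```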

It would help if you broke down your problem into smaller pieces. I seem to understand that you want to 1) split your list based on some criterion, then 2) take the cumulative sum of each sublist (considering each element a vector).

For example:

stationList = [
 ['1234-4321-1', [5, 6, -4, 11, 22]], 
 ['1234-7834-1', [43, -5, 0, 0, -1]], 
 ['1234-5376-1', [3, 0, -5, -6, 0]], 
 ['1567-3454-1', [4, 6, 8, 3, 3]], 
 ['1567-9987-1-', [1, 2, 1, -4, 5]]
]

Becomes:

{'1234': [
    <5, 6, -4, 11, 22>,
    <5, 6, -4, 11, 22> + <43, -5, 0, 0, -1>,
    <5, 6, -4, 11, 22> + <43, -5, 0, 0, -1> + <3, 0, -5, -6, 0>
 ],
 '1567': [
    <4, 6, 8, 3, 3>,
    <4, 6, 8, 3, 3> + <1, 2, 1, -4, 5>
 ]
}

(where I use <...> to denote a hypothetical Vector object, or merely treating the list as a vector.)


Solution

from itertools import groupby, zip_longest  # Python 3 names; in Python 2 use izip_longest

1) To split a list based on some criterion, use itertools.groupby (see the itertools documentation); note that it only groups consecutive items. Or write a generator function.

getStation = lambda x: x[0].split('-')[0]
def groupby_station(inputList):
    return groupby(inputList, key=getStation)

2) A cumulative sum can be written as a generator function. You can use numpy, or just write it yourself.

def listAdd(*lists):
    """
        listAdd([1,2,3], [10,20,30]) -> [11,22,33]
        listAdd([1,2,3], []) -> [1,2,3]
    """
    return [sum(xs) for xs in zip_longest(*lists, fillvalue=0)]

def cumSum(lists):
    """
        cumSum([[1,2],[10,20],[100,200]]) yields [1,2], [11,22], [111,222]
    """
    total = []
    for lst in lists:
        total = listAdd(total, lst)
        yield total

Now just combine the two:

{key: list(cumSum(nums for _, nums in rows))
 for key, rows in groupby_station(stationList)}

Note that my definition of cumulative sum is slightly different from yours: it also yields the first row on its own. You can modify the cumSum function to match your definition.
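One way to adapt this to the question's definition, assuming the extra first total should simply be skipped. A Python 3 sketch; `cumSumAfterFirst` is a hypothetical name, and `listAdd`/`cumSum` are repeated so the block runs on its own.

```python
from itertools import groupby, islice, zip_longest

def listAdd(*lists):
    """Element-wise sum of lists; shorter lists are padded with 0."""
    return [sum(xs) for xs in zip_longest(*lists, fillvalue=0)]

def cumSum(lists):
    """Yield the running element-wise total after each list."""
    total = []
    for lst in lists:
        total = listAdd(total, lst)
        yield total

def cumSumAfterFirst(lists):
    """Hypothetical variant: skip the first running total (the first list alone)."""
    return islice(cumSum(lists), 1, None)

stationList = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]],
]

result = {key: list(cumSumAfterFirst(nums for _, nums in rows))
          for key, rows in groupby(stationList, key=lambda row: row[0].split('-')[0])}
print(result)
# {'1234': [[48, 1, -4, 11, 21], [51, 1, -9, 5, 21]], '1567': [[5, 8, 9, -1, 8]]}
```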

ninjagecko