1

I want to create new list according cumulative sums of numbers in a list. Input is ideal - can be splitting to subset, sum of each subset is equal. Length of subset is not equal. Number of subset is input.

Each subset of output represents increment integers [0,1,2,3,...], which replace original input. Quantity of integers is number of subsets.

Example:

number of subsets = 2   

input = [1, 4, 5]
#cumsum = [1, 5, 10]
subsets = [1,5], [10]
output-subsets = [0,0], [1]
output = [0, 0, 1]

Example1:

number of subsets = 4

input = [1, 2, 3, 4, 2, 5, 1, 6]
#cumsum = [1, 3, 6, 10, 12, 17, 18, 24]
subsets = [1,3,6], [10, 12],[17, 18], [24]
output-subsets = [0, 0, 0], [1, 1], [2, 2], [3]
output = [0, 0, 0, 1, 1, 2, 2, 3]

number of subsets = 2

input = [1, 2, 3, 4, 2, 5, 1, 6]
#cumsum = [1, 3, 6, 10, 12, 17, 18, 24]
subsets = [1, 3, 6, 10, 12],[17, 18, 24]
output-subsets = [0, 0, 0, 0, 0], [1, 1, 1]
output = [0, 0, 0, 0, 0, 1, 1, 1]

I try modified SO question:

def changelist(lis, t):
    total = 0

    s = sum(lis)
    subset = s/t

    for x in lis:
        total += x
        i= 1
        if(total <= subset):
            i = 0
        yield i


#changelist([input array], number of subset)    
print list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 4))     

but only first subset is correct:

output = [0, 0, 0, 1, 1, 1, 1, 1]

I think numpy.array_split is problematic strange behaviour of numpy array_split.

I would really love any kind of explanation or help.

Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I'm not sure what you're trying to do. Can you elaborate? – Cyphase Aug 10 '15 at 09:06
  • Your first block with sums is unclear. I have no idea why the lists shown on the right hand side of each line are associated with the sums. This statement also needs elaboration: "Each subset of output represents increment integers, which replace original input:." particularly, what are the "increment integers"? – Paul Aug 10 '15 at 09:09
  • increment integers - I think that can be incremented e.g.[0, 0, 0, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5...] = 6 subsets if number of subset is higher – jezrael Aug 10 '15 at 09:11
  • Your question is very unclear, where does `numpy.array_split` fit into iit? – Padraic Cunningham Aug 10 '15 at 09:12
  • NumPy's array_split splits into subsets with equal length. – jezrael Aug 10 '15 at 09:16
  • I agree that the question is not very clear. If it's just about reproducing the output: `(numpy.cumsum(input)-1)/6` – normanius Aug 10 '15 at 09:16
  • I try to clear question by examples. – jezrael Aug 10 '15 at 09:31

2 Answers2

2

This should solve your problem:

def changelist (l, t):
  subset = sum(l) / t
  current, total = 0, 0
  for x in l:
    total += x
    if total > subset:
      current, total = current + 1, x
    yield current

Examples:

>>> list(changelist([1, 4, 5], 2))
[0, 0, 1]
>>> list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 4))
[0, 0, 0, 1, 1, 2, 2, 3]
>>> list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 2))
[0, 0, 0, 0, 0, 1, 1, 1]

How does it work?

  • current stores the "id" of the current subset, total the sum of the current subset.
  • For each element x in your initial list l, you add its value to the current total, if this total is greater than the expected sum of each subset (subset in my code), then you know that you are in the next subset (current = current + 1) and you "reset" the total of the current subset to the actuel element (total = x).
Holt
  • 36,600
  • 7
  • 92
  • 139
1

You can use NumPy here after converting the input to an array for a vectorized solution, assuming N as the number of subsets, as listed here -

def modified_cumsum(input,N):
    A = np.asarray(input).cumsum()
    return np.append(False,np.in1d(A,(1+np.arange(N))*A[-1]/N))[:-1].cumsum()

Sample runs -

In [31]: N = 2  #number of subsets
    ...: input = [1, 4, 5]
    ...: 

In [32]: modified_cumsum(input,N)
Out[32]: array([0, 0, 1])

In [33]: N = 4  #number of subsets
    ...: input = [1, 2, 3, 4, 2, 5, 1, 6]
    ...: 

In [34]: modified_cumsum(input,N)
Out[34]: array([0, 0, 0, 1, 1, 2, 2, 3])

In [35]: N = 2  #number of subsets
    ...: input = [1, 2, 3, 4, 2, 5, 1, 6]
    ...: 

In [36]: modified_cumsum(input,N)
Out[36]: array([0, 0, 0, 0, 0, 1, 1, 1])
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • Is Numpy faster than pure python (Holt solution) for this kind of thing?, +1 – jezrael Aug 10 '15 at 10:20
  • @jezrael Conversion to a numpy array seems to be the bottleneck here. Other than that, it should be quite efficient. – Divakar Aug 10 '15 at 10:21