2

I need some logic or idea for Achieving the below use case:

Base Input List:

data=[
["10000025",710],
["1000138833",1065],
["100274005",820],
["1353180",3160],
["481584",3670],
["4851845845",1690],
["485584",1310],
["48848448",1000],
["49849948",1050],
["585598",4620],
["84154858584",620],
["841584584",2050],
["8451184584",2860],
["845188956",1800],
["845845184",1300],
["8458484",2300],
["8954884",1780],
["9416481584",2720],
["9448155584",1000],
["94949494",1000],
["959494158",1590],
["98558858",1550]
]

Expected Output: List of Combos from the list having the sum of 10000, and if no sum of a specific value is possible, then a list of leftover elements with a max of 6 elements at each combo. enter image description here

I tried following some threads:

Find all combinations of a list of numbers with a given sum

How to get all combination for a given sum with a given number of elements

My code:

data = [
    ["10000025", 710], ["1000138833", 1065], ["100274005", 820], ["1353180", 3160], ["481584", 3670],
    ["4851845845", 1690], ["485584", 1310], ["48848448", 1000],
    ["49849948", 1050], ["585598", 4620], ["84154858584", 620], ["841584584", 2050], ["8451184584", 2860],
    ["845188956", 1800], ["845845184", 1300], ["8458484", 2300], ["8954884", 1780], ["9416481584", 2720],
    ["9448155584", 1000],
    ["94949494", 1000],
    ["959494158", 1590], ["98558858", 1550]
]
valueList = [x[1] for x in data]


def GetNumbers(number):
    result = []
    for i in sorted(valueList, reverse=True):
        sum_list = sum(result)
        if sum_list + i == number:
            result.append(i)
            return result
        elif sum_list + i < number:
            result.append(i)
    return result


for i in range(0, len(data)):
    print(GetNumbers(10000))

Output in code could be in the list of the list or dict, whichever is achievable.

pb36
  • 380
  • 1
  • 5
  • 11

3 Answers3

1

Here is a code, that solve this problem however there is a high complexity, so it will quickly become too long to compute if you have more element in your dataset.

data=[
["10000025",710],
["1000138833",1065],
["100274005",820],
["1353180",3160],
["481584",3670],
["4851845845",1690],
["485584",1310],
["48848448",1000],
["49849948",1050],
["585598",4620],
["84154858584",620],
["841584584",2050],
["8451184584",2860],
["845188956",1800],
["845845184",1300],
["8458484",2300],
["8954884",1780],
["9416481584",2720],
["9448155584",1000],
["94949494",1000],
["959494158",1590],
["98558858",1550]
]


targetAmmount = 10000
groupMaxSize = 6


def createGroup(current, remaining_data):
    currentAmmount = sum(e[1] for e in current)
    missingAmmount = targetAmmount - currentAmmount

    if len(current) >= groupMaxSize:
        if currentAmmount == targetAmmount:
            yield current
        else:
            yield None
    

    possibilities = sorted(filter(lambda e: e[1] <= missingAmmount, remaining_data), key=lambda e: e[1], reverse=True)
    for possibility in possibilities:
        next_current = [*current, possibility]
        next_remaining = list(filter(lambda e: e!=possibility, possibilities))


        if sum(e[1] for e in next_current) == targetAmmount:
            yield next_current
        else:
            for res in createGroup(next_current, next_remaining):
                yield res



res = []
ungrouped = []

while len(data) != 0:
    first = [data.pop(0)]

    groupeCreated = False
    for group in createGroup(first, data):
        if group != None:
            groupeCreated = True
            res.append(group)
            for elem in group:
                if elem in data:
                    data.remove(elem)
            break

    if not groupeCreated:
        ungrouped.append(first[0])


print("Res : \n", res)
print("Ungrouped : \n", ungrouped)
Xiidref
  • 1,456
  • 8
  • 20
1

The below code works for me. In jupyter lab on a 10 year old laptop it executes in 0.4 seconds.

from itertools import combinations, chain, islice

data=[["10000025",710],["1000138833",1065],["100274005",820],["1353180",3160],["481584",3670],["4851845845",1690],["485584",1310],["48848448",1000],["49849948",1050],["585598",4620],["84154858584",620],["841584584",2050],["8451184584",2860],["845188956",1800],["845845184",1300],["8458484",2300],["8954884",1780],["9416481584",2720],["9448155584",1000],["94949494",1000],["959494158",1590],["98558858",1550]]
limit = 10000

def data_sum(xs):
    global limit # using global to avoid having to do multiple argument passing in filter
    return (sum(x[1] for x in xs)) == limit

def data_uptil(xs):
    global limit
    return (sum(x[1] for x in xs)) < limit

match_limit = chain.from_iterable([(filter(data_sum,combinations(data,k))) for k in range(2,len(data)+1)])
uptil_limit = (filter(data_uptil,combinations(data,6)))

print("Exact matchest that sum to limit")
for m in match_limit:
    print(list(m))

print("Samples with 6 elements that sum to less than limit")
for u in uptil_limit:
    print(list(u))

A few notes:

  • If you run with pypy it might be faster, although the itertools are pretty fast on their own.

  • You said all possible combinations, so I included combinations of size 2 up to the size of the input list

  • I believe your code will not produce correct results as it is summing up from a sorted list as it goes. This would not work if limit = 10 and the list is (2,4,7,8) because it would not find (2,8).

  • match_limit and uptil_limit generators are created almost instantaneous using len consumes the list and will take processing time, for example print(len(list(match_limit))) which gives 278 possible combinations in 4.2 seconds on my 10 year old laptop

  • you can use islice if you want just a few answers, for example for m in islice(match_limit,5):

  • your choice if you want a dict or a list. this also works: print(dict(m))

  • as noted, it is the calling of list that will consume the generator this means results start be produced immediately as they are found.

dsagman
  • 481
  • 1
  • 8
  • If I am not wrong, this is giving an output of only a single possible combination in the list of list, and if you see the image in question, it needs to have a unique combination whose sum is 10000 and leftover whose sum can't be 10000 could be divided in chunks whose sum would be less then 10000. – pb36 Aug 29 '22 at 14:17
  • The code checks every possible combination of k elements of the list where 1 < k < len(k)+1. That's the `for k in` in `match_limit`. It is an exhaustive search of all possible combinations. If there are no results, you could put in an `if` statement to use the `uptil_limit` and then just choose the maximum from that set. – dsagman Aug 29 '22 at 17:55
0

Here's another version. It forces full evaluation of the lists to allow for the conditions from OP, I believe.

from itertools import combinations, chain

# data = [["10000025",710],["1000138833",1065],["100274005",820],["1353180",3160],["481584",3670],["4851845845",1690],["485584",1310],["48848448",1000],["49849948",1050],["585598",4620],["84154858584",620],["841584584",2050],["8451184584",2860],["845188956",1800],["845845184",1300],["8458484",2300],["8954884",1780],["9416481584",2720],["9448155584",1000],["94949494",1000],["959494158",1590],["98558858",1550]]
data = [["10000025",71],["1000138833",165],["100274005",820],["1353180",160]]

limit = 10000

def data_sum(xs):
    global limit # using global to avoid having to do multiple argument passing in filter
    return (sum(x[1] for x in xs)) == limit

def my_sum(xs):
    return (sum(x[1] for x in xs))

match_limit = chain.from_iterable([filter(data_sum,combinations(data2,k)) for k in range(2,len(data)+1)])
match_limit_list = list(match_limit)
if len(match_limit_list) > 0:
    [print(m) for m in match_limit_list]
else:
    uptil_limit_list = list(chain.from_iterable([list(combinations(data2,k)) for k in range(2,7)]))
    uptil_limit_max = sorted(uptil_limit_list, key=my_sum, reverse=True)
    [print(u) for u in uptil_limit_max]

For the test data that does not sum to limit, the output is:

(['10000025', 71], ['1000138833', 165], ['100274005', 820], ['1353180', 160])
(['1000138833', 165], ['100274005', 820], ['1353180', 160])
(['10000025', 71], ['1000138833', 165], ['100274005', 820])
(['10000025', 71], ['100274005', 820], ['1353180', 160])
(['1000138833', 165], ['100274005', 820])
(['100274005', 820], ['1353180', 160])
(['10000025', 71], ['100274005', 820])
(['10000025', 71], ['1000138833', 165], ['1353180', 160])
(['1000138833', 165], ['1353180', 160])
(['10000025', 71], ['1000138833', 165])
(['10000025', 71], ['1353180', 160])
dsagman
  • 481
  • 1
  • 8
  • You still didn't got the point of expected output cause your code is just making combinations try to the run the other answer posted by @Xiidref it is slow but gives correct output where as your does not. – pb36 Aug 30 '22 at 03:21
  • I see, OP wants to extract from the data set a subset of up to 6 elements that sum to 10,000. And then run again against that reduced set, and then print remaining. Is that correct? – dsagman Aug 30 '22 at 15:19