I have a list of JSON objects (Python dicts):
data = [
    {'message_id': '1', 'token': 'a'},
    {'message_id': '2', 'token': 'b'},
    {'message_id': '3', 'token': 'c'},
    {'message_id': '4', 'token': 'd'},
    {'message_id': '4', 'token': 'e'},
    {'message_id': '1', 'token': 'f'},
    {'message_id': '1', 'token': 'g'},
    {'message_id': '1', 'token': 'h'},
    {'message_id': '3', 'token': 'm'},
    {'message_id': '3', 'token': 'k'},
]
I want to batch the tokens into chunks to pass to an API call. The catch is to fit tokens with the same message_id into one batch if possible; the idea is to avoid splitting one message_id's tokens across 2 batches.
For example, I want to divide the 10 tokens into 2 batches, which means 5 tokens in each array. In the example above, message_id 1 has 4 tokens, 2 has 1 token, 3 has 3 tokens, and 4 has 2 tokens, which adds up to 10. The ideal way to group them is 4+1 and 3+2. The final result I am looking for is:
[['a', 'f', 'g', 'h', 'b'], ['c', 'd', 'e', 'm', 'k']]
because 'a', 'f', 'g', 'h' share the same message_id, so they go in one batch instead of message_id 1's tokens being split across 2 arrays.
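The per-id counts above (4, 1, 3, 2) can be computed directly with a Counter, for what it's worth:

```python
from collections import Counter

data = [
    {'message_id': '1', 'token': 'a'},
    {'message_id': '2', 'token': 'b'},
    {'message_id': '3', 'token': 'c'},
    {'message_id': '4', 'token': 'd'},
    {'message_id': '4', 'token': 'e'},
    {'message_id': '1', 'token': 'f'},
    {'message_id': '1', 'token': 'g'},
    {'message_id': '1', 'token': 'h'},
    {'message_id': '3', 'token': 'm'},
    {'message_id': '3', 'token': 'k'},
]

# Count how many tokens each message_id has.
counts = Counter(d['message_id'] for d in data)
# counts['1'] == 4, counts['2'] == 1, counts['3'] == 3, counts['4'] == 2
```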
I think this is more of a mathematical problem than a coding one, because I am able to batch them easily with the following code if I don't have to consider grouping by message_id:
def batch(lst, n):
    # Yield successive chunks of size n.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
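To show why positional chunking alone isn't enough: fed the tokens in input order, it splits message_id 1's tokens ('a', 'f', 'g', 'h') across both chunks (sketch below uses a renamed parameter so it doesn't shadow the built-in `list`):

```python
def batch(lst, n):
    # Yield successive chunks of size n, ignoring message_id entirely.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

tokens = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'm', 'k']
print(list(batch(tokens, 5)))
# → [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'm', 'k']]
# 'a' lands in the first chunk while 'f', 'g', 'h' land in the second.
```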
I will elaborate further: the goal is to split m tokens into n batches (n is an input variable), and to group tokens with the same message_id into the same batch where possible. I understand there is always the possibility of overflow: if one message_id has more than m/n tokens, it exceeds the batch size and has to span 2 batches.
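To make the target behavior concrete, here is a rough first-fit-decreasing sketch of what I have in mind (`batch_by_message` and `n_batches` are names I made up, and I am not claiming this packing is optimal):

```python
def batch_by_message(records, n_batches):
    # Group tokens by message_id (dicts keep insertion order in Python 3.7+).
    groups = {}
    for r in records:
        groups.setdefault(r['message_id'], []).append(r['token'])

    total = sum(len(g) for g in groups.values())
    capacity = -(-total // n_batches)  # ceil(total / n_batches) tokens per batch

    batches = [[] for _ in range(n_batches)]
    # First-fit-decreasing: place each group, largest first, into the
    # first batch that can hold it whole.
    for tokens in sorted(groups.values(), key=len, reverse=True):
        for b in batches:
            if len(b) + len(tokens) <= capacity:
                b.extend(tokens)
                break
        else:
            # Overflow: no batch can take the whole group, so split it
            # across whatever room is left.
            for b in batches:
                room = capacity - len(b)
                b.extend(tokens[:room])
                tokens = tokens[room:]
                if not tokens:
                    break
    return batches

data = [
    {'message_id': '1', 'token': 'a'},
    {'message_id': '2', 'token': 'b'},
    {'message_id': '3', 'token': 'c'},
    {'message_id': '4', 'token': 'd'},
    {'message_id': '4', 'token': 'e'},
    {'message_id': '1', 'token': 'f'},
    {'message_id': '1', 'token': 'g'},
    {'message_id': '1', 'token': 'h'},
    {'message_id': '3', 'token': 'm'},
    {'message_id': '3', 'token': 'k'},
]
print(batch_by_message(data, 2))
# → [['a', 'f', 'g', 'h', 'b'], ['c', 'm', 'k', 'd', 'e']]
```

This matches the grouping I want, just with a different order inside the second batch, which is fine for my use case since only the batch membership matters.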