13

How to merge a tuple with the same key

list_1 = [("AAA", [123]), ("AAA", [456]), ("AAW", [147]), ("AAW", [124])]

and turn them into

list_2 = [("AAA", [123, 456]), ("AAW", [147, 124])]
codeforester
  • 39,467
  • 16
  • 112
  • 140
Theprofessor14
  • 149
  • 1
  • 5

4 Answers4

17

The most performant approach is to use a collections.defaultdict dictionary to store data as an expanding list, then convert back to tuple/list if needed:

import collections

list_1 = [("AAA", [123]), ("AAA", [456]), ("AAW", [147]), ("AAW", [124])]

c = collections.defaultdict(list)
for a,b in list_1:
    c[a].extend(b)  # add to existing list or create a new one

list_2 = list(c.items())

result:

[('AAW', [147, 124]), ('AAA', [123, 456])]

note that the converted data is probably better left as dictionary. Converting to list again loses the "key" feature of the dictionary.

On the other hand, if you want to retain the order of the "keys" of the original list of tuples, unless you're using python 3.6/3.7, you'd have to create a list with the original "keys" (ordered, unique), then rebuild the list from the dictionary. Or use an OrderedDict but then you cannot use defaultdict (or use a recipe)

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
  • 1
    Note that this answer would not necessarily retain the order of the original list, if that is a consideration. – blhsing Sep 22 '18 at 08:04
  • true, unless you're using python 3.7. To retain the original order you'd have to create a list with the original "keys" (ordered, unique), then rebuild the list from the dictionary. – Jean-François Fabre Sep 22 '18 at 08:04
5

You can use a dict to keep track of the indices of each key to keep the time complexity O(n):

list_1 = [("AAA", [123]), ("AAA", [456]), ("AAW", [147]), ("AAW", [124])]
list_2 = []
i = {}
for k, s in list_1:
    if k not in i:
        list_2.append((k, s))
        i[k] = len(i)
    else:
        list_2[i[k]][1].extend(s)

list_2 would become:

[('AAA', [123, 456]), ('AAW', [147, 124])]
codeforester
  • 39,467
  • 16
  • 112
  • 140
blhsing
  • 91,368
  • 6
  • 71
  • 106
1

You can create a dictionary and loop through the list. If the item present in dictionary append the value to already existing list else assign the value to key.

dict_1 = {}
for item in list_1:
    if item[0] in dict_1:
        dict_1[item[0]].append(item[1][0])
    else:
        dict_1[item[0]] = item[1]
list_2 = list(dict_1.items())
codeforester
  • 39,467
  • 16
  • 112
  • 140
S.Harish
  • 159
  • 1
  • 6
  • It does the merge but doesn't retain the original order in the list. The result I got is: `[('AAW', [147, 124]), ('AAA', [123, 456])]` - AAA should have been the first element as in the original list. – codeforester Sep 22 '18 at 15:35
  • If the order is important then we can use OrderedDict from the collections from collections import OrderedDict dict_1 = OrderedDict() – S.Harish Sep 23 '18 at 17:53
1

Similarly to other answers, you can use a dictionary to associate each key with a list of values. This is implemented in the function merge_by_keys in the code snippet below.

import pprint

list_1 = [("AAA", [123]), ("AAA", [456]), ("AAW", [147]), ("AAW", [124])]

def merge_by_key(ts):

    d = {}
    for t in ts:
        key = t[0]
        values = t[1]
        if key not in d:
            d[key] = values[:]
        else:
            d[key].extend(values)

    return d.items()



result = merge_by_key(list_1)

pprint.pprint(result)
Odysseas
  • 995
  • 9
  • 21
  • @codeforester Just a slip up. I had list_2 in my code in order to print it after the computed result, to check that my solution is correct. Removed it right now actually. – Odysseas Sep 22 '18 at 20:34