Merge lists with same first element in list of lists

Question

I have a list of lists:

a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81], [2, 117], [2, 118], [2, 119], [2, 120]]

How can I combine all the lists in the list that have the same first element?

Desired output:

a = [[0, 1, 2, 26, 74], [1, 77, 80, 81], [2, 117, 118, 119, 120]]

In your example the lists are sorted by first element. Is that always true? — Mark, Feb 01 '22 at 22:19
What if the input list contains the same sublist multiple times, e.g. `[[1, 81], [1, 81], [1, 81], [1, 93], ...]`. Would the output list be `[1, 81, 93]` or `[1, 81, 81, 81, 93]`? — 9769953, Feb 01 '22 at 22:22
If the lists are sorted by key, you can use `itertools.groupby` – mozway, Feb 01 '22 at 22:32
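A minimal sketch of the `itertools.groupby` approach suggested in the comment. It assumes the input is already sorted by first element, because `groupby` only merges consecutive items with equal keys. The helper name `merge_sorted` is hypothetical, and it also handles sublists longer than two items.

```python
from itertools import groupby
from operator import itemgetter

a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81],
     [2, 117], [2, 118], [2, 119], [2, 120]]


def merge_sorted(lists):
    """Merge sublists sharing a first element (assumes input sorted by it)."""
    merged = []
    # groupby yields (key, iterator over the consecutive sublists with that key)
    for key, group in groupby(lists, key=itemgetter(0)):
        # Drop the key from each sublist and keep the remaining values in order.
        merged.append([key] + [value for sub in group for value in sub[1:]])
    return merged


result = merge_sorted(a)
print(result)  # [[0, 1, 2, 26, 74], [1, 77, 80, 81], [2, 117, 118, 119, 120]]
```

If the input might not be sorted, call `merge_sorted(sorted(a, key=itemgetter(0)))`; `sorted` is stable, so values keep their original order within each group.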

score 2 · Answer 1 · answered Feb 01 '22 at 22:30

from collections import defaultdict
tmp = defaultdict(list)
for key, val in a:
    tmp[key].append(val)
print([[key] + val for key, val in tmp.items()])
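The code above keeps duplicates, so `[[1, 81], [1, 81], [1, 81], [1, 93]]` becomes `[1, 81, 81, 81, 93]`. If you want `[1, 81, 93]` instead (the case raised in the comments), a minimal variant, assuming Python 3.7+ where dicts preserve insertion order, uses a dict per key as an ordered set:

```python
from collections import defaultdict

a = [[1, 81], [1, 81], [1, 81], [1, 93], [0, 5]]

tmp = defaultdict(dict)  # each inner dict's keys act as an insertion-ordered set
for key, val in a:
    tmp[key][val] = None  # a repeated value just overwrites the same entry
result = [[key] + list(vals) for key, vals in tmp.items()]
print(result)  # [[1, 81, 93], [0, 5]]
```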

answered Feb 01 '22 at 22:30

Andrey

score 2 · Accepted Answer · answered Feb 01 '22 at 22:32

Try this:

d = {}
for key, value in a:
    if key not in d.keys():
        d[key] = [key]
    d[key].append(value)
result = list(d.values())
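The same loop can also be written without the explicit membership test by using `dict.setdefault`. A sketch (note that the default `[key]` list is constructed on every iteration, even when the key already exists and it is discarded):

```python
a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81],
     [2, 117], [2, 118], [2, 119], [2, 120]]

d = {}
for key, value in a:
    # setdefault stores [key] the first time a key is seen and returns the
    # existing list on later lookups, so we can append to it directly.
    d.setdefault(key, [key]).append(value)
result = list(d.values())
print(result)  # [[0, 1, 2, 26, 74], [1, 77, 80, 81], [2, 117, 118, 119, 120]]
```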

answered Feb 01 '22 at 22:32

Marco Valle

Note that you can just write [`if key not in d`](https://stackoverflow.com/a/1602964/11659881). – Kraigolas Feb 01 '22 at 22:41

That works perfectly! – idis Feb 01 '22 at 23:02

score 2 · Answer 3 · answered Feb 01 '22 at 22:47

I think the other answers here are specific to two-item lists. Here's one that works with any number of items in your sublists (as long as each sublist has at least one):

a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81], [2, 117], [2, 118], [2, 119], [2, 120]]
output_dict = {}
for key, *values in a:
    if key not in output_dict:
        output_dict[key] = [key]
    output_dict[key].extend(values)

The merged lists are now in output_dict.values(); use list(output_dict.values()) if you need an actual list of lists.
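To illustrate that this works with sublists of any length, here is a usage sketch on mixed-length, unsorted input (the sample data is made up for illustration):

```python
a = [[0, 1, 2], [1, 5], [0, 3], [2]]

output_dict = {}
for key, *values in a:  # values is [] for a one-element sublist such as [2]
    if key not in output_dict:
        output_dict[key] = [key]
    output_dict[key].extend(values)

result = list(output_dict.values())
print(result)  # [[0, 1, 2, 3], [1, 5], [2]]
```

Because a dict does the grouping, the input does not need to be sorted; groups come out in the order their keys first appear.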

score 0 · Answer 4 · edited Feb 02 '22 at 17:14

I would do it this way.
Here I assume that the input is a list of sublists of length 2.

def merge_list(input):
    res = [] # Final list
    a = []   # Just make a list of the first element of each list
    for i in input:
        if i[0] not in a:
            a.append(i[0])
    for i in a:
        b = [i]
        for j in input:
            if j[0] == i:
                # For sublists of any length, e.g. [[1, 2, 3], [1, 4, 6], ...],
                # use b.extend(j[1:]) instead of b.append(j[1])
                b.append(j[1])
        res.append(b)
    return res

edited Feb 02 '22 at 17:14

lane

answered Feb 01 '22 at 22:25

Quentin AM

Your first loop can be replaced by `a = set(i[0] for i in input)`. Please use more descriptive variable names. – Mad Physicist Feb 01 '22 at 22:38
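One caveat with the `set()` suggestion: a set's iteration order is not guaranteed to match the input, so the merged lists could come out in a different order than the keys appear. If order matters, a sketch using `dict.fromkeys` (the helper name `unique_first_elements` is hypothetical) keeps first-seen order while still avoiding a scan of a list on every membership check:

```python
def unique_first_elements(lists):
    """Return the distinct first elements in order of first appearance."""
    # dict.fromkeys drops repeated keys and keeps insertion order (Python 3.7+).
    return list(dict.fromkeys(sub[0] for sub in lists))


print(unique_first_elements([[2, 117], [0, 1], [2, 118], [1, 77]]))  # [2, 0, 1]
```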

Answer 5 · answered Feb 01 '22 at 23:21 · mathfux

Since this question has a numpy tag, I'll expand on possible ways to solve it in NumPy. In general, this is called a group-by problem. There are many ways to do this in NumPy, and they fall into two categories:

Methods based on np.unique
Methods based on np.bincount

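The second type of solution doesn't work well in general when the group IDs are large (np.bincount allocates one counter for every integer from 0 up to the largest ID, and it only accepts non-negative integers), but it is significantly faster than np.unique when the IDs are small.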
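To see why large IDs are a problem for the bincount approach: the array np.bincount returns has one slot per integer from 0 to the maximum ID, so just two groups with a large ID produce a huge, mostly empty array. A small illustration with made-up IDs:

```python
import numpy as np

small_ids = np.array([0, 0, 1, 2, 2, 2])
print(np.bincount(small_ids))  # [2 1 3]: one counter per ID from 0 to max

large_ids = np.array([0, 1_000_000])
counts = np.bincount(large_ids)
print(counts.shape)  # (1000001,): a million zero counters for two groups
```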

You need to sort your data by the first column before applying either kind of method:

import numpy as np

a = np.array(a)
arr = a[a[:, 0].argsort()]
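The default argsort algorithm is not guaranteed to be stable, so rows with the same key may be reordered relative to each other. If the order of values inside each group matters, a sketch using `kind="stable"` on unsorted sample data (made up for illustration):

```python
import numpy as np

# Unsorted input; within each key the values should keep their input order.
unsorted = np.array([[1, 80], [0, 26], [1, 77], [0, 1], [2, 117]])

order = unsorted[:, 0].argsort(kind="stable")  # ties keep their input order
arr = unsorted[order]
print(arr.tolist())  # [[0, 26], [0, 1], [1, 80], [1, 77], [2, 117]]
```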

Then you can choose your grouping method; both methods share a small helper that decides whether to also return the unique IDs:

def _custom_return(unique_id, a, split_idx, return_groups):
    '''Choose if you want to also return unique ids'''
    if return_groups:
        return unique_id, np.split(a[:,1], split_idx)
    else: 
        return np.split(a[:,1], split_idx)
    
def numpy_groupby_index(a, return_groups=True):
    '''Code refactor of method of Vincent J'''
    u, idx = np.unique(a[:,0], return_index=True) 
    return _custom_return(u, a, idx[1:], return_groups)

def numpy_groupby_bins(a, return_groups=True):  
    '''Significant boost of np.unique by np.bincount'''
    bins = np.bincount(a[:,0])
    nonzero_bins_idx = bins != 0
    nonzero_bins = bins[nonzero_bins_idx]
    idx = np.cumsum(nonzero_bins[:-1])
    return _custom_return(np.flatnonzero(nonzero_bins_idx), a, idx, return_groups)

>>> numpy_groupby_bins(arr, return_groups=True)
(array([0, 1, 2]),
 [array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])])
>>> numpy_groupby_bins(arr, return_groups=False)
[array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])]
>>> numpy_groupby_index(arr, return_groups=True)
(array([0, 1, 2]),
 [array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])])
>>> numpy_groupby_index(arr, return_groups=False)
[array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])]

Note that all of these methods call np.split, which builds its result with a Python-level loop (list.append) under the hood, so it is inefficient when you have a large number of small groups. This happens because NumPy is not designed to work with ragged data (arrays of different lengths).
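To make the cost concrete: np.split produces one separate array object per group, which is equivalent to the explicit slicing loop below. A sketch with the split indices hardcoded for illustration, showing that the two give the same groups:

```python
import numpy as np

values = np.array([1, 2, 26, 74, 77, 80, 81, 117, 118, 119, 120])
split_idx = np.array([4, 7])  # group boundaries, hardcoded for this example

# np.split returns a Python list of sub-arrays, one per group.
groups = np.split(values, split_idx)

# Same result with an explicit loop over (start, stop) pairs: the work grows
# with the number of groups, not just with the total amount of data.
bounds = np.r_[0, split_idx, len(values)]
manual = [values[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
```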

Also note that producing your expected output (with the key prepended to each group) requires one more pass:

>>> groups = numpy_groupby_index(arr, return_groups=True)
>>> out = [np.r_[key, group] for key, group in zip(*groups)]
>>> out
[array([ 0,  1,  2, 26, 74]),
 array([ 1, 77, 80, 81]),
 array([  2, 117, 118, 119, 120])]
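If you need exactly the list-of-lists format from the question rather than NumPy arrays, one option is to call `.tolist()` on each row. A self-contained sketch that inlines the np.unique approach (rather than calling `numpy_groupby_index`):

```python
import numpy as np

a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81],
     [2, 117], [2, 118], [2, 119], [2, 120]]
arr = np.array(a)
arr = arr[arr[:, 0].argsort(kind="stable")]  # sort rows by key

# Unique keys plus the index where each key first appears in the sorted array.
keys, first_idx = np.unique(arr[:, 0], return_index=True)
groups = np.split(arr[:, 1], first_idx[1:])

# np.r_ prepends the key; .tolist() converts to plain Python lists of ints.
out = [np.r_[key, group].tolist() for key, group in zip(keys, groups)]
print(out)  # [[0, 1, 2, 26, 74], [1, 77, 80, 81], [2, 117, 118, 119, 120]]
```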

If you're interested in performant solutions to this problem, you could also read my further analysis of this kind of problem.
