How to extract the indices of equal elements from a list?

Question

If my input a = [3.0, 4.0, 2.5, 2.5, 3.0], There are two 3.0, one 4.0, and two 2.5

As you can see, a[0] = a[4], a[1] is just by itself, and a[2] = a[3]

I want my output as [[0,4], [1], [2,3]]

How would I code this? Is there a built-in function to do this? If not, any hint or suggestion? Thank you.

There is no built-in function for the same. Can you please share the code you tried till now? — Swaroop Deval, Oct 11 '19 at 05:43
To be honest i could not even try to think about anything for this one. I just need an idea to start with. I am just stuck — David L, Oct 11 '19 at 05:53

yukashima huksay · Answer 1 · 2019-10-13T06:27:00.953

3

First, a simple solution leveraging collections.defaultdict. Be careful, this only works for flat (1-dimensional) lists!

from collections import defaultdict 

def group_by_value(lst_in):
    res_dict = defaultdict(list)
    for idx, item in enumerate(lst_in):
        res_dict[item].append(idx)
    return res_dict

defaultdict is just like a dict except that when you initialize it like d = defaultdict(list) then d[x] would resolve to an empty list [] if x is not among the keys defined in d.

Here is a solution using numpy which works regardless of the number of dimensions. Credit for the idea goes to @meTchaikovsky.

import numpy as np

def group_by_value(arr):
    return {elem: list(zip(*np.where(arr == elem))) for elem in np.unique(arr)}

Thanks to @AlexanderCécile for refactoring this answer ;)

edited Oct 13 '19 at 06:27

answered Oct 11 '19 at 05:52

yukashima huksay

5,834
7
45
78

This is the best solution, I hope he changes the accepted answer to be this one :) – AMC Oct 13 '19 at 01:40
@David L this is a much better solution, it's better to accept this answer. – meTchaikovsky Oct 13 '19 at 02:48
Alright, I made an edit that changes some of the writing, as well as tweaks the form of the code itself. I also added a numpy solution based on the one by @meTchaikovsky. Obviously, this was done in the hopes that David L will make this the accepted answer, so that it can be as informative as possible. – AMC Oct 13 '19 at 04:26

score 2 · Answer 2 · answered Oct 11 '19 at 05:53

2

sol = {}        
for i, j in enumerate(a):
    if j in sol:
        sol[j].append(i)
    else:
        sol[j] = [i]
[sol[i] for i in sol]
[[0, 4], [1], [2, 3]]

answered Oct 11 '19 at 05:53

Vox

506
2
13

meTchaikovsky · Accepted Answer · 2019-10-11T05:52:43.370

0

You can do this with the help of numpy

import numpy as np

a = np.array(a)
result = [np.arange(len(a))[a == item].tolist() for item in np.unique(a)]

in which result is the list you want.

To do it without using numpy

result = []
unique_vals = set(a)

for val in unique_vals:
    tmp = []
    for ind, item in enumerate(a):
        if item == val:
            tmp.append(ind)
    result.append(tmp)

edited Oct 11 '19 at 05:52

answered Oct 11 '19 at 05:48

meTchaikovsky

7,478
2
15
34

Is there another way of doing this? i do not know how numpy works yet – David L Oct 11 '19 at 05:50
@DavidL yes, I have updated my answer – meTchaikovsky Oct 11 '19 at 05:53
Thank you sir i'll study your answer – David L Oct 11 '19 at 05:55
1

The pure python solution here seems quite wasteful/inefficient. It iterates over a list of size `n`, `n + 1` times! @DavidL please consider changing the accepted answer to [the one provided by @yukahashima] (https://stackoverflow.com/a/58334995/11301900), if only for the benefit of anyone else who stumbles upon this question. – AMC Oct 13 '19 at 01:39
1

In fact, it seems that the numpy solution is also subpar. I ran some quick benchmarks and its results were anything but, being on average _twice as slow_ as the [aforementioned solution](https://stackoverflow.com/a/58334995/11301900). Even worse, it is quicker to convert a numpy array to a list and then use the defaultdict method, than it is to get a result from this numpy-based solution. For the largest list size I tested the defaultdict solution takes 2.5 seconds, versus an _astonishing_ 74 seconds for your python solution. Yes, it's really **30 times slower**. – AMC Oct 13 '19 at 02:25
@AlexanderCécile you are right. – meTchaikovsky Oct 13 '19 at 02:47
1

In conclusion, I believe it is fair to say that the 2 solutions presented here are _never_ superior to the `defaultdict` and `enumerate`-based alternative, and this regardless of whether our input data structure is a list or a numpy array. There are some other issues, of course, although they pale in comparison to the disastrous performance problems. For one, using `len()` to count the number of elements in a numpy array is an awful habit, since this only works correctly on flat (1-dimensional) arrayy. – AMC Oct 13 '19 at 02:51
@AlexanderCécile thank you for your advice, yes, maybe using `shape` is a better option – meTchaikovsky Oct 13 '19 at 02:56
@meTchaikovsky good on you for not being afraid of criticism or of being wrong! You could always delete your answer, but it might be more interesting for the sake of posterity to simply edit it instead, and have the `defaultdict` one be accepted. I will suggest an edit now, including a numpy-based solution that is quite similar to yours but who's performance is more in line with the alternative solutions. – AMC Oct 13 '19 at 02:59
@meTchaikovsky in this case you want the total number of elements, which can be found using the `size` property. – AMC Oct 13 '19 at 03:02
@AlexanderCécile It's greate that meTchaikovsky is not afraid of criticism. However, I strongly believe that you could have used a much better language in criticizing them. Thank you for your edit, I've accepted it, I believe it would be good if you could also do a new edit and include your benchmarks for this numpy solution along with your own numpy solution and my defaultdict solution, the python only solution would also be nice if you felt for it ;) – yukashima huksay Oct 13 '19 at 06:32
@yukashimahuksay I tried to find a good balance in my comments, it's tough. I certainly don't want to be excessively harsh, but I also don't want to diminish the issues at hand. meTchaikovsky I will state that my comments were not directed at your person, and from looking at your profile it is clear that you have already made many good contributions to this site :) As for the benchmarks, should I include the source code, test code and the measured times, or would that be too much? – AMC Oct 13 '19 at 17:52
@AlexanderCécile If you have ipython I suggest doing it like [this](https://stackoverflow.com/a/58073505/5120089) If you don't have ipython but you are on linux you can do it like [this](https://stackoverflow.com/a/57287702/5120089) if you wanna be cool you can format the output like [this](https://stackoverflow.com/a/58091295/5120089) ofc you can also plot it but that would be too much :)) I tried to compensate for you as you have probably guessed ;) – yukashima huksay Oct 13 '19 at 18:09
@yukashimahuksay my benchmarks were actually done on Google Colaboratory, so I could just link to the notebook directly! That way, people can change the code and rerun the benchmarks quite easily. – AMC Oct 13 '19 at 19:33
@AlexanderCécile that would be a nice idea, but also at least include a summary of your results in case the link becomes invalid in the future. – yukashima huksay Oct 13 '19 at 20:06
@yukashimahuksay Alright, will do! – AMC Oct 14 '19 at 20:13

How to extract the indices of equal elements from a list?

3 Answers3