Python: looking for duplicates in list

Question

I have a list of floats, and I want to know how many duplicates are in it.

I have tried with this:

p = t_gw.p(sma, m1, m2)       #p is a 1d numpy array
p_list = list(p)
dup = set([x for x in p_list if p_list.count(x) > 1])
print(dup)

I have also tried `collections.Counter`, but I always get the same error:

TypeError: unhashable type: 'numpy.ndarray'

I've looked at similar questions, but I can't understand what hashable means, why a list (or numpy array) is not hashable, and what type I should use instead.

Possible duplicate of http://stackoverflow.com/questions/9835762/find-and-list-duplicates-in-python-list — Ammar, Nov 15 '14 at 10:42
You wrap one or more numpy arrays in a list, build a list comprehension from it, and then wrap the remaining numpy arrays in a set. Items stored in a set must be hashable, and numpy arrays aren't. — NoDataDumpNoContribution, Nov 15 '14 at 10:43
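A minimal sketch of what "hashable" means in practice, using the built-in `hash()` on a few invented sample values: immutable scalars and tuples can be hashed, so they can be stored in a set or used as dict keys; mutable containers such as lists and `numpy.ndarray` cannot, because their contents could change after insertion. The helper `is_hashable` is a hypothetical name introduced only for this illustration.

```python
import numpy as np


def is_hashable(obj):
    """Hypothetical helper: True if obj can go in a set or be a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


samples = {
    "float": 0.5,
    "tuple of floats": (0.5, 1.25),
    "list of floats": [0.5, 1.25],
    "1-element ndarray": np.array([0.5]),
}

for name, obj in samples.items():
    print(f"{name}: hashable={is_hashable(obj)}")

# A mutable row can be made hashable by converting it to an immutable tuple.
row = np.array([0.5])
print({tuple(row.tolist())})  # {(0.5,)}
```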

Daniel · Accepted Answer · score 2 · 2014-11-15T10:51:04.497

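Your numpy array is two-dimensional, so list(p) does not do what you expect: it gives you the rows of p, each of which is itself a (1-element) numpy array, and arrays cannot be put in a set. Use list(p.flat) instead, which gives you the individual floats.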
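A small sketch reproducing the problem and the fix with `collections.Counter`, which the question also mentions. It assumes `p` has shape `(n, 1)`, as the asker reports in the follow-up comments; the values are invented stand-ins for the output of `t_gw.p`. Iterating a 2-D array yields its rows, so `Counter` receives unhashable arrays, while `p.flat` yields the individual float scalars.

```python
import numpy as np
from collections import Counter

# Invented stand-in for the asker's data: an (n, 1) column of floats.
p = np.array([[0.5], [1.25], [0.5], [3.0], [1.25], [0.5]])

# Iterating a 2-D array yields its rows, so every item is a 1-element ndarray.
rows = list(p)
print(type(rows[0]))  # <class 'numpy.ndarray'>

try:
    Counter(rows)  # Counter needs hashable items; ndarrays are not hashable
except TypeError as exc:
    error_message = str(exc)  # mentions the unhashable type 'numpy.ndarray'
    print("TypeError:", error_message)

# p.flat walks over the individual elements, which are hashable scalars.
counts = Counter(p.flat)
dup = {float(value): n for value, n in counts.items() if n > 1}
print(dup)  # {0.5: 3, 1.25: 2}
```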

Or (mis)use numpy's histogram function:

import numpy

cnt, bins = numpy.histogram(p, bins=sorted(set(p.flat)) + [float('inf')])
dup = bins[:-1][cnt > 1]  # cnt has one entry fewer than bins: drop the inf edge
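To see why the histogram trick works, here is a sketch on invented sample data. The edges are the sorted distinct values followed by `inf`, so each bin starting at a distinct value holds exactly the copies of that value, and `cnt` ends up one entry shorter than `bins`. For comparison it also uses `numpy.unique(..., return_counts=True)`, a more direct alternative that is not part of the original answer.

```python
import numpy as np

# Invented sample data with the (n, 1) shape discussed in the comments.
p = np.array([[0.5], [1.25], [0.5], [3.0], [1.25], [0.5]])

# One edge per distinct value plus inf: bin k holds the copies of value k.
edges = sorted(set(p.flat)) + [float("inf")]
cnt, bins = np.histogram(p, bins=edges)
print(cnt)  # [3 2 1]
dup_hist = bins[:-1][cnt > 1]  # drop the inf edge so the lengths match

# Alternative (not from the answer): np.unique flattens and counts directly.
values, counts = np.unique(p, return_counts=True)
dup_unique = values[counts > 1]

print(dup_hist)
print(dup_unique)
```

Both approaches compare floats for exact equality, just like the set-based code in the question.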

edited Nov 15 '14 at 10:51

answered Nov 15 '14 at 10:45

That was useful! I have just one more question: if `p.shape` is `(1012, 1)`, isn't p a 1-D array? – Argentina Nov 15 '14 at 11:07
`shape` has two elements, so p is two-dimensional. `p.ravel().shape` is `(1012,)`, which is one-dimensional. – Daniel Nov 15 '14 at 11:21
Sorry, but I can't figure out how they differ: I can only think of both of them as arrays with one column (!) – Argentina Nov 15 '14 at 11:43
In the first case, to get an element you have to write `p[i,0]`; with only one dimension, `p[i]` is enough. – Daniel Nov 15 '14 at 12:08
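A short sketch of the difference described in the last comment, on an invented 3-element column: with shape `(3, 1)` a single index returns a whole row (still an array), while after `ravel()` a single index returns the scalar itself.

```python
import numpy as np

col = np.array([[0.5], [1.25], [0.5]])  # shape (3, 1): 3 rows, 1 column
flat = col.ravel()                      # shape (3,): a single dimension

print(col.shape, col.ndim)    # (3, 1) 2
print(flat.shape, flat.ndim)  # (3,) 1

print(repr(col[1]))  # a row, still an array: array([1.25])
print(col[1, 0])     # two indices reach the element: 1.25
print(flat[1])       # one index is enough: 1.25
```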

score -1 · Answer 2 · answered Nov 15 '14 at 10:43

It depends on what you mean by the number of duplicates.

An easy way to do this is to count occurrences in a hash table (a Python dict):

h = {}  # maps each value to the number of times it occurs
arr = [6, 3, 1, 1, 6, 2, 1]
for i in arr:
    if i in h:
        h[i] += 1
    else:
        h[i] = 1

print(h)  # {6: 2, 3: 1, 1: 3, 2: 1}

Now, if by duplicates you mean the values that occur more than once in the list, you can count them with:

num = 0  # number of distinct values that occur more than once
for i in h:
    if h[i] > 1:
        num += 1

print(num)  # 2 (the values 6 and 1)
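Since the count depends on what "duplicates" means, here is a sketch of two common readings using `collections.Counter` (a dict subclass that performs the counting loop for you) on the same `arr`: the number of distinct values that repeat, which matches `num` from the loop, and the number of surplus copies, i.e. the items you would remove to deduplicate the list.

```python
from collections import Counter

arr = [6, 3, 1, 1, 6, 2, 1]
h = Counter(arr)  # same counts as the hand-written dict loop

# Reading 1: how many distinct values occur more than once.
repeated_values = sum(1 for n in h.values() if n > 1)

# Reading 2: how many extra copies exist beyond each first occurrence.
extra_copies = len(arr) - len(h)

print(repeated_values)  # 2 (the values 6 and 1)
print(extra_copies)     # 3 (one extra 6 and two extra 1s)
```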

I think it is pretty easy to adapt this to numpy.
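One possible NumPy adaptation, sketched here rather than taken from the answer: flatten and sort the array, then compare each element with its neighbour. Equal neighbours mark repeated values, and `np.unique` collapses them to one entry per repeated value. The sample array is invented, and exact float equality is assumed to be what "duplicate" means.

```python
import numpy as np

p = np.array([[6.0], [3.0], [1.0], [1.0], [6.0], [2.0], [1.0]])  # (n, 1) column

s = np.sort(p, axis=None)       # axis=None flattens before sorting
same_as_prev = s[1:] == s[:-1]  # True where a value equals its predecessor

repeated = np.unique(s[1:][same_as_prev])  # each repeated value once
print(repeated)            # [1. 6.]
print(repeated.size)       # 2, like num in the dict version
print(same_as_prev.sum())  # 3 surplus copies
```

Sorting costs O(n log n), which avoids a Python-level loop over the elements.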

score -1 · Answer 3 · answered Nov 15 '14 at 10:45

You want to count something in a list? Why not use the count method of the list object?

number = my_list.count(my_float)
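A sketch of how `list.count` could be applied to the question's data, assuming the array is flattened first (here via `ravel().tolist()`, which also turns the elements into plain Python floats) and using invented values. It compares floats for exact equality and rescans the whole list on every call, so checking every element this way takes time quadratic in the list length.

```python
import numpy as np

p = np.array([[0.5], [1.25], [0.5], [3.0], [1.25], [0.5]])  # invented (n, 1) data
my_list = p.ravel().tolist()  # [0.5, 1.25, 0.5, 3.0, 1.25, 0.5]

# Occurrences of a single value.
number = my_list.count(0.5)
print(number)  # 3

# The question's own comprehension works once the list holds floats, not rows.
dup = set([x for x in my_list if my_list.count(x) > 1])
print(sorted(dup))  # [0.5, 1.25]
```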

answered Nov 15 '14 at 10:45

Ludovic Viaud