Keep strings that occur N times or more

Question

I have a list that is

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']

And I used Counter from collections on this list to get the result:

from collections import Counter
counts = Counter(mylist)

#Counter({'a': 3, 'c': 2, 'b': 2, 'd': 1})

Now I want to subset this so that I have all elements that occur some number of times, for example: 2 times or more - so that the output looks like this:

['a', 'b', 'c']

This seems like it should be a simple task - but I have not found anything that has helped me so far.

Can anyone suggest somewhere to look? I am also not attached to using Counter if I have taken the wrong approach. I should note I am new to Python, so I apologise if this is trivial.

possible duplicate of [Python removing duplicates in lists](http://stackoverflow.com/questions/7961363/python-removing-duplicates-in-lists) — Anand S Kumar, Jun 15 '15 at 04:26
Just a note - this is a toy example. I need the number of times an item occurs to be flexible to other numbers. I thought this was clear by the title but I will edit the question to be more specific. — SamPassmore, Jun 15 '15 at 04:35

score 5 · Accepted Answer · answered Jun 15 '15 at 04:29

[s for s, c in counts.items() if c >= 2]
# => ['a', 'c', 'b']
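To make the threshold flexible, as the question asks, the same comprehension can be wrapped in a small function. This is only a minimal sketch: the function name `at_least` and the parameter `n` are illustrative and not part of the answer, and it uses `items()`, which works on both Python 2 and Python 3.

```python
from collections import Counter

def at_least(items, n):
    """Return the distinct elements of `items` that occur n or more times."""
    counts = Counter(items)
    return [s for s, c in counts.items() if c >= n]

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
print(sorted(at_least(mylist, 2)))  # ['a', 'b', 'c']
print(at_least(mylist, 3))          # ['a']
```

The result is sorted in the first call only because the order of a Counter's keys is not guaranteed on older Python versions.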

Amadan

score 1 · Answer 2 · answered Jun 15 '15 at 04:28

Try this...

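def get_duplicatesarrval(arrval):
    # Remove one occurrence of each distinct value from a copy;
    # whatever remains occurred at least twice.
    dup_array = arrval[:]
    for i in set(arrval):
        dup_array.remove(i)
    return list(set(dup_array))

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
print(get_duplicatesarrval(mylist))

Result (order may vary, since it comes from a set):

['a', 'b', 'c']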

Deenadhayalan Manoharan

How can I specify how many times an item needs to occur? To extend my example: what if I decide to accept only results that occur 3 or more times? – SamPassmore Jun 15 '15 at 04:33

score 1 · Answer 3 · answered Jun 15 '15 at 04:33

The usual way would be to use a list comprehension as @Amadan does.
In the special case of 2 or more, you can also subtract one Counter from another:

>>> counts = Counter(mylist) - Counter(set(mylist))
>>> list(counts)
['a', 'c', 'b']
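The subtraction works because Counter subtraction keeps only elements whose resulting count is positive. Under that assumption, the idea can be stretched to an arbitrary threshold by subtracting n - 1 from every count instead of 1. This is a sketch only; `at_least_by_subtraction` and `n` are hypothetical names, not something the answer defines.

```python
from collections import Counter

def at_least_by_subtraction(items, n):
    # Build a Counter holding n - 1 for every distinct element.
    floor = Counter({k: n - 1 for k in set(items)})
    # Subtraction drops entries whose result is zero or negative,
    # so only elements with a count of n or more survive.
    return list(Counter(items) - floor)

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
print(sorted(at_least_by_subtraction(mylist, 2)))  # ['a', 'b', 'c']
print(at_least_by_subtraction(mylist, 3))          # ['a']
```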

John La Rooy

Hi John, Thanks for your comment. Sorry I wasn't specific enough in the question. I didn't realise 2 or more was a special case. – SamPassmore Jun 15 '15 at 04:49
@SamPassmore, it's really not that special or particularly fast to do it this way. In my experience, it does turn up more often in real programs: counting anagrams, factors of composite numbers, etc. But the list comprehension is fine in any case. – John La Rooy Jun 15 '15 at 07:27

rajeshv90 · Answer 4 · 2015-06-15T04:56:50.930

from itertools import groupby

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']

res = [i for i, j in groupby(mylist) if len(list(j)) >= 2]

print(res)
# ['a', 'b', 'c']
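Note that `groupby` only groups consecutive equal elements, so the one-liner relies on equal items already sitting next to each other in `mylist`. A possible sketch for unsorted input, assuming the elements can be sorted, sorts first; the function name and the threshold `n` are illustrative.

```python
from itertools import groupby

def at_least_sorted(items, n):
    # Sort first so equal elements are adjacent, then count each run.
    return [key for key, run in groupby(sorted(items))
            if sum(1 for _ in run) >= n]

unsorted = ['b', 'a', 'c', 'a', 'd', 'c', 'a', 'b']
print(at_least_sorted(unsorted, 2))  # ['a', 'b', 'c']
print(at_least_sorted(unsorted, 3))  # ['a']
```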

edited Jun 15 '15 at 04:56

answered Jun 15 '15 at 04:51

score 0 · Answer 5 · answered Jun 15 '15 at 06:52

I think the answers above are better, but I believe this is the simplest method to understand:

mylist = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
newlist=[]
newlist.append(mylist[0])
for i in mylist:
    if i in newlist:
        continue
    else:
        newlist.append(i)
print(newlist)

['a', 'b', 'c', 'd']
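As the 'd' in the output shows, this loop removes duplicates in first-occurrence order but does not filter by how often an item occurs. One way to keep the same ordering while applying a threshold is to combine it with the Counter from the question. This is a sketch under that assumption; `at_least_in_order` and `n` are illustrative names.

```python
from collections import Counter

def at_least_in_order(items, n):
    counts = Counter(items)
    seen = set()
    result = []
    for item in items:
        # Keep an item the first time it is met, if it occurs n+ times overall.
        if counts[item] >= n and item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(at_least_in_order(['c', 'a', 'c', 'b', 'a', 'd'], 2))  # ['c', 'a']
```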
