Python: Shortest way to extract and count elements from an array of String?

Question

I'm very new to Python and know that my question is very simple, but I haven't found an existing question on SO yet.

I have an array that contains string elements. I want to extract the distinct elements, count how many times each one appears, and then sort them by count in descending order.

For example:

['ab', 'ab', 'ac']

then the output should be:

'ab' 2
'ac' 1

Also, I don't know the best way to store my output (in a map, a hash, or something like that? Again, I'm not sure).

Thanks for any help.

Incidentally, this isn't an array, it's a `list`, or more generally a "sequence". In Python, `array` refers to a specific data type. – Joel Cornett, Jul 05 '12 at 19:33

score 3 · Accepted Answer · answered Jul 05 '12 at 19:31

This can be done using the Counter class from the collections module.

from collections import Counter
x = ['ab', 'ab', 'ac']
counts = Counter(x)

counts stores the count information for each element; the full list of methods can be found in the documentation, but probably all you care about is that you can access counts directly by treating counts like a hash:

>>> counts['ab']
2
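
As a small sketch of the hash-like access described above (the variable names are only for illustration): a `Counter` returns 0 for an element it has never seen instead of raising `KeyError`, and because it is a `dict` subclass it converts cleanly to a plain `dict`.

```python
from collections import Counter

x = ['ab', 'ab', 'ac']
counts = Counter(x)

# Look up a count by key, as with a dict
ab_count = counts['ab']        # 2

# Unseen elements count as 0 rather than raising KeyError,
# and the lookup does not add the key to the Counter
missing_count = counts['zz']   # 0

# Counter is a dict subclass, so it converts to a plain dict
as_dict = dict(counts)         # {'ab': 2, 'ac': 1}
```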

bnaul

thanks and +1, but then how can I sort by frequency in descending order? – Thiem Nguyen Jul 05 '12 at 20:08
The `most_common` method will do this. `counts.most_common()` gives a list ordered from most frequent to least frequent of tuples of the form `(elem,count)`. You could iterate over this with e.g. `for elem, count in counts.most_common():`. – bnaul Jul 05 '12 at 20:21
thank you. I will accept this answer. By the way, what is the data type of `counts`? – Thiem Nguyen Jul 05 '12 at 20:23
It's a `Counter`, a specific class from the `collections` module. You can read about its methods in the documentation I linked. But you can treat it like a `dict` in many contexts (which is the Python version of a map or hash). – bnaul Jul 05 '12 at 20:26
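
To make the comments above concrete, here is a minimal sketch of `most_common` applied to the list from the question. The names `ranked` and `top_one` are illustrative, and the printed format simply mirrors the output the question asked for.

```python
from collections import Counter

counts = Counter(['ab', 'ab', 'ac'])

# most_common() returns (element, count) pairs, highest count first
ranked = counts.most_common()    # [('ab', 2), ('ac', 1)]

for elem, count in ranked:
    print(repr(elem), count)
# 'ab' 2
# 'ac' 1

# most_common(n) keeps only the n most frequent elements
top_one = counts.most_common(1)  # [('ab', 2)]
```

Since `counts` is also a `dict`, it can serve directly as the "map or hash" for storing the result.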

score 1 · Answer 2 · edited May 23 '17 at 12:04

There is a library called NLTK: http://nltk.org/.

EDIT: I found something better:

You can also look at this question: "real word count in NLTK".

Code example from the above link:

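    >>> from collections import Counter
    >>> text = ['this', 'is', 'a', 'sentence', '.']
    >>> # assumed filter step (missing from this snippet): drop tokens with no letter or digit
    >>> filtered = [w for w in text if any(c.isalnum() for c in w)]
    >>> counts = Counter(filtered)
    >>> counts
    Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})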

answered Jul 05 '12 at 19:28

barak1412

Honestly, I am working on some NLP stuff, but it would be better if you went into more detail... :) – Thiem Nguyen Jul 05 '12 at 19:29

score 1 · Answer 3 · answered Jul 05 '12 at 19:32

This is a classic problem, the so-called "Word Count" problem. You would probably want to use a dictionary, Python's built-in mapping type, which offers average constant-time lookup.

Declared like so (avoid naming it dict, which would shadow the built-in type):

counts = {}

You can then iterate over your list of tokens with a loop body resembling the following:

if token not in counts:
    counts[token] = 1
else:
    counts[token] += 1

When you're done, you end up with a dictionary containing words as keys and frequencies as values.

The following documentation is relevant: http://docs.python.org/release/2.5.2/lib/typesmapping.html
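
Putting the pieces of this answer together, a self-contained sketch might look like the following. The helper name `count_tokens` is hypothetical, and the final sort is an addition to cover the "descending order" part of the question; it is not something this answer spelled out.

```python
def count_tokens(tokens):
    """Count occurrences of each token with a plain dict (hypothetical helper)."""
    counts = {}
    for token in tokens:
        # get() returns 0 for unseen tokens, which replaces the if/else branch
        counts[token] = counts.get(token, 0) + 1
    return counts


tokens = ['ab', 'ac', 'ab']
counts = count_tokens(tokens)

# Sort (token, count) pairs by count, largest first
by_frequency = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
# [('ab', 2), ('ac', 1)]
```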
