1

I'm very new to Python, and I know my question is very simple, but I haven't found an existing question on SO yet.

I have an array that contains string elements. I want to extract the distinct elements and count the number of times each one appears, then sort them in descending order of count.

For example:

['ab', 'ab', 'ac']

then the output should be:

'ab' 2
'ac' 1

Also, I'm not sure what the best way to store the output is (a map, a hash, or something like that?).

Thanks for any help.

Thiem Nguyen
  • 1
    Incidentally, this isn't an array, it's a `list`, or more generally a "sequence". In Python, `array` refers to a specific data type. – Joel Cornett Jul 05 '12 at 19:33

3 Answers

3

This can be done using the Counter class from the collections module.

from collections import Counter
x = ['ab', 'ab', 'ac']
counts = Counter(x)

counts stores the count information for each element; the full list of methods can be found in the documentation, but probably all you care about is that you can access counts directly by treating counts like a hash:

>>> counts['ab']
2
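
To get output in the form the question asks for, one possible sketch combines `Counter` with its `most_common` method, which returns `(element, count)` tuples ordered from most to least frequent. The variable name `ordered` is only illustrative:

```python
from collections import Counter

x = ['ab', 'ab', 'ac']
counts = Counter(x)

# most_common() returns (element, count) tuples, highest count first
ordered = counts.most_common()
for elem, count in ordered:
    print(repr(elem), count)
```

For the sample list this prints `'ab' 2` followed by `'ac' 1`.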
bnaul
  • thanks and +1, but then how can I sort by frequency in descending order? – Thiem Nguyen Jul 05 '12 at 20:08
  • 1
    The `most_common` method will do this. `counts.most_common()` gives a list ordered from most frequent to least frequent of tuples of the form `(elem,count)`. You could iterate over this with e.g. `for elem, count in counts.most_common():`. – bnaul Jul 05 '12 at 20:21
  • thank you. I will accept this answer. By the way, what is the data type of `counts`? – Thiem Nguyen Jul 05 '12 at 20:23
  • 1
    It's a `Counter`, a specific class from the `collections` module. You can read about its methods in the documentation I linked. But you can treat it like a `dict` in many contexts (which is the Python version of a map or hash). – bnaul Jul 05 '12 at 20:26
1

There is a library called NLTK: http://nltk.org/.

EDIT: I found something better:

You can look here too - real word count in NLTK.

Code example from the above link:

    >>> from collections import Counter
    >>> text = ['this', 'is', 'a', 'sentence', '.']
    >>> filtered = [w for w in text if w.isalnum()]  # assumed filter: drop punctuation tokens
    >>> counts = Counter(filtered)
    >>> counts
    Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})
barak1412
1

This is a classic problem, the so-called "Word Count" problem. You would probably want to use a dictionary, Python's built-in mapping type with amortized constant-time lookup.

Declare it like this (the name `counts` avoids shadowing the built-in `dict`):

counts = {}

You can then iterate over your list of tokens with a loop body resembling the following:

if token not in counts:
    counts[token] = 1
else:
    counts[token] += 1

When you're done, you end up with a dictionary containing words as keys and frequencies as values.
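
Put together, the approach described above might look like the following minimal sketch. The names `tokens`, `counts`, and `by_count` are illustrative, and the final sort by frequency uses the built-in `sorted`, which this answer does not mention but which covers the descending order the question asks for:

```python
tokens = ['ab', 'ab', 'ac']  # sample input from the question

counts = {}
for token in tokens:
    if token not in counts:
        counts[token] = 1   # first time this token is seen
    else:
        counts[token] += 1  # token seen before, bump its count

# Sort (token, count) pairs by count, largest first
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for token, count in by_count:
    print(repr(token), count)
```

The result is a list of `(token, count)` tuples, so it can be iterated over or indexed like any other list.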

The following documentation is relevant: http://docs.python.org/release/2.5.2/lib/typesmapping.html

Wug