
I am fairly new to Python and could not figure out how to do the following.

I have a list of (word, tag) tuples:

a = [('Run', 'Noun'),('Run', 'Verb'),('The', 'Article'),('Run', 'Noun'),('The', 'DT')]

I am trying to find all the tags that have been assigned to each word and count how often each one occurs. For example, the word 'Run' has been tagged twice as 'Noun' and once as 'Verb'.

To clarify: I would like to create another list of tuples, each of the form (word, tag, count).

Nina

2 Answers


Pretty easy with a defaultdict:

>>> from collections import defaultdict
>>> output = defaultdict(defaultdict(int).copy)
>>> for word, tag in a:
...     output[word][tag] += 1
...     
>>> output
defaultdict(<function copy>,
            {'Run': defaultdict(int, {'Noun': 2, 'Verb': 1}),
             'The': defaultdict(int, {'Article': 1, 'DT': 1})})
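
If the flat list of (word, tag, count) tuples from the question is needed, the nested mapping can be flattened with a comprehension. This is only a minimal sketch: the name `triples` is made up for illustration, and the order shown assumes Python 3.7+, where dicts keep insertion order.

```python
from collections import defaultdict

a = [('Run', 'Noun'), ('Run', 'Verb'), ('The', 'Article'), ('Run', 'Noun'), ('The', 'DT')]

# Same nested counting as above: word -> tag -> count.
output = defaultdict(defaultdict(int).copy)
for word, tag in a:
    output[word][tag] += 1

# Flatten the nested mapping into (word, tag, count) tuples.
# `triples` is a hypothetical name chosen for this sketch.
triples = [
    (word, tag, count)
    for word, tags in output.items()
    for tag, count in tags.items()
]
print(triples)
# [('Run', 'Noun', 2), ('Run', 'Verb', 1), ('The', 'Article', 1), ('The', 'DT', 1)]
```
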
wim
  • Interesting way to make a defaultdict of defaultdicts. I usually use `defaultdict(lambda: defaultdict(int))`. – alecxe Sep 19 '16 at 21:53
  • @alecxe I stole the trick from another SO member [here](http://stackoverflow.com/questions/33080869/python-how-to-create-a-dict-of-dict-of-list-with-defaultdict/35759455#comment59184169_33081175). It's a little faster on Python 3 and a little slower on Python 2. – wim Sep 19 '16 at 22:04

You can use collections.Counter:

>>> import collections

>>> a = [('Run', 'Noun'),('Run', 'Verb'),('The', 'Article'),('Run', 'Noun'),('The', 'DT')]
>>> counter = collections.Counter(a)
>>> counter
Counter({('Run', 'Noun'): 2, ('Run', 'Verb'): 1, ... })

>>> result = {}
>>> for (word, tag), count in counter.items():
...     result.setdefault(word, []).append({tag: count})

>>> print(result)
{'Run': [{'Noun': 2}, {'Verb': 1}], 'The': [{'Article': 1}, {'DT': 1}]}
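
Since the Counter keys are already (word, tag) pairs, the (word, tag, count) list asked for in the question can also be built straight from the counter. A rough sketch, assuming that ordering by frequency is acceptable; `triples` is a name invented here, not part of the original answer.

```python
from collections import Counter

a = [('Run', 'Noun'), ('Run', 'Verb'), ('The', 'Article'), ('Run', 'Noun'), ('The', 'DT')]

# Count each (word, tag) pair directly; tuples are hashable, so they work as keys.
counter = Counter(a)

# most_common() yields ((word, tag), count) pairs, highest count first.
# Unpack each pair into a flat (word, tag, count) tuple.
triples = [(word, tag, count) for (word, tag), count in counter.most_common()]
print(triples)
```

Use `counter.items()` instead of `counter.most_common()` if sorting by count is not wanted.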
Ozgur Vatansever