
I have a string s= "Mr.X is awesome. He is amazing.Mr.Y is awesome too."

I need to extract all the adjectives from the string along with the count of each adjective. For example, this string has the adjectives "awesome" and "amazing", with a count of 2 for awesome and 1 for amazing.

To extract the adjectives, I have used NLTK. This is my code:

import nltk
adjectives = [token for token, pos in nltk.pos_tag(nltk.word_tokenize(s)) if pos.startswith('JJ')]

Now I need code that counts how many times each adjective occurs in the string, in the form adjective: count.

Man utd

2 Answers


You can use collections.Counter:

>>> from collections import Counter
>>> adjectives = ['awesome', 'amazing', 'awesome']
>>> counts = Counter(adjectives)
>>> counts.items()
dict_items([('awesome', 2), ('amazing', 1)])

That can be converted into a dictionary if you like:

>>> dict(counts)
{'awesome': 2, 'amazing': 1}

Or you can access the keys and values:

>>> for key, count in counts.items():
...     print(key, count)
awesome 2
amazing 1
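To print the counts in the `adjective: count` form asked for in the question, one option (a sketch, assuming Python 3.6+ for f-strings) is to iterate over `Counter.most_common()`, which yields `(element, count)` pairs sorted from most to least common:

```python
from collections import Counter

adjectives = ['awesome', 'amazing', 'awesome']
counts = Counter(adjectives)

# most_common() returns (element, count) pairs, highest count first
for adjective, count in counts.most_common():
    print(f"{adjective}: {count}")

# Looking up a word that never occurred returns 0 instead of raising KeyError
print(counts['nice'])
```

This prints `awesome: 2`, `amazing: 1` and then `0`. The missing-key lookup does not add `'nice'` to the Counter.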

Edit:

For a list of lists, you need to flatten the lists first, for example with a nested generator expression:

>>> adjectives = [['awesome', 'amazing'], ['good', 'nice' ]]
>>> counts = Counter(adjective
...                  for group in adjectives
...                  for adjective in group)
>>> counts
Counter({'awesome': 1, 'good': 1, 'amazing': 1, 'nice': 1})

Or using itertools.chain.from_iterable:

>>> from itertools import chain
>>> Counter(chain.from_iterable(adjectives))
Counter({'awesome': 1, 'good': 1, 'amazing': 1, 'nice': 1})
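Another way to handle a list of lists, shown here as a sketch, is to start with an empty Counter and call its `update()` method once per inner list. This is handy if the groups arrive one at a time (for example, one list per sentence); the extra `['awesome']` group below is made-up example data to show the counts accumulating:

```python
from collections import Counter

adjectives = [['awesome', 'amazing'], ['good', 'nice'], ['awesome']]

counts = Counter()
for group in adjectives:
    # update() adds the counts from an iterable instead of replacing them
    counts.update(group)

print(counts.most_common(1))
```

This prints `[('awesome', 2)]`, since "awesome" appears in two of the groups.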
Peter Wood
  • Thanks for the solution, but I have another problem. My adjectives list has multiple lists inside it, basically like adjectives = [['awesome', 'amazing'], ['good', 'nice']]. So when I run Counter, it gives an error: TypeError: unhashable type: 'list' – Man utd Sep 22 '15 at 20:35
  • @DipitMalhotra added a solution for a list of lists – Peter Wood Sep 23 '15 at 07:21

One possible solution uses Counter. Here is the full code:

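import collections

s = "Mr.X is awesome He is amazing Mr.Y is awesome too."
adjectives = ["awesome", "beautiful", "handsome", "amazing"]
c = collections.Counter(s.split())
# Iterate over a copy of the keys: deleting from the Counter while
# iterating over it directly would raise a RuntimeError
for key in list(c):
    if key not in adjectives:
        del c[key]
print(c)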

I hard-coded a list of adjectives, assuming your NLTK code already produces that list for you.

Next, I split the sentence on whitespace to create a list of tokens. Note that this does not handle punctuation correctly (for example, your sentence contains "awesome.", which maps to a different key than "awesome"), but you can tokenize the text however you like.

The resulting list is passed to the Counter constructor, which creates a Counter object (a subclass of dict) that maps each token to the number of times it occurs.
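To see the punctuation problem concretely, here is what Counter produces when the question's original sentence (with its periods) is split on whitespace:

```python
import collections

s = "Mr.X is awesome. He is amazing.Mr.Y is awesome too."
c = collections.Counter(s.split())

# "awesome." (with the period) and "awesome" are counted as different keys,
# and "amazing.Mr.Y" is a single token because there is no space after the period
print(c['awesome.'], c['awesome'], c['amazing'])
```

This prints `1 1 0`: each form of "awesome" is counted once, and plain "amazing" is never seen as a token.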

Then I iterate over the keys and remove every key that is not in my list of adjectives. Note that the loop iterates over list(c), a copy of the keys, because del changes the size of the Counter; iterating over c directly would raise "RuntimeError: dictionary changed size during iteration".
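A variant that avoids both problems is sketched below, under the assumption that the adjectives are plain ASCII words: extract words with a regular expression (so punctuation is dropped) and count only the tokens that are in the adjective list, instead of deleting keys afterwards. The regex `[A-Za-z]+` and the hard-coded adjective set are illustrative choices, not part of the original answer:

```python
import collections
import re

s = "Mr.X is awesome. He is amazing.Mr.Y is awesome too."
# Assumed adjective set; in practice this would come from the NLTK tagging step
adjectives = {"awesome", "beautiful", "handsome", "amazing"}

# Keep only runs of ASCII letters, so "awesome." becomes "awesome"
# and "amazing.Mr.Y" becomes "amazing", "Mr", "Y"
words = re.findall(r"[A-Za-z]+", s)

# Count only the words that appear in the adjective set
c = collections.Counter(word for word in words if word in adjectives)
print(c)
```

This prints `Counter({'awesome': 2, 'amazing': 1})`. Using a set makes each membership test constant time; lowercase the words first if capitalized adjectives should also match.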

I hope it helps. I believe you can fit it into your code.

rlinden