
I would like to count part-of-speech tags. So far I have the part-of-speech tags (for German) stored in a dictionary, where the key is the POS tag and the value is the number of occurrences.

When I count, I want to combine 'NN' and 'NE' into one variable, 'nouns_in_text', because both of them are nouns. This works only partially. When the input text contains both 'NN' and 'NE', my code works and I get the correct result, the sum of 'NN' and 'NE'.

The problem is that when the input text contains, for example, only 'NN' and no 'NE', I get a KeyError.

I need the code to check whether 'NN' or 'NE' occur in the input text. If both occur, sum them up. If only 'NN' occurs, return just the number of occurrences of 'NN', and likewise if only 'NE' occurs. If neither 'NN' nor 'NE' occurs, return 0 or None.

I would like code that works for all three scenarios described below without raising an error.

# First Scenario: NN and NE are in the Input-Text
myInput = {'NN': 3, 'NE': 1, 'ART': 1, 'KON': 1}

# Second Scenario: Only NN is in the Input-Text
#myInput = {'NN': 3, 'ART': 1, 'KON': 1}

# Third Scenario: Neither NN nor NE are in the Input-Text
#myInput = {'ART': 1, 'KON': 1}

def check_pos_tag(document):
    return document['NN'] + document['NE']

nouns_in_text = check_pos_tag(myInput)
print(nouns_in_text)

# Output: 4 when both NN and NE are in the input text
# But if NN or NE is missing from the input text, I get a KeyError

I think I could or should solve this problem with if-else conditions or with try-except blocks, but I'm not sure how to implement these ideas. Any suggestions? Thank you very much in advance! :-)

AnnaLise

4 Answers


Use dict.get, which takes the arguments (key, default): if key is not in document, default is returned instead of raising a KeyError.

def check_pos_tag(document):
    return document.get('NN', 0) + document.get('NE', 0)
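
As a quick check, here is a minimal sketch that runs this function against the three scenarios from the question. The variable names both, only_nn and neither are made up for illustration.

```python
def check_pos_tag(document):
    # dict.get returns the default (0) when the tag is absent
    return document.get('NN', 0) + document.get('NE', 0)

# The three scenarios from the question (names are illustrative)
both = {'NN': 3, 'NE': 1, 'ART': 1, 'KON': 1}
only_nn = {'NN': 3, 'ART': 1, 'KON': 1}
neither = {'ART': 1, 'KON': 1}

print(check_pos_tag(both))     # 4
print(check_pos_tag(only_nn))  # 3
print(check_pos_tag(neither))  # 0
```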
FHTMitchell

This should do it:

def check_pos_tag(document):
    return document.get('NN', 0) + document.get('NE', 0)
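
If more tags should be grouped later, the same idea can be generalized. This is only a sketch: count_tags and its tags parameter are hypothetical names that do not appear in the question.

```python
def count_tags(document, tags=('NN', 'NE')):
    # Sum the counts of all requested tags; absent tags contribute 0
    return sum(document.get(tag, 0) for tag in tags)

pos_counts = {'NN': 3, 'ART': 1, 'KON': 1}
print(count_tags(pos_counts))                  # 3 (only NN is present)
print(count_tags(pos_counts, ('ART', 'KON')))  # 2
```

Passing the tags as a tuple keeps the grouping in one place instead of repeating a .get call per tag.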
zipa

Use defaultdict instead of dict

from collections import defaultdict
myInput = defaultdict(int, {'NN': 3, 'ART': 1, 'KON': 1})

With this, your current check_pos_tag function works without any modification:

check_pos_tag(myInput)
# 3
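
One thing to be aware of: looking up a missing key in a defaultdict also stores that key with the default value. If that side effect is unwanted, collections.Counter might be an alternative, since it returns 0 for missing keys without storing them. A small sketch, where the names counts and tally are illustrative:

```python
from collections import Counter, defaultdict

def check_pos_tag(document):
    # Unmodified function from the question
    return document['NN'] + document['NE']

counts = defaultdict(int, {'NN': 3, 'ART': 1, 'KON': 1})
print(check_pos_tag(counts))  # 3
print('NE' in counts)         # True: the lookup inserted 'NE': 0

tally = Counter({'NN': 3, 'ART': 1, 'KON': 1})
print(check_pos_tag(tally))   # 3
print('NE' in tally)          # False: Counter does not store the missing key
```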
Sunitha

Verbose version:

def check_pos_tag(document):
    nn = document['NN'] if 'NN' in document else 0
    ne = document['NE'] if 'NE' in document else 0
    return nn + ne
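
Since the question also mentions try-except, here is a sketch of that variant. It behaves the same as the version above and is shown only for comparison:

```python
def check_pos_tag(document):
    total = 0
    for tag in ('NN', 'NE'):
        try:
            total += document[tag]
        except KeyError:
            # Tag does not occur in the text, so it contributes nothing
            pass
    return total

print(check_pos_tag({'NN': 3, 'NE': 1, 'ART': 1, 'KON': 1}))  # 4
print(check_pos_tag({'NN': 3, 'ART': 1, 'KON': 1}))           # 3
print(check_pos_tag({'ART': 1, 'KON': 1}))                    # 0
```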
Andrej Kesely