I want to extract cardinal number (CD) values that are associated with units of measurement and store them in a dictionary. For example, if the text contains tokens like "20 kgs", the value and its unit should be extracted and stored in a dictionary.
Example:
For the input text "10-inch fry pan offers superb heat conductivity and distribution", the output dictionary should look like:
{"dimension":"10-inch"}
For the input text "This bucket holds 5 litres of water.", the output should look like:
{"volume": "5 litres"}
This is what I have so far:
import nltk
line = 'This bucket holds 5 litres of water.'
tokenized = nltk.word_tokenize(line)
tagged = nltk.pos_tag(tokenized)
Printing tagged gives the output:
[('This', 'DT'), ('bucket', 'NN'), ('holds', 'VBZ'), ('5', 'CD'), ('litres', 'NNS'), ('of', 'IN'), ('water', 'NN'), ('.', '.')]
Is there a way to extract the CD and UOM values from the text?
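To make the goal concrete, here is a rough sketch of the kind of logic I have in mind. It works directly on the tagged list shown above, so it does not call NLTK itself. The `UNIT_CATEGORIES` table and the `extract_measurements` function are names I made up for illustration, and the unit list is only a small hand-written assumption, not a complete one.

```python
import re

# Hypothetical lookup table: unit word -> dictionary key. Extend as needed.
UNIT_CATEGORIES = {
    "kg": "weight", "kgs": "weight", "gram": "weight", "grams": "weight",
    "litre": "volume", "litres": "volume", "liter": "volume", "liters": "volume",
    "inch": "dimension", "inches": "dimension", "cm": "dimension", "mm": "dimension",
}

# Matches a single token such as "10-inch", where number and unit are hyphenated.
HYPHENATED = re.compile(r"^(\d+(?:\.\d+)?)-([A-Za-z]+)$")


def extract_measurements(tagged):
    """Collect number+unit pairs from a list of (token, POS tag) tuples."""
    found = {}
    for i, (token, tag) in enumerate(tagged):
        # Case 1: number and unit fused into one token, e.g. "10-inch".
        match = HYPHENATED.match(token)
        if match and match.group(2).lower() in UNIT_CATEGORIES:
            found[UNIT_CATEGORIES[match.group(2).lower()]] = token
            continue
        # Case 2: a CD token immediately followed by a known unit word.
        if tag == "CD" and i + 1 < len(tagged):
            unit = tagged[i + 1][0]
            category = UNIT_CATEGORIES.get(unit.lower())
            if category:
                found[category] = f"{token} {unit}"
    return found


tagged = [('This', 'DT'), ('bucket', 'NN'), ('holds', 'VBZ'), ('5', 'CD'),
          ('litres', 'NNS'), ('of', 'IN'), ('water', 'NN'), ('.', '.')]
print(extract_measurements(tagged))  # {'volume': '5 litres'}
```

Note that this keeps only one value per category (a later match overwrites an earlier one) and only recognises units listed in the table, so I am not sure it is the right approach in general.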