I have a list of (token, tag) tuples that looks like the following:
token_tags = [('book', 'noun'),
              ('run', 'noun'),
              (',', ','),
              ('book', 'verb'),
              ('run', 'adj'),
              ('run', 'verb')]
I am trying to count how many times a token is tagged as 'noun' and then tagged as 'verb' at its very next appearance in the list. So 'book' should be counted, but 'run' should not, because it was tagged as an adjective between its 'noun' and 'verb' tags. Any suggestions on how to do that?
I have converted the list of tuples into a dict as follows:
d = {}
for x, y in token_tags:
    d.setdefault(x, []).append(y)
So, now d contains:
{'book': ['noun', 'verb'], 'run': ['noun', 'adj', 'verb'], ',': [',']}
I have tried using a regular expression to solve this, but it did not work.
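One possible approach, as a minimal sketch rather than a definitive solution: group the tags per token in order of appearance, then look at each pair of consecutive tags for the same token and count the pairs that are exactly ('noun', 'verb'). The function name count_noun_then_verb is hypothetical, and the sketch assumes the desired result is a per-token count (summing the values gives the overall total).

```python
from collections import defaultdict

token_tags = [('book', 'noun'),
              ('run', 'noun'),
              (',', ','),
              ('book', 'verb'),
              ('run', 'adj'),
              ('run', 'verb')]


def count_noun_then_verb(pairs):
    """Hypothetical helper: for each token, count how often a 'noun' tag
    is followed by a 'verb' tag at that token's very next appearance."""
    # Collect each token's tags in the order they appear in the list.
    tags_by_token = defaultdict(list)
    for token, tag in pairs:
        tags_by_token[token].append(tag)

    counts = {}
    for token, tags in tags_by_token.items():
        # zip(tags, tags[1:]) yields consecutive tag pairs for this token.
        n = sum(1 for prev, nxt in zip(tags, tags[1:])
                if (prev, nxt) == ('noun', 'verb'))
        if n:
            counts[token] = n
    return counts


result = count_noun_then_verb(token_tags)
print(result)                # {'book': 1}
print(sum(result.values()))  # 1
```

Comparing consecutive tags directly avoids a regular expression altogether. 'run' is excluded because its tag sequence is noun, adj, verb, so no 'noun' is immediately followed by 'verb'.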