I'm having trouble accessing neighboring items in a list of tuples. The list looks like this, with each tuple containing a word and its tag:

[('An', 'DET'),
 ('autumn', 'NOUN'),
 ('evening', 'NOUN'),
 ('.', '.'),
 ('In', 'ADP'),
 ('an', 'DET'),
 ('old', 'ADJ'),
 ('woodshed', 'NOUN'),
 ('The', 'DET'),
 ('long', 'ADJ'),
 ('points', 'NOUN'),
 ('of', 'ADP'),
 ('icicles', 'NOUN'),
 ('Are', 'NOUN'),
 ('sharpening', 'VERB'),
 ('the', 'DET'),
 ('wind', 'NOUN'),
 ('.', '.')....]

What I would like to do is iterate through these tuples and estimate the likelihood of the next word's tag given the previous one. For instance, to find how often 'DET' is directly followed by 'NOUN', I would want to iterate through the tuples and compute:

number of times 'DET' appears in front of 'NOUN'

So far, I have tried this:

prob = 0.0
for item in tuples:
   if item[1] == "DET" and item + 1[1] == "NOUN"
return prob

The if statement is obviously not correct. Does anyone know what I can do to access the next item?

Omid
natalien
  • Check out http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python you can use a sliding window iterator to yield pairs of tuples from your list. – Brian Schlenker May 07 '16 at 05:03
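The sliding-window idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked question: the function name `sliding_window` and the shortened `sample` list are assumptions made for the example.

```python
from collections import deque
from itertools import islice

def sliding_window(iterable, n=2):
    """Yield successive overlapping tuples of length n from iterable."""
    it = iter(iterable)
    # Fill the first window; maxlen makes the deque drop its oldest item
    # automatically each time a new one is appended.
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in it:
        window.append(item)
        yield tuple(window)

# Hypothetical sample data, shortened from the list in the question.
sample = [('An', 'DET'), ('autumn', 'NOUN'), ('evening', 'NOUN'),
          ('In', 'ADP'), ('an', 'DET'), ('old', 'ADJ'), ('woodshed', 'NOUN')]

# Count the adjacent pairs whose tags are 'DET' followed by 'NOUN'.
det_noun = sum(1 for (_, first), (_, second) in sliding_window(sample)
               if first == 'DET' and second == 'NOUN')
print(det_noun)
```

Because the window is built from an iterator, this also works on generators and other inputs that cannot be sliced.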

2 Answers

The easiest way to pair up adjacent items is to use zip(seq, seq[1:]), a simple variant of the pairwise recipe in the recipes section of the itertools documentation.

And the easiest way to collect the counts is to use collections.Counter().

Putting it all together looks like this:

>>> from collections import Counter

>>> Counter((f, s) for (_, f), (_, s) in zip(tuples, tuples[1:]))
Counter({('ADJ', 'NOUN'): 2, ('NOUN', 'ADP'): 2, ('NOUN', 'NOUN'): 2,
         ('DET', 'NOUN'): 2, ('DET', 'ADJ'): 2, ('ADP', 'NOUN'): 1,
         ('NOUN', 'VERB'): 1, ('NOUN', 'DET'): 1, ('VERB', 'DET'): 1,
         ('ADP', 'DET'): 1})
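
Since the question asks for the likelihood of the next tag given the previous one, the pair counts can be turned into conditional probabilities by dividing each count by the number of times the first tag starts a pair. The following is a minimal sketch under a few assumptions: it targets Python 3 (true division), and the names `transition_probabilities` and `sample` are illustrative rather than part of the answer above.

```python
from collections import Counter

def transition_probabilities(tagged):
    """Estimate P(next_tag | prev_tag) from a list of (word, tag) tuples."""
    tags = [tag for _, tag in tagged]
    # Count each adjacent (previous tag, next tag) pair.
    pair_counts = Counter(zip(tags, tags[1:]))
    # Count how often each tag appears as the first element of a pair;
    # the last tag never starts a pair, so it is excluded.
    first_counts = Counter(tags[:-1])
    return {
        (prev, nxt): count / first_counts[prev]
        for (prev, nxt), count in pair_counts.items()
    }

# Hypothetical sample data, shortened from the list in the question.
sample = [('An', 'DET'), ('autumn', 'NOUN'), ('evening', 'NOUN'),
          ('In', 'ADP'), ('an', 'DET'), ('old', 'ADJ'), ('woodshed', 'NOUN')]

probs = transition_probabilities(sample)
print(probs[('DET', 'NOUN')])
```

For any given previous tag, the probabilities over all next tags sum to 1, which is a quick sanity check on the result.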
Raymond Hettinger

Use enumerate() to get the index of the item you're looping through:

count = 0
for index, item in enumerate(tuples[:-1]):
    if item[1] == 'DET' and tuples[index+1][1] == 'NOUN':
        count += 1

print(count)
TerryA