Somewhat of a Python/programming newbie here.

I am trying to access a specified range of tuples from a list of tuples, but I only want to access the first element from the range of tuples. The specified range is based on a pattern I am looking for in a string of text that has been tokenized and tagged by nltk. My code:

from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

text = "It is pretty good as far as driveway size is concerned, otherwise I would skip it"
tokenized = word_tokenize(text)
tagged = pos_tag(tokenized)

def find_phrase():
    counter = -1
    for tag in tagged:
        counter += 1
        if tag[0] == "as" and tagged[counter+6][0] == "concerned":
            print tagged[counter:counter+7]

find_phrase()

Printed output:

[('as', 'IN'), ('far', 'RB'), ('as', 'IN'), ('driveway', 'NN'), ('size', 'NN'), ('is', 'VBZ'), ('concerned', 'VBN')]

What I actually want:

['as', 'far', 'as', 'driveway', 'size', 'is', 'concerned']

Is it possible to modify my line of code, print tagged[counter:counter+7], to get my desired printed output?

Darren Haynes
  • FYI whenever you find yourself writing a counter variable that just gets incremented in a loop, you should probably be using `enumerate` instead. – roippi Jan 29 '14 at 04:24
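
A minimal sketch of what the `enumerate` suggestion might look like, written in Python 3 syntax (the question uses Python 2 print statements). The seven tagged tokens are copied from the question's printed output; the surrounding tokens carry a placeholder tag "XX", which is an assumption made so the sketch runs without NLTK. The parameters `first`, `last`, and `span` are illustrative names, not part of the original code, and the function returns the words instead of printing them.

```python
# The seven tagged tokens come from the question's printed output; the
# surrounding tokens use a placeholder tag "XX" (an assumption) so the
# sketch runs without NLTK.
tagged = [
    ('It', 'XX'), ('is', 'XX'), ('pretty', 'XX'), ('good', 'XX'),
    ('as', 'IN'), ('far', 'RB'), ('as', 'IN'), ('driveway', 'NN'),
    ('size', 'NN'), ('is', 'VBZ'), ('concerned', 'VBN'),
    (',', 'XX'), ('otherwise', 'XX'),
]


def find_phrase(tagged, first="as", last="concerned", span=7):
    """Return the words of the first span-long window that starts with
    `first` and ends with `last`, or an empty list if there is none."""
    for i, (word, _tag) in enumerate(tagged):
        window = tagged[i:i + span]
        # A short window near the end of the list cannot match, which also
        # avoids indexing past the end of the list.
        if word == first and len(window) == span and window[-1][0] == last:
            return [w for w, _tag in window]
    return []


print(find_phrase(tagged))
```

Checking the window length before comparing its last word also guards against the IndexError that `tagged[counter+6]` could raise when "as" appears within the last six tokens.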

3 Answers

Probably the simplest method uses a list comprehension. This statement creates a list from the first element of every tuple in your list:

print [tup[0] for tup in tagged[counter:counter+7]]

Or just for fun, if the tuples are always pairs, you could flatten the list (using any method you like) and then print every second element using the step argument of Python's slice notation:

print list(sum(tagged[counter:counter+7], ()))[::2]
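
To see why the flattening trick works: `sum` with an empty tuple as its start value concatenates the pairs into one long tuple, and `[::2]` then keeps every second item, starting with the first. A small Python 3 sketch, where `pairs` is an illustrative name for the first three tuples of the question's printed output:

```python
pairs = [('as', 'IN'), ('far', 'RB'), ('as', 'IN')]

# sum() starts from the empty tuple and adds each pair to it:
# () + ('as', 'IN') + ('far', 'RB') + ('as', 'IN')
flat = sum(pairs, ())
print(flat)        # ('as', 'IN', 'far', 'RB', 'as', 'IN')

# Words sit at the even positions, tags at the odd ones.
words = list(flat)[::2]
print(words)       # ['as', 'far', 'as']
```

This relies on every tuple having exactly two items, and repeated tuple concatenation gets slow on long lists, so the list comprehension is usually the safer choice.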

Or use map with the itemgetter function, which calls the __getitem__() method to retrieve the 0th index of every tuple in your list:

from operator import itemgetter
print map(itemgetter(0), tagged[counter:counter+7])
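
In Python 3, `print` is a function and `map` returns a lazy iterator rather than a list, so the result needs wrapping in `list()` before printing. A sketch of the same idea under that assumption, with `tagged_slice` and `first` as illustrative names and the data copied from the question's printed output:

```python
from operator import itemgetter

tagged_slice = [('as', 'IN'), ('far', 'RB'), ('as', 'IN'), ('driveway', 'NN'),
                ('size', 'NN'), ('is', 'VBZ'), ('concerned', 'VBN')]

first = itemgetter(0)          # first(t) is equivalent to t[0]
words = list(map(first, tagged_slice))
print(words)
# ['as', 'far', 'as', 'driveway', 'size', 'is', 'concerned']
```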

Anything else? I'm sure there are more.

jayelm

You can use zip like this, provided find_phrase() returns the slice tagged[counter:counter+7] instead of printing it:

result, _ = zip(*find_phrase())
print result
James Sapam
  • that works great, but I have no idea how it works. I am familiar with zip, but I have never seen a variable being declared with a trailing comma and an underscore `result, _`. What is going on there, or could you point me to some documentation about it? – Darren Haynes Jan 29 '14 at 05:30
  • @Darren it's just a valid variable name. By common convention, naming a variable `_` means 'I'm not using this.' Look up "tuple unpacking" if you don't understand how two things are being assigned to on the left hand side. – roippi Jan 29 '14 at 05:43
  • ah sorry, i was away from keyboard and roippi explained all. Thanks @roippi. – James Sapam Jan 29 '14 at 05:57
Have you tried zip? Also, a list comprehension such as [item[0] for item in name] would work.