L = [('The', 'DT'), ('study', 'NN'), ('guide', 'NN'), ('does', 'VBZ'), ('not', 'VBZ'), ('discuss', 'VBZ'), ('much', 'NN'), ('of', 'IN'), ('the', 'DT'), ('basics', 'NN'), ('of', 'IN'), ('ethics.', 'NN')]

I want to remove the tuples whose tag is anything other than 'NN' or 'DT'. I tried the pop method, but it doesn't work. I also tried unzipping the tuples, but tuples are immutable. So how do I remove them?

Karan Jain

2 Answers


You have to pop or delete by index for it to work, because list.pop() takes a position, not a value. For example, instead of L.pop(('The', 'DT')), you could do L.pop(L.index(('The', 'DT'))).

I haven't tested it, but this should work if I've understood what you want.
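A small sketch of the difference, using a shortened copy of the list (`tagged` is just a name picked for this example). `list.pop` expects a position, so the tuple has to be converted to an index with `list.index` first; `list.remove` is the built-in way to delete the first element equal to a given value.

```python
tagged = [('The', 'DT'), ('study', 'NN'), ('does', 'VBZ'), ('of', 'IN')]

# pop() takes an index, so look the tuple up first; pop() returns the removed item
removed = tagged.pop(tagged.index(('does', 'VBZ')))
print(removed)   # ('does', 'VBZ')

# remove() deletes the first element equal to the given value (it returns None)
tagged.remove(('of', 'IN'))
print(tagged)    # [('The', 'DT'), ('study', 'NN')]

# Passing the tuple itself to pop() fails, because it is not an integer index
try:
    tagged.pop(('The', 'DT'))
except TypeError as exc:
    print('TypeError:', exc)
```

Both `pop(index(...))` and `remove` only delete the first match, so a tuple that occurs more than once (like the two ('of', 'IN') entries) would need repeated calls, which is why filtering the whole list is simpler here.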

This way builds a list of the indexes you want to remove, then deletes them starting from the highest index. Deleting while looping over the list would change its size mid-iteration, and deleting from the front first would shift the positions of the items that still need to be deleted.

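invalid_tuples = []
for i, t in enumerate(L):            # enumerate yields (index, tuple) pairs
    if t[1] not in ('NN', 'DT'):     # t[1] is the POS tag
        invalid_tuples.append(i)
for i in reversed(invalid_tuples):   # delete from the end so earlier indexes stay valid
    del L[i]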
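To see why the deletion loop should run from the end, here is a toy example with plain strings (the names `items`, `to_delete`, `wrong` and `right` are made up for illustration). Deleting an early index shifts every later element one position to the left, so the remaining stored indexes then point at the wrong items.

```python
items = ['a', 'b', 'c', 'd']
to_delete = [1, 2]               # intent: remove 'b' and 'c'

wrong = items.copy()
for i in to_delete:              # ascending: after deleting index 1, 'c' moves to index 1
    del wrong[i]
print(wrong)                     # ['a', 'c']  ('d' was removed instead of 'c')

right = items.copy()
for i in reversed(to_delete):    # descending: lower indexes are not affected
    del right[i]
print(right)                     # ['a', 'd']
```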

Or, alternatively, as a one-line solution that builds a new list and rebinds L to it:

L = [t for t in L if t[1] in ('NN', 'DT')]
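The one-liner creates a new list, so any other variable still referring to the old list keeps the unfiltered tuples. If that matters, one option (not part of the original answer) is slice assignment, `L[:] = ...`, which replaces the contents of the existing list object. `alias` and `KEEP` are names introduced only for this sketch.

```python
L = [('The', 'DT'), ('study', 'NN'), ('guide', 'NN'), ('does', 'VBZ'),
     ('not', 'VBZ'), ('discuss', 'VBZ'), ('much', 'NN'), ('of', 'IN'),
     ('the', 'DT'), ('basics', 'NN'), ('of', 'IN'), ('ethics.', 'NN')]
alias = L                    # a second reference to the same list object
KEEP = {'NN', 'DT'}          # tags to keep; a set gives fast membership tests

# Slice assignment overwrites the contents of the existing list in place
L[:] = [(word, tag) for word, tag in L if tag in KEEP]

print(L)
print(alias is L)            # True: alias sees the filtered contents too
```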
Peter
>>> L = [('The', 'DT'), ('study', 'NN'), ('guide', 'NN'), ('does', 'VBZ'), ('not', 'VBZ'), ('discuss', 'VBZ'), ('much', 'NN'), ('of', 'IN'), ('the', 'DT'), ('basics', 'NN'), ('of', 'IN'), ('ethics.', 'NN')]
>>> [(word, tag) for word, tag in L if tag not in ['DT', 'NN']]
[('does', 'VBZ'), ('not', 'VBZ'), ('discuss', 'VBZ'), ('of', 'IN'), ('of', 'IN')]
>>> [(word, tag) for word, tag in L if tag in ['DT', 'NN']]
[('The', 'DT'), ('study', 'NN'), ('guide', 'NN'), ('much', 'NN'), ('the', 'DT'), ('basics', 'NN'), ('ethics.', 'NN')]
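The two comprehensions above each walk the list once. If both the kept and the dropped tuples are needed, a plain loop can split them in a single pass; this is only a sketch, and `KEEP`, `kept` and `dropped` are illustrative names.

```python
L = [('The', 'DT'), ('study', 'NN'), ('guide', 'NN'), ('does', 'VBZ'),
     ('not', 'VBZ'), ('discuss', 'VBZ'), ('much', 'NN'), ('of', 'IN'),
     ('the', 'DT'), ('basics', 'NN'), ('of', 'IN'), ('ethics.', 'NN')]
KEEP = {'DT', 'NN'}

kept, dropped = [], []
for word, tag in L:
    # Append each tuple to whichever list matches its tag
    (kept if tag in KEEP else dropped).append((word, tag))

print(kept)
print(dropped)
```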
alvas
  • If you want noun phrases (NPs), a chunker would be more appropriate, e.g. `python senna.py --np test.txt` with `nltk_cli` https://github.com/alvations/nltk_cli – alvas Apr 03 '16 at 11:04