What is the set of PoS labels produced by Stanford NLP (including the PoS labels for punctuation tokens), and what does each label mean?

I know this question has been asked several times, such as in:

but those answers list some typical PoS labels that are not specific to Stanford NLP. For instance, none of those answers lists the -LRB- PoS label that Stanford NLP uses for the ( punctuation token.

Where can I find this list of PoS labels in the Stanford NLP source code?

Also, what are some token examples annotated with the SYM PoS label?

Also, how can I tell whether a token is punctuation? Here they define isPunctation == true if its PoS is :|,|.|“|”|-LRB-|-RRB-|HYPH|NFP|SYM|PUNC. However, Stanford NLP does not have all of these PoS labels.

David Portabella

1 Answer

It is the Penn Treebank POS set, but many descriptions of this tag set seem to omit punctuation marks. Here is a complete list of tags:

https://www.eecis.udel.edu/~vijay/cis889/ie/pos-set.pdf

(But parentheses are tagged as -LRB- and -RRB-; I am not sure why the documentation does not mention this.)
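For the punctuation check asked about in the question, one option is to test the tag against a fixed set. The following is only a minimal sketch: the names `PUNCT_TAGS` and `is_punctuation_tag` are hypothetical and not part of Stanford NLP, the set assumes the Penn Treebank punctuation tags plus -LRB-/-RRB-, and the tagged tokens are hand-written for illustration rather than real tagger output.

```python
# Hypothetical helper; this is NOT part of the Stanford NLP API.
# Assumption: punctuation is identified by the Penn Treebank punctuation
# tags, plus the -LRB-/-RRB- tags used for parentheses.
PUNCT_TAGS = frozenset({",", ".", ":", "``", "''", "-LRB-", "-RRB-"})


def is_punctuation_tag(pos_tag: str) -> bool:
    """Return True if the PoS tag is in the assumed punctuation set."""
    return pos_tag in PUNCT_TAGS


# Hand-written (token, tag) pairs for illustration only.
tagged = [
    ("Hello", "UH"),
    (",", ","),
    ("(", "-LRB-"),
    ("world", "NN"),
    (")", "-RRB-"),
    (".", "."),
]

# Keep only the tokens whose tag is in the assumed punctuation set.
punct = [token for token, tag in tagged if is_punctuation_tag(tag)]
print(punct)
```

Whether SYM, or tags such as HYPH and NFP from other tag sets, should count as punctuation depends on the application, so adjust the set accordingly.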

Sebastian Schuster
  • Thanks. I see that the official doc mentions the Penn Treebank POS set and links to a page with the list: https://nlp.stanford.edu/software/tagger.shtml Still, this list does not show the -LRB- POS. That's why I would prefer to see where in the source code this is implemented. This page also gives some examples of the SYM POS: http://www.comp.leeds.ac.uk/amalgam/tagsets/upenn.html – David Portabella Jun 27 '17 at 10:07
  • The tags aren't hard-coded anywhere in the code. (They are stored as part of the serialized model.) But the list that I posted should be complete; it just does not mention that ( is written as -LRB- and ) is written as -RRB-. – Sebastian Schuster Jun 28 '17 at 18:11
  • Thanks. I see that you are very familiar with the code and the authors (or are you even one of the authors?). Maybe you could suggest that the authors add this information to the official page: https://nlp.stanford.edu/software/tagger.shtml – David Portabella Jun 29 '17 at 08:09