5

I have been using the spaCy Python package to parse and tag text, and using the resulting dependency tree and other attributes to derive meaning. Now I would like to use SyntaxNet's Parsey McParseface for parsing and dependency tagging (which seems more accurate), but I would like to keep using the spaCy API because it is so easy to use and it does many things that Parsey doesn't. SyntaxNet outputs POS tags and the dependency tags/tree in CoNLL format:

  1 Bob _ NOUN NNP _ 2 nsubj _ _
  2 brought _ VERB VBD _ 0 ROOT _ _
  3 the _ DET DT _ 4 det _ _
  4 pizza _ NOUN NN _ 2 dobj _ _
  5 to _ ADP IN _ 2 prep _ _
  6 Alice _ NOUN NNP _ 5 pobj _ _
  7 . _ . . _ 2 punct _ _

and spaCy seems to be able to read CoNLL format right here. But I can't figure out where in spaCy's API it accepts a CoNLL-formatted string.

Jason

3 Answers

3

From the spaCy blog:

Obviously, we want to build a bridge between Parsey McParseface and spaCy, so that you can use the more accurate model with the sweeter spaCy API.

However, it looks like there is still plenty of work to be done before this is possible.

See also the spaCy author's response here.

simon
1

Has anybody managed to get SyntaxNet running as a service yet? There's no problem loading annotations into spaCy. The problem is that SyntaxNet is primarily a research system, and for its experimental needs it was sufficient to operate on batches of text read from disk.

If you're content to read from disk, then there should be no problem: just read in the CoNLL format, and then you can apply the annotations to spaCy Doc objects.
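As a rough illustration of the "read in the CoNLL format" step, here is a minimal sketch that uses only the standard library. The `Token` tuple and the `parse_conll` helper are hypothetical names made up for this example, not part of spaCy or SyntaxNet, and the sketch assumes a single sentence with the ten-column layout shown in the question.

```python
from typing import List, NamedTuple

# Sample SyntaxNet output from the question, written with tab-separated columns.
CONLL = "\n".join([
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_",
    "2\tbrought\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_",
    "3\tthe\t_\tDET\tDT\t_\t4\tdet\t_\t_",
    "4\tpizza\t_\tNOUN\tNN\t_\t2\tdobj\t_\t_",
    "5\tto\t_\tADP\tIN\t_\t2\tprep\t_\t_",
    "6\tAlice\t_\tNOUN\tNNP\t_\t5\tpobj\t_\t_",
    "7\t.\t_\t.\t.\t_\t2\tpunct\t_\t_",
])


class Token(NamedTuple):
    """Hypothetical container for one CoNLL row (not a spaCy type)."""
    word: str
    pos: str   # coarse-grained tag (column 4)
    tag: str   # fine-grained tag (column 5)
    head: int  # 0-based index of the head token; the root points to itself
    dep: str   # dependency label (column 8)


def parse_conll(text: str) -> List[Token]:
    """Parse one sentence of 10-column CoNLL text into a list of Tokens."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Prefer tabs, but fall back to any whitespace as the separator.
        cols = line.split("\t") if "\t" in line else line.split()
        index = int(cols[0]) - 1  # CoNLL token IDs are 1-based
        head = int(cols[6]) - 1   # CoNLL uses 0 to mark the root
        if head < 0:
            head = index          # represent the root as its own head
        tokens.append(Token(cols[1], cols[3], cols[4], head, cols[7]))
    return tokens


tokens = parse_conll(CONLL)
words = [t.word for t in tokens]
heads = [t.head for t in tokens]
deps = [t.dep for t in tokens]
```

The root is stored as its own head because, as far as I know, that is the convention spaCy uses. In recent spaCy versions (v3, to my knowledge) the `Doc` constructor accepts `words`, `tags`, `heads` and `deps` keyword arguments, so lists like these could be passed to it; check the documentation for your version, since the expected head indexing and tag sets may differ.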

syllogism_
  • I'm also looking into running SyntaxNet as a service. I haven't looked at this in detail yet, but it seems to be a step in that direction: https://tensorflow.github.io/serving/ – David Batista Apr 04 '17 at 19:11
1

I have not tried this with spaCy, but I've managed to use SyntaxNet's output inside Python NLTK's classes/structures, like DependencyGraph and Tree.

Here is a full example:

http://www.davidsbatista.net/blog/2017/03/25/syntaxnet/

David Batista