
Dependency parsing with ClearNLP produces a DEPTree object. I have parsed a large corpus and serialized all the trees in CoNLL format (as described, e.g., on the ClearNLP page on Google Code).

But I can't figure out how to deserialize them. ClearNLP provides a DEPTree#toStringCoNLL() method for serialization, and I am looking for its counterpart: something that reads a CoNLL-format parse tree and creates a DEPTree object. I tried to reverse-engineer the method, but didn't really understand the inner workings of the code.

As a workaround, I have created my own dependency tree class to handle the basic functionality I need, but I would really like to know how to get a DEPTree object instead. So far, I haven't found any method in the ClearNLP API that does this.

Chthonic Project

1 Answer


Found the answer, so sharing the wisdom on SO :-) ...

The deserialization can be done using the TSVReader in the edu.emory.clir.clearnlp.reader package.

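import java.io.FileInputStream;
import edu.emory.clir.clearnlp.dependency.DEPNode;
import edu.emory.clir.clearnlp.dependency.DEPTree;
import edu.emory.clir.clearnlp.reader.TSVReader;

public void readCoNLL(String inputFile) throws Exception {
    // The arguments are 0-based column indices of the fields to read from each CoNLL line.
    TSVReader reader = new TSVReader(0, 1, 2, 4, 5, 6, 7);
    reader.open(new FileInputStream(inputFile));
    DEPTree tree;
    // next() returns one tree per sentence, and null once the input is exhausted.
    while ((tree = reader.next()) != null)
        System.out.println(tree.toString(DEPNode::toStringDEP));
}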

This solution was provided by the author of ClearNLP, Jinho Choi.
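To make the column indices more concrete, here is a rough plain-Java sketch that does not use ClearNLP at all. It assumes the file follows the CoNLL-X column layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL) and that the seven indices above select ID, form, lemma, POS tag, features, head ID and dependency label, skipping column 3. That mapping is my reading of the call, not something I have confirmed in the ClearNLP source. CoNLLSketch, Node and parse are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CoNLLSketch {
    /** Hypothetical stand-in for a dependency node; this is NOT ClearNLP's DEPNode. */
    public static final class Node {
        public final int id;
        public final String form, lemma, pos, feats;
        public final int headId;
        public final String deprel;

        Node(int id, String form, String lemma, String pos, String feats, int headId, String deprel) {
            this.id = id; this.form = form; this.lemma = lemma; this.pos = pos;
            this.feats = feats; this.headId = headId; this.deprel = deprel;
        }
    }

    /** Splits CoNLL text into sentences (blank-line separated) and reads columns 0, 1, 2, 4, 5, 6, 7. */
    public static List<List<Node>> parse(String text) {
        List<List<Node>> trees = new ArrayList<>();
        List<Node> current = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                // A blank line ends the current sentence.
                if (!current.isEmpty()) {
                    trees.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] c = line.split("\t");
            // Assumed layout: column 3 (CPOSTAG) is skipped, as in TSVReader(0, 1, 2, 4, 5, 6, 7).
            current.add(new Node(Integer.parseInt(c[0]), c[1], c[2], c[4], c[5],
                                 Integer.parseInt(c[6]), c[7]));
        }
        // The last sentence may not be followed by a blank line.
        if (!current.isEmpty()) trees.add(current);
        return trees;
    }

    public static void main(String[] args) {
        String sample = "1\tShe\tshe\tPRP\tPRP\t_\t2\tnsubj\n"
                      + "2\tsleeps\tsleep\tVBZ\tVBZ\t_\t0\troot\n\n";
        for (List<Node> tree : parse(sample))
            for (Node n : tree)
                System.out.println(n.id + " " + n.form + " <-" + n.deprel + "- " + n.headId);
    }
}
```

If your files use a different column order, the indices passed to TSVReader (and the ones in this sketch) would need to change accordingly.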

In older versions (< 3.x) you will need to use the com.clearnlp.reader.DEPReader class instead of TSVReader.

Chthonic Project