Stanford NLP: Tokenize output on a single line?

Question

score 1 · Accepted Answer · edited May 23 '17 at 12:27

You can use DocumentPreprocessor, either programmatically or from the command line.

From the CLI:

$ echo "This is a test. And some more." | java edu.stanford.nlp.process.DocumentPreprocessor 2>/dev/null
This is a test .
And some more .

You can do the same thing programmatically; see this SO answer.
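
For reference, here is a minimal sketch of the programmatic route. It assumes the Stanford CoreNLP (or Stanford Parser) jar is on the classpath. The class name `OneSentencePerLine` and the helper `sentencesPerLine` are made up for illustration and are not part of the library. It relies on `DocumentPreprocessor` being iterable over sentences, each a list of `HasWord` tokens, and joins every sentence's tokens with single spaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.DocumentPreprocessor;

// Hypothetical class name; needs the Stanford NLP jar on the classpath.
class OneSentencePerLine {

    // Splits text into sentences and returns each one as a single
    // line of space-separated tokens.
    static List<String> sentencesPerLine(String text) {
        List<String> lines = new ArrayList<>();
        // DocumentPreprocessor yields one List<HasWord> per sentence.
        DocumentPreprocessor dp = new DocumentPreprocessor(new StringReader(text));
        for (List<HasWord> sentence : dp) {
            StringJoiner line = new StringJoiner(" ");
            for (HasWord token : sentence) {
                line.add(token.word());
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same input as the CLI example above.
        for (String line : sentencesPerLine("This is a test. And some more.")) {
            System.out.println(line);
        }
    }
}
```

With the default tokenizer this should print the same two lines as the CLI run above. Note that the tokens are still split from punctuation, as in the CLI output.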

edited May 23 '17 at 12:27

Community

answered Feb 12 '15 at 17:27

Jon Gauthier

Thx Jon! I notice the output is tokenized, and I would like to avoid that. Any way to skip tokenization with Stanford NLP? – giorgio79 Feb 12 '15 at 18:48
Yes—use whitespace tokenization. Run `DocumentPreprocessor` with the `-help` option for details. – Jon Gauthier Feb 12 '15 at 19:24
