Can the Stanford tokenizer print its output one sentence per line, the way the Apache OpenNLP command-line tool does?
http://nlp.stanford.edu/software/tokenizer.shtml
https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.tokenizer
1 Answer
You can use `DocumentPreprocessor`, either programmatically or from the command line.
From the CLI:
    $ echo "This is a test. And some more." | java edu.stanford.nlp.process.DocumentPreprocessor 2>/dev/null
    This is a test .
    And some more .
You can do the same thing programmatically; see this SO answer.
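
As a rough illustration of the formatting step only, here is a minimal sketch that joins each sentence's tokens with spaces and prints one sentence per line, matching the CLI output above. It deliberately uses only the JDK: the hard-coded token lists are a stand-in for what you would get by iterating a `DocumentPreprocessor` (which yields one token list per sentence), and the class name `OneSentencePerLine` and the method `format` are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OneSentencePerLine {

    // Joins each sentence's tokens with single spaces and puts
    // one sentence on each line.
    static String format(List<List<String>> sentences) {
        return sentences.stream()
                .map(tokens -> String.join(" ", tokens))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Assumption: hard-coded stand-in for tokenized sentences.
        // With Stanford NLP on the classpath, these lists would instead
        // come from iterating a DocumentPreprocessor over the input.
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("This", "is", "a", "test", "."),
                Arrays.asList("And", "some", "more", "."));

        System.out.println(format(sentences));
    }
}
```

Keeping the join logic in its own method means the same code works whether the tokens come from the Stanford tokenizer or from plain whitespace splitting.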

Jon Gauthier
- Thx Jon! I notice the output is tokenized, and I would like to avoid that. Any way to skip tokenization with Stanford NLP? – giorgio79 Feb 12 '15 at 18:48
- Yes, use whitespace tokenization. Run `DocumentPreprocessor` with the `-help` option for details. – Jon Gauthier Feb 12 '15 at 19:24