I'm trying to tokenize text with Stanford CoreNLP for text summarization, using this git repo. I have set the environment variables for Java 8 and I'm using Python 2.7. When I run this command:
echo "This is text tokenization" | java -cp C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar\ edu.stanford.nlp.process.PTBTokenizer.class
It works fine and gives this output:
"This
is
text
tokenization"
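Since the standalone call above passes the jar explicitly with -cp, while make_datafiles.py (as far as I can tell) just invokes java and expects the jar to already be on the classpath, I also tried setting CLASSPATH first (the path is my local install; this is my guess at what the script expects, not something the repo documents for Windows):

```shell
set CLASSPATH=C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar
echo This is text tokenization | java edu.stanford.nlp.process.PTBTokenizer
```

That did not change the error from the script.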
But when I run the command:
python make_datafiles.py /path/to/cnn/stories /path/to/dailymail/stories
I get this error:
'"java -cp"' is not recognized as an internal or external command,
operable program or batch file.
Exception: The tokenized stories directory cnn_stories_tokenized contains 0 files, but it should contain the same number as C:\Users\Harshit\Downloads\cnn_stories_tokenized\cnn_stories_tokenized (which has 92579 files). Was there an error during tokenization?
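From the error message it looks like the command name and its first flag are being treated as a single token: Windows is being asked to run a program literally named "java -cp". I reproduced this behaviour with my own diagnostic snippet (not part of the repo), using the Python interpreter as a stand-in for java, since the subprocess mechanics are the same:

```python
import subprocess
import sys

# Keeping each argument as its own list element works: the OS sees the
# executable name and the flag separately.
out = subprocess.check_output([sys.executable, "-c", "print('tokenized')"])
print(out.decode().strip())  # tokenized

# Fusing the executable name and the flag into one list element reproduces
# the '"java -cp" is not recognized' failure shape: the OS looks for a
# program whose name contains a space.
try:
    subprocess.call([sys.executable + " -c", "print('tokenized')"])
except OSError as e:
    print("not recognized:", e)
```

So my guess is that somewhere the java invocation is being built as one quoted string, but I don't know where or how to fix it in the script.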
How do I solve this and tokenize the data files?