I'm trying to tokenize text with Stanford CoreNLP for text summarization, using this git repo. I have set the environment variables for Java 8 and I'm using Python 2.7. When I run this command:
echo "This is text tokenization" | java -cp C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar\ edu.stanford.nlp.process.PTBTokenizer.class
It works fine and gives this output:
"This
is
text
tokenization"
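Since the standalone call above passes the jar explicitly with -cp, while make_datafiles.py (as far as I can tell) just invokes java and expects the jar to already be on the classpath, I also tried setting CLASSPATH first (the path is my local install; this is my guess at what the script expects, not something the repo documents for Windows):

```shell
set CLASSPATH=C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar
echo This is text tokenization | java edu.stanford.nlp.process.PTBTokenizer
```

That did not change the error from the script.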
But when I run the command:
python make_datafiles.py /path/to/cnn/stories /path/to/dailymail/stories
I get this error:
'"java -cp"' is not recognized as an internal or external command,
operable program or batch file.
Exception: The tokenized stories directory cnn_stories_tokenized contains 0 files, but it should contain the same number as C:\Users\Harshit\Downloads\cnn_stories_tokenized\cnn_stories_tokenized (which has 92579 files). Was there an error during tokenization?
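From the error message it looks like the command name and its first flag are being treated as a single token: Windows is being asked to run a program literally named "java -cp". I reproduced this behaviour with my own diagnostic snippet (not part of the repo), using the Python interpreter as a stand-in for java, since the subprocess mechanics are the same:

```python
import subprocess
import sys

# Keeping each argument as its own list element works: the OS sees the
# executable name and the flag separately.
out = subprocess.check_output([sys.executable, "-c", "print('tokenized')"])
print(out.decode().strip())  # tokenized

# Fusing the executable name and the flag into one list element reproduces
# the '"java -cp" is not recognized' failure shape: the OS looks for a
# program whose name contains a space.
try:
    subprocess.call([sys.executable + " -c", "print('tokenized')"])
except OSError as e:
    print("not recognized:", e)
```

So my guess is that somewhere the java invocation is being built as one quoted string, but I don't know where or how to fix it in the script.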
How do I solve this and tokenize the data files?