All I want to do is find the sentiment (positive/negative/neutral) of any given string. While researching, I came across Stanford NLP, but sadly it's in Java. Any ideas on how I can make it work for Python?
Looks like dasmith on GitHub wrote a nice little wrapper for this: https://github.com/dasmith/stanford-corenlp-python – devmacrile Oct 01 '15 at 04:40
NLTK contains a wrapper for Stanford NLP, though I'm not sure if it includes sentiment analysis. Calling an external utility - in Java or whatever - from Python is not hard. – tripleee Mar 07 '16 at 08:01
10 Answers
Use py-corenlp
Download Stanford CoreNLP
The latest version at this time (2020-05-25) is 4.0.0:
wget https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar
If you do not have wget, you probably have curl:
curl https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip -O https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar -O
If all else fails, use the browser ;-)
Install the package
unzip stanford-corenlp-4.0.0.zip
mv stanford-corenlp-4.0.0-models-english.jar stanford-corenlp-4.0.0
Start the server
cd stanford-corenlp-4.0.0
java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000
Notes:
- `timeout` is in milliseconds; I set it to 10 sec above. You should increase it if you pass huge blobs to the server.
- There are more options; you can list them with `--help`.
- `-mx5g` should allocate enough memory, but YMMV and you may need to modify the option if your box is underpowered.
Install the python package
The standard package
pip install pycorenlp
does not work with Python 3.9, so you need to do
pip install git+https://github.com/sam-s/py-corenlp.git
(See also the official list).
Use it
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))
and you will get:
0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb': 1 Negative
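For readers who would rather not install a client package, the same call can be sketched with only the standard library. This is a minimal sketch, assuming the server accepts the text as the POST body and the settings as a JSON string in a `properties` query parameter; the helper names `build_corenlp_request` and `annotate` are made up for illustration and are not part of pycorenlp.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(text, server_url="http://localhost:9000",
                          annotators="sentiment", timeout_ms=10000):
    """Build (but do not send) a POST request for a CoreNLP server.

    Hypothetical helper: assumes the server takes its settings as a JSON
    string in the `properties` query parameter and the text as the body.
    """
    properties = {
        "annotators": annotators,
        "outputFormat": "json",
        "timeout": timeout_ms,
    }
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        url="%s/?%s" % (server_url.rstrip("/"), query),
        data=text.encode("utf-8"),
        method="POST",
    )


def annotate(text, **kwargs):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_corenlp_request(text, **kwargs)) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Splitting the request construction from the network call makes it possible to inspect the URL and body without a server running; `annotate("I love you.")` should then return the same kind of dict as `nlp.annotate(...)` above.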
Notes
- You pass the whole text to the server and it splits it into sentences. It also splits sentences into tokens.
- The sentiment is ascribed to each sentence, not the whole text. The mean `sentimentValue` across sentences can be used to estimate the sentiment of the whole text.
- The average sentiment of a sentence is between `Neutral` (2) and `Negative` (1); the range is from `VeryNegative` (0) to `VeryPositive` (4), which appear to be quite rare.
- You can stop the server either by typing Ctrl-C at the terminal you started it from or using the shell command `kill $(lsof -ti tcp:9000)`. `9000` is the default port; you can change it using the `-port` option when starting the server.
- Increase `timeout` (in milliseconds) in server or client if you get timeout errors.
- `sentiment` is just one annotator; there are many more, and you can request several, separating them by comma: `'annotators': 'sentiment,lemma'`.
- Beware that the sentiment model is somewhat idiosyncratic (e.g., the result is different depending on whether you mention David or Bill).
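The idea of averaging `sentimentValue` across sentences can be sketched as follows. This is an illustration only: `overall_sentiment` is a made-up helper, and rounding the mean to the nearest class is one possible choice rather than anything CoreNLP prescribes.

```python
from statistics import mean

# Labels indexed by sentimentValue, following the 0..4 scale in the notes.
LABELS = ["VeryNegative", "Negative", "Neutral", "Positive", "VeryPositive"]


def overall_sentiment(res):
    """Return (mean score, nearest label) for a parsed server response.

    Hypothetical helper; `res` is the dict returned by nlp.annotate().
    """
    # sentimentValue may arrive as a string such as "3", so convert first.
    values = [int(s["sentimentValue"]) for s in res["sentences"]]
    if not values:
        return None, None
    score = mean(values)
    # Round half up to the nearest class index.
    return score, LABELS[int(score + 0.5)]
```

For the four example sentences above (values 3, 1, 3, 1) the mean is 2, so the whole text comes out as `Neutral`, which shows how much information a plain average can hide.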
PS. I cannot believe that I added a 9th answer, but, I guess, I had to, since none of the existing answers helped me (some of the 8 previous answers have now been deleted, some others have been converted to comments).

Thanks for your answer! I think it is the only one that is promising. But I wonder is there any other way to pass the sentences. Suppose I have a large .txt file with more than 10,000 lines and each line per sentence. What is the appropriate way for me to use? Thanks! – user5779223 Nov 28 '16 at 09:00
If you find that you cannot pass all 10k lines in a single blob, you can split it arbitrarily (note that your sentence "each line per sentence" is unclear). – sds Nov 28 '16 at 13:49
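A minimal sketch of such splitting, assuming the file holds one sentence per line: the helper `batch_lines` and the 50,000-character limit are arbitrary choices for illustration, not anything the server requires.

```python
def batch_lines(lines, max_chars=50000):
    """Yield newline-joined batches of lines, each at most max_chars long.

    Hypothetical helper. Blank lines are skipped; a single line longer
    than max_chars is yielded on its own rather than being cut.
    """
    batch, size = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # +1 accounts for the newline joining this line to the batch.
        if batch and size + len(line) + 1 > max_chars:
            yield "\n".join(batch)
            batch, size = [], 0
        batch.append(line)
        size += len(line) + 1
    if batch:
        yield "\n".join(batch)

# Usage (needs a running server and the nlp object from the answer):
#   with open("sentences.txt") as f:
#       for blob in batch_lines(f):
#           res = nlp.annotate(blob, properties={...})
```

The CoreNLP documentation also lists an `ssplit.eolonly` property for treating each line as one sentence; whether it helps in this particular setup is untested here.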
Inside `for s in res["sentences"]`, Is there a way to print this beautifully like http://nlp.stanford.edu:8080/sentiment/rntnDemo.html ? – ThinkGeek Sep 09 '17 at 10:05
@LokeshAgrawal: not OOTB. The information _is_ contained in the response, but you need a library for visual representation. – sds Sep 10 '17 at 13:21
@NickTheInventor: there is nothing platform-specific here. Both Java and Python run identically on Unix and Windows. – sds Feb 21 '18 at 17:13
For those who experience an error `Error occurred during initialization of VM Could not reserve enough space for 2097152KB object heap` on running CoreNLP Server, use `java -cp "*" -Xmx1500m edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 15000` instead, limiting Java to run at 1.5GB memory only. – Lëmön Apr 12 '18 at 09:12
instead `#java -mx4g -cp "*;stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000` works. else it gives error – StatguyUser Apr 18 '18 at 15:32
Hi, as of 2020 Stanford NLP provides a CoreNLP client as part of Stanza. It is called Stanford CoreNLP Client and the documentation can be found here: https://stanfordnlp.github.io/stanza/corenlp_client.html – gneusch Oct 15 '20 at 13:31
Native Python implementation of NLP tools from Stanford
Stanford has recently released a new Python package implementing neural network (NN) based algorithms for the most important NLP tasks:
- tokenization
- multi-word token (MWT) expansion
- lemmatization
- part-of-speech (POS) and morphological features tagging
- dependency parsing
It is implemented in Python and uses PyTorch as the NN library. The package contains accurate models for more than 50 languages.
To install you can use PIP:
pip install stanfordnlp
To perform basic tasks you can use the native Python interface, which exposes many NLP algorithms:
import stanfordnlp
stanfordnlp.download('en') # This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
doc.sentences[0].print_dependencies()
EDIT:
So far, the library does not support sentiment analysis, yet I'm not deleting the answer, since it directly answers the "Stanford nlp for python" part of the question.

Thank you for your post. I was trying to do something similar (analyze sentiments on statements). After reading your post I came to know that stanfordnlp for Python does not yet support sentiment. – Ganesh M S Apr 20 '19 at 20:13
Right now they have STANZA.
https://stanfordnlp.github.io/stanza/
Release History: Note that prior to version 1.0.0, the Stanza library was named “StanfordNLP”. To install historical versions prior to v1.0.0, you’ll need to run pip install stanfordnlp.
So, it confirms that Stanza is the full Python version of Stanford NLP.
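Stanza's documentation lists a `sentiment` processor, so the original task could be sketched roughly like this. The sketch assumes the English sentiment model uses three classes (0 = negative, 1 = neutral, 2 = positive) exposed as `sentence.sentiment`; the helper names are made up for illustration, and the first run needs `stanza.download('en')`.

```python
# Assumed label order of Stanza's sentiment processor: 0, 1, 2.
STANZA_LABELS = ("negative", "neutral", "positive")


def load_pipeline():
    """Create an English pipeline with sentiment (imports stanza lazily)."""
    import stanza  # requires `pip install stanza` and stanza.download('en')
    return stanza.Pipeline(lang="en", processors="tokenize,sentiment")


def sentence_sentiments(doc):
    """Return (sentence text, label) pairs for a processed document."""
    return [(s.text, STANZA_LABELS[s.sentiment]) for s in doc.sentences]

# Usage (downloads and loads the models, so it is slow the first time):
#   nlp = load_pipeline()
#   for text, label in sentence_sentiments(nlp("I love you. I hate him.")):
#       print(label, text)
```

As with CoreNLP, the label is assigned per sentence, so a whole-text verdict still needs some aggregation on top.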

As of 2020 this is the best answer to this question, as Stanza is native python, so no need to run the Java package. Available through pip or conda. – gneusch Oct 11 '20 at 11:10
TextBlob is a great package for sentiment analysis written in Python. You can find the docs here. Sentiment analysis of any given sentence is carried out by inspecting words and their corresponding emotional score (sentiment). You can start with
$ pip install -U textblob
$ python -m textblob.download_corpora
The first command installs the latest version of textblob in your (virtualenv) system; passing -U upgrades the package to its latest available version. The second command downloads all the required data, the corpus.

I actually tried using Textblob but the sentiment scores are pretty off. Hence I was planning to switch to stanford nlp instead – 90abyss Oct 01 '15 at 18:28
I also faced a similar situation. Most of my projects are in Python and the sentiment part is in Java. Luckily it's quite easy to learn how to use the Stanford CoreNLP jar.
Here is one of my scripts; you can download the jars and run it.
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.ArrayCoreMap;
import edu.stanford.nlp.util.CoreMap;
public class Simple_NLP {
    static StanfordCoreNLP pipeline;

    public static void init() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public static String findSentiment(String tweet) {
        String SentiReturn = "";
        String[] SentiClass = {"very negative", "negative", "neutral", "positive", "very positive"};
        // Sentiment is an integer, ranging from 0 to 4.
        // 0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.
        int sentiment = 2;
        if (tweet != null && tweet.length() > 0) {
            Annotation annotation = pipeline.process(tweet);
            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            if (sentences != null && sentences.size() > 0) {
                ArrayCoreMap sentence = (ArrayCoreMap) sentences.get(0);
                Tree tree = sentence.get(SentimentAnnotatedTree.class);
                sentiment = RNNCoreAnnotations.getPredictedClass(tree);
                SentiReturn = SentiClass[sentiment];
            }
        }
        return SentiReturn;
    }
}

I am facing the same problem. One possible solution is stanford_corenlp_py, which uses Py4j, as pointed out by @roopalgarg.
stanford_corenlp_py
This repo provides a Python interface for calling the "sentiment" and "entitymentions" annotators of Stanford's CoreNLP Java package, current as of v. 3.5.1. It uses py4j to interact with the JVM; as such, in order to run a script like scripts/runGateway.py, you must first compile and run the Java classes creating the JVM gateway.

Use the stanfordcorenlp Python library
stanford-corenlp is a really good wrapper on top of Stanford CoreNLP for using it from Python.
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
Usage
# Simple usage
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('/Users/name/stanford-corenlp-full-2018-10-05')
sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))
nlp.close() # Do not forget to close! The backend server will consume a lot of memory.

Can you please explain how this stanfordcorenlp can be used to analyze the sentiment of the statement? – Ganesh M S Apr 20 '19 at 20:33
I would suggest using the TextBlob library. A sample implementation goes like this:
from textblob import TextBlob

def sentiment(message):
    # create a TextBlob object from the passed text
    analysis = TextBlob(message)
    # return the polarity score, a float in [-1.0, 1.0]
    return analysis.sentiment.polarity
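Since the question asks for positive/negative/neutral rather than a number, the polarity returned above can be mapped to a label. This is a minimal sketch: `polarity_label` is a made-up helper and the 0.1 threshold is an arbitrary starting point that would need tuning on real data.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    Hypothetical helper; scores within +/- threshold count as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

# Usage with the function above (needs textblob installed):
#   polarity_label(sentiment("I love you"))
```

A wider threshold sends more borderline sentences to neutral, which may be preferable when the scores are noisy.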

There is some very recent progress on this issue: you can now use the stanfordnlp package directly from Python.
From the README:
>>> import stanfordnlp
>>> stanfordnlp.download('en') # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

import os
import numpy as np
import pandas as pd

inputFile = 'senti_post3.csv'
# Load the data (the commented-out lines below would add empty result columns)
df = pd.read_csv(inputFile)
df.head(5)
# header_list_new = ['numSentence', 'numWords', 'totSentiment', 'avgSentiment', 'Sfreq0','Sfreq1','Sfreq2','Sfreq3','Sfreq4','Sfreq5']
# for i, name in enumerate(header_list_new):
#     df[name] = 0

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

# Function; Output = # sentence, # words, avg.sentimentValue, sentimentHist
def stanford_sentiment(text_str):
    res = nlp.annotate(text_str,
                       properties={
                           'annotators': 'sentiment',
                           'outputFormat': 'json',
                           'timeout': 40000,
                       })
    numSentence = len(res["sentences"])
    numWords = len(text_str.split())
    # data arrangement
    arraySentVal = np.zeros(numSentence)
    for i, s in enumerate(res["sentences"]):
        arraySentVal[i] = int(s["sentimentValue"])
    # sum of sentiment values
    totSentiment = sum(arraySentVal)
    # avg. of sentiment values
    avgSentiment = np.mean(arraySentVal)
    # frequency of sentimentValue
    bins = [0,1,2,3,4,5,6]
    freq = np.histogram(arraySentVal, bins)[0] # getting freq. only w/o bins
    return(numSentence, numWords, totSentiment, avgSentiment, freq)

# dfLength = len(df)
# for i in range(dfLength):
for i in range(54000,55284):
    try:
        numSentence, numWords, totSentiment, avgSentiment, freq = stanford_sentiment(df.clean_text[i].replace('\n'," "))
        df.loc[i,'numSentence'] = numSentence
        df.loc[i,'numWords'] = numWords
        df.loc[i,'totSentiment'] = totSentiment
        df.loc[i,'avgSentiment'] = avgSentiment
        df.loc[i,'Sfreq0'] = freq[0]
        df.loc[i,'Sfreq1'] = freq[1]
        df.loc[i,'Sfreq2'] = freq[2]
        df.loc[i,'Sfreq3'] = freq[3]
        df.loc[i,'Sfreq4'] = freq[4]
        df.loc[i,'Sfreq5'] = freq[5]
    except Exception:  # keep going on bad rows, but still allow Ctrl-C
        print("error where i =", i)

outputFile = 'senti_post16.csv'
df.to_csv(outputFile, encoding='utf-8', index=False )

useful link: https://medium.com/analytics-vidhya/sentiment-feature-extraction-using-stanford-corenlp-python-jupyter-notebook-29a0d97ca76f – joy google Oct 15 '22 at 05:15