In Python, is there a way in any NLP library to combine words to state them as positive?

Question

I have looked into this and couldn't find a way to do it the way I imagine. The example I am trying to group is 'No complaints'. 'No' is normally caught as a stopword, so I have manually removed it from the stopword list to make sure it stays in the data. However, both words are then scored as negative during the sentiment analysis. I want to combine them so the phrase can be categorised as either Neutral or Positive. Is it possible to manually group such words or terms together and decide how they are treated in the sentiment analysis?

I have found a way to group words using POS tagging and chunking, but this combines tags or multi-word expressions, and the groups aren't necessarily picked up correctly in the sentiment analysis.

Current code (using POS Tagging):

from nltk.corpus import stopwords
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize, sent_tokenize, MWETokenizer
import re, gensim, nltk
from gensim.utils import simple_preprocess
import pandas as pd

d = {'text': ['no complaints', 'not bad']}
df = pd.DataFrame(data=d)

stop = stopwords.words('english')
stop.remove('no')
stop.remove('not')
def sent_to_words(sentences):
    for sentence in sentences:
        yield(gensim.utils.simple_preprocess(str(sentence), deacc=True))  # deacc=True strips accent marks
data_words = list(sent_to_words(df['text']))  # iterate over the texts, not the column names

def remove_stopwords(texts):
    return [[word for word in simple_preprocess(str(doc)) if word not in stop] for doc in texts]
data_words_nostops = remove_stopwords(data_words)

txt = df
txt = txt.apply(str)

#pos tag
words = [word_tokenize(i) for i in sent_tokenize(txt['text'])]
pos_tag= [nltk.pos_tag(i) for i in words]

#chunking
tokenized_text = word_tokenize(txt['text'])  # tokens of the stringified 'text' column
tagged_token = nltk.pos_tag(tokenized_text)
grammar = "NP : {<DT>+<NNS>}"
phrases = nltk.RegexpParser(grammar)
result = phrases.parse(tagged_token)
print(result)

sia = SentimentIntensityAnalyzer()
def find_sentiment(post):
    if sia.polarity_scores(post)["compound"] > 0:
        return "Positive"
    elif sia.polarity_scores(post)["compound"] < 0:
        return "Negative"
    else:
        return "Neutral"
    
df['sentiment'] = df['text'].apply(lambda x: find_sentiment(x))

df['compound'] = [sia.polarity_scores(x)['compound'] for x in df['text']]
df

Output:

(S
  0/CD
  (NP no/DT complaints/NNS)
  1/CD
  not/RB
  bad/JJ
  Name/NN
  :/:
  text/NN
  ,/,
  dtype/NN
  :/:
  object/NN)

    |text           |sentiment  |compound
    |:--------------|:----------|:--------
0   |no complaints  |Negative   |-0.5994
1   |not bad        |Positive   | 0.4310

I understand that my current code does not feed the POS tagging and chunking into the sentiment analysis. You can see that 'no complaints' is grouped as a chunk, but its current sentiment is Negative with a score of -0.5994. The aim is to use POS tagging to assign the sentiment as positive... somehow, if possible!

Chris · Answer 1 · 2023-02-14T12:04:54.367

Option 1

Use the standalone VADER sentiment analysis package instead, which seems to handle such idioms better than NLTK does (NLTK actually incorporates VADER, but seems to behave differently in such cases). Nothing in your code needs to change, except that you install VADER, as described in the instructions, and import it as follows (removing the import from nltk.sentiment):

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

Using VADER, you should get the following results. I've added one extra idiom ("no worries"), which NLTK's analyzer would also score as negative.

    text            sentiment   compound
0   no complaints   Positive    0.3089
1   not bad         Positive    0.4310
2   no worries      Positive    0.3252
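The labels above come from the question's find_sentiment, which treats any non-zero compound score as Positive or Negative. The VADER README suggests a small neutral band instead (compound >= 0.05 positive, <= -0.05 negative). A minimal sketch of that rule follows; label_from_compound is a hypothetical helper name, not part of either library:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a label using a neutral band.

    threshold=0.05 follows the cut-off suggested in the VADER README;
    the function name and signature are illustrative assumptions.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Scores taken from the tables in this thread
print(label_from_compound(0.3089))   # Positive
print(label_from_compound(-0.5994))  # Negative
print(label_from_compound(0.01))     # Neutral
```

It could replace the body of find_sentiment by passing sia.polarity_scores(post)["compound"] as the argument.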

Option 2

Modify NLTK's lexicon, as described here; however, this might not always work (the lexicon probably accepts only single words, not idioms). Example below:

new_words = {
    'no complaints': 3.0
}
sia = SentimentIntensityAnalyzer()
sia.lexicon.update(new_words)
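If the lexicon really does match single tokens only, one possible workaround is to join each phrase into one token before scoring and register that joined token in the lexicon. Below is a minimal sketch using only the standard library. PHRASE_VALENCE, merge_phrases and phrase_lexicon are hypothetical names, and the valence values are illustrative picks on VADER's -4 to 4 scale:

```python
import re

# Hypothetical phrase table: multi-word expressions and the valence to assign
PHRASE_VALENCE = {
    "no complaints": 2.0,
    "no worries": 2.0,
}


def merge_phrases(text, phrases, joiner="_"):
    """Replace each listed multi-word phrase in text with one joined token."""
    # Longest phrases first, so a longer phrase wins over a shorter overlap
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        token = joiner.join(words)
        # Match the words separated by any whitespace, on word boundaries
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        text = re.sub(pattern, lambda m, t=token: t, text, flags=re.IGNORECASE)
    return text


def phrase_lexicon(phrase_valence, joiner="_"):
    """Build lexicon entries keyed by the joined tokens."""
    return {joiner.join(p.split()): v for p, v in phrase_valence.items()}


print(merge_phrases("No complaints at all", PHRASE_VALENCE))  # no_complaints at all
print(phrase_lexicon(PHRASE_VALENCE))
```

The intended wiring would be sia.lexicon.update(phrase_lexicon(PHRASE_VALENCE)) followed by sia.polarity_scores(merge_phrases(text, PHRASE_VALENCE)). Whether the analyzer keeps the underscore-joined token intact during its own tokenization is an assumption to check against your installed version.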
