I have a dataframe with reviews and their polarity, as shown below. Only 2 samples are shown here, but the full dataframe has more than 1000 reviews, each with a polarity.
Reviews               Polarity
This movie is good    Positive
This is bad           negative
I have written a function named find_features. I need to pass every review from this dataframe to it, do some manipulation, and collect the results as a list in featuresets. I am trying the technique below to loop over the reviews column of df, so that each entry in featuresets also carries the polarity value for that review:
featuresets = [(find_features(df.reviews), df.polarity) for (df.reviews, df.polarity) in df]
The find_features function:
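from nltk.tokenize import word_tokenize

def find_features(document):
    # Tokenize the review into individual words
    words = word_tokenize(document)
    features = {}
    # Flag each candidate word as present or absent in this review
    for w in word_features:
        features[w] = (w in words)
    return features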
When this function is called, the words of each review in my reviews column are split by the tokenizer inside find_features, and each review should then be assigned a polarity (positive or negative). I generated a list of words and compared it against the most frequent words; word_features holds the most frequently used words.
all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())
good - positive
bad - negative
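To illustrate what find_features is meant to return for a single review, here is a minimal sketch. It assumes a hypothetical two-word word_features list and uses str.split as a stand-in for word_tokenize (the real tokenizer also separates punctuation), so it runs without any NLTK data.

```python
# Hypothetical vocabulary; the real word_features comes from the FreqDist above
word_features = ["good", "bad"]

def find_features(document):
    # str.split is a simplified stand-in for nltk's word_tokenize
    words = set(document.split())
    # Map each candidate word to True/False depending on whether it occurs
    return {w: (w in words) for w in word_features}

example = find_features("This movie is good")
print(example)  # {'good': True, 'bad': False}
```

Each review therefore becomes a dictionary of booleans, and the polarity label still has to be paired with it separately.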
When I build featuresets, I get the error below:
ValueError: too many values to unpack (expected 2)
I know the above logic works for a list or a dictionary, but I want to use the same kind of logic for a DataFrame. Could you please help me with this?
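A likely cause of the error: iterating over a DataFrame directly yields its column labels, not its rows, so the loop tries to unpack the string "reviews" into two targets. One possible approach, sketched below, is to pair the two columns row by row with zip. The sample data, the two-word word_features list, and the split-based find_features are stand-ins assumed for illustration, not the original implementation.

```python
import pandas as pd

# Two sample rows, using the column names from the question's code
df = pd.DataFrame({
    "reviews": ["This movie is good", "This is bad"],
    "polarity": ["Positive", "negative"],
})

# Hypothetical stand-ins for the question's word_features and find_features
word_features = ["good", "bad"]

def find_features(document):
    words = set(document.split())
    return {w: (w in words) for w in word_features}

# Iterating a DataFrame yields column labels, which is why unpacking fails
column_labels = list(df)
print(column_labels)  # ['reviews', 'polarity']

# zip walks both columns in parallel, giving one (review, polarity) pair per row
featuresets = [
    (find_features(review), polarity)
    for review, polarity in zip(df["reviews"], df["polarity"])
]
print(featuresets)
```

Using plain loop variables (review, polarity) instead of df.reviews and df.polarity as loop targets also avoids overwriting the dataframe's columns during iteration.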