I have a dataframe that holds reviews and their polarity, as shown below. Only 2 samples are shown here, but the real dataframe has more than 1000 reviews, each with a polarity.

 Reviews              Polarity
This movie is good   Positive
This is bad          negative

I have written a function named find_features. I need to pass every review from this dataframe to it, do some manipulation, and collect the results as a list in featuresets. I am trying the technique below to loop over the reviews column of df, so that each review's features are paired with the polarity value from the same row:

featuresets = [(find_features(df.reviews), df.polarity) for (df.reviews, df.polarity) in df]

The find_features function:

 def find_features(document):
     words = word_tokenize(document)
     features = {}
     for w in word_features:
         features[w] = (w in words)
     return features

Calling this function splits each review in the reviews column into words (via the tokenizer inside find_features), and each review should then be assigned its polarity (positive or negative). I have also generated a list of words and computed their frequencies; word_features is meant to hold the most frequently used words.

all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())

 good    -  positive
 bad     -  negative

When I run the featuresets line, I get the error below:

ValueError: too many values to unpack (expected 2)

I know the above logic works for lists and dictionaries, but I want to use similar logic for a DataFrame. Could you please help me with this?

Sriram Chandramouli
  • I think what you want is: `featuresets = [(find_features(item.reviews), item.polarity) for item in df]` – Andrea Corbellini Apr 17 '16 at 16:57
  • I tried doing that. But getting another error: AttributeError: 'str' object has no attribute 'reviews' – Sriram Chandramouli Apr 17 '16 at 17:08
  • What does `find_features` do? Does it take one item from Reviews and output some other `str` object? – quapka Apr 17 '16 at 17:10
  • The find_features function is in my question. I was updating it in the meantime. – Sriram Chandramouli Apr 17 '16 at 17:14
  • Thanks, although, I think I still don't quite get it. My answer is wrong. – quapka Apr 17 '16 at 17:17
  • What is `df`? It looks like it's an iterable containing strings – Andrea Corbellini Apr 17 '16 at 17:25
  • yup, it contains string values – Sriram Chandramouli Apr 17 '16 at 17:27
  • Possible duplicate of ["Too many values to unpack" Exception](http://stackoverflow.com/questions/1479776/too-many-values-to-unpack-exception) – Cwt Apr 17 '16 at 17:30
  • @quapka - the find_features function takes each review from the reviews column of the DataFrame and splits the whole sentence into words using the word tokenizer. After that, it compares these words with the list of most frequent words in word_features and, if it's true, it returns those words. – Sriram Chandramouli Apr 17 '16 at 17:30
  • @sevenforce I am getting the unpack error because I am going wrong while trying to use similar logic for a DataFrame. It would be great to get some idea of how to do this for a DataFrame column that has many values – Sriram Chandramouli Apr 17 '16 at 17:33
  • @SriramChandramouli, the sequence of the code snippets in the question is a bit confusing; could you please put them in the proper order? Pasting the relevant error stack would also help in understanding the error. – Joshua Baboo Apr 17 '16 at 19:21
  • Did you try the `df.apply(func_call, axis=1)` approach to check whether row-wise processing of the dataframe meets your requirement? – Joshua Baboo Apr 17 '16 at 19:38
  • Is this a duplicate of http://stackoverflow.com/questions/36672475/python-getting-typeerror-expected-string-or-bytes-like-object-while-calling-a?noredirect=1#comment60953626_36672475 ? – Joshua Baboo Apr 17 '16 at 21:27
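
The row-wise `df.apply(func_call, axis=1)` suggestion in the comments above could look roughly like the sketch below. It rests on several assumptions that are not in the question: a two-row stand-in for `df` with lowercase column names `reviews` and `polarity`, a made-up `word_features` list, `str.split` standing in for `word_tokenize` so the snippet runs without NLTK data, and a hypothetical helper named `row_to_featureset`.

```python
import pandas as pd

# Stand-in data; the real dataframe has more than 1000 rows.
df = pd.DataFrame({
    "reviews": ["This movie is good", "This is bad"],
    "polarity": ["positive", "negative"],
})

# Hypothetical vocabulary; in the question it is built from nltk.FreqDist.
word_features = ["good", "bad"]

def find_features(document):
    # str.split stands in for nltk's word_tokenize (assumption for this sketch).
    words = document.split()
    return {w: (w in words) for w in word_features}

def row_to_featureset(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    return (find_features(row["reviews"]), row["polarity"])

# One (features, polarity) tuple per row, collected into a plain list.
featuresets = df.apply(row_to_featureset, axis=1).tolist()
```

Each element of `featuresets` pairs the feature dict of one review with the polarity from the same row.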

1 Answer

Though the sequencing of the given code snippets is not very intuitive, I notice 2 things that are unusual in the given snippet:

  • `for (df.reviews, df.polarity) in df`: the usual form is `for col_name in df`, which iterates over the column names available in df. Each item is therefore a column-name string, and unpacking such a string into two targets is what raises the ValueError.
  • `find_features` is supposed to return a dict, yet you are trying to place that result into a tuple in the expression `(find_features(df.reviews), df.polarity)`.
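
The first point can be checked with a minimal sketch. It assumes a two-row stand-in for `df` with lowercase column names `reviews` and `polarity` (the table in the question shows them capitalized, so the names may need adjusting), a made-up `word_features` list, and `str.split` in place of `word_tokenize` so that it runs without NLTK data. It reproduces the error and then shows one possible alternative: zipping the two columns so each iteration yields one review with its polarity.

```python
import pandas as pd

# Stand-in data; the real dataframe has more than 1000 rows.
df = pd.DataFrame({
    "reviews": ["This movie is good", "This is bad"],
    "polarity": ["positive", "negative"],
})

# Hypothetical vocabulary; in the question it is built from nltk.FreqDist.
word_features = ["good", "bad"]

def find_features(document):
    # str.split stands in for nltk's word_tokenize (assumption for this sketch).
    words = document.split()
    return {w: (w in words) for w in word_features}

# Iterating a DataFrame yields its column names, not its rows.
column_names = [name for name in df]

# Unpacking a column-name string into two targets reproduces the error.
try:
    for (a, b) in df:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Pair each review with the polarity from the same row.
featuresets = [
    (find_features(review), polarity)
    for review, polarity in zip(df["reviews"], df["polarity"])
]
```

On the second point, a (features, label) tuple is the shape that NLTK's classifier training methods such as `nltk.NaiveBayesClassifier.train` take as input, so the sketch keeps the dict inside a tuple on the assumption that this is where `featuresets` is headed.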
Joshua Baboo