
I have a dataframe that contains survey answers. Three of its columns hold open-ended answers. I'm using a pre-trained sentiment analysis classifier from HuggingFace Transformers. The code is below:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
classifier("This community is so helpful!")

The result of the classifier test is: [{'label': '5 stars', 'score': 0.800311}]

What I'd like to do is have the classifier run on my open-ended responses and, in new columns in my dataframe, have it include the stars and ranking score.

Any help would be greatly appreciated.

edit: I uploaded the dataset through a local csv. The dataframe column name I want to work with is "Q72"

CIHAnalytics
  • This is too much information. What you are searching for is how to apply a function to a column and create new columns from the results. The NLP part does not add anything useful to the question. Once you ask the question this way, there are a lot of existing answers, for example https://stackoverflow.com/questions/16236684/apply-pandas-function-to-column-to-create-multiple-new-columns – sleepyhead Oct 13 '20 at 20:25
  • Does this answer your question? [Apply pandas function to column to create multiple new columns?](https://stackoverflow.com/questions/16236684/apply-pandas-function-to-column-to-create-multiple-new-columns) – sleepyhead Oct 13 '20 at 20:27

1 Answer


Apply the classifier to the column, then create the new columns with DataFrame.assign:

df = (
    df
    .assign(sentiment = lambda x: x['Q72'].apply(lambda s: classifier(s)))
    .assign(
         label = lambda x: x['sentiment'].apply(lambda s: (s[0]['label'])),
         score = lambda x: x['sentiment'].apply(lambda s: (s[0]['score']))
    )
)
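
The pipeline returns a list holding one dict per input string, which is why the lambdas above index s[0] before reading 'label' or 'score'. A minimal sketch of that shape, assuming a hypothetical stand-in `fake_classifier` (with made-up scores) so it runs without downloading the model:

```python
import pandas as pd

def fake_classifier(text):
    # Hypothetical stand-in that only mimics the pipeline's output shape:
    # a one-element list containing a {'label', 'score'} dict.
    stars = {"Love it": "5 stars", "Meh": "3 stars", "Hate it": "1 star"}
    return [{"label": stars.get(text, "3 stars"), "score": 0.5}]

df = pd.DataFrame({"Q72": ["Love it", "Meh", "Hate it"]})

# Run the classifier once per row and keep the raw result in 'sentiment'.
df = df.assign(sentiment=df["Q72"].apply(fake_classifier))

# Each 'sentiment' cell is a one-element list, hence the [0] before the key.
df = df.assign(
    label=df["sentiment"].apply(lambda s: s[0]["label"]),
    score=df["sentiment"].apply(lambda s: s[0]["score"]),
)

# Indexing the raw text column instead fails: "Meh"[0] is the string "M",
# and "M"["label"] raises a TypeError about string indices.
```

The label and score lambdas must read from the 'sentiment' column, not from 'Q72' directly.
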
Mehdi Golzadeh
  • This is brilliant. Thank you so much! Is there a way to separate out the results (i.e. first result column is stars, next result column is score)? Or do you think this would be an additional step like breaking out the text values into different columns? – CIHAnalytics Oct 13 '20 at 20:49
  • @CIHAnalytics I updated the answer; the two fields are now in separate columns – Mehdi Golzadeh Oct 13 '20 at 21:06
  • I'm getting an error that says "string indices must be integers". How can I resolve that? – CIHAnalytics Oct 13 '20 at 21:11
  • Can you update the question to show how you load the dataset, which columns you have, and how you applied what I explained, so that I can help you? – Mehdi Golzadeh Oct 13 '20 at 21:13
  • I've updated the question. I uploaded the dataset through a local csv. The column I'm applying your answer to is "Q72". So I've replaced x['sentiment'] in your answer with x['Q72']. The answers in that column are test answers so they only include "Love it", "Meh", and "Hate it". The type error points to the label bit. – CIHAnalytics Oct 13 '20 at 21:18
  • That did it! Thanks so much @MhDG7! You're a rockstar. I've been trying to figure this out all day. – CIHAnalytics Oct 13 '20 at 21:25
  • You're welcome, friend ;) Please accept the answer – Mehdi Golzadeh Oct 13 '20 at 21:26
  • Done. I'm impressed by your skill. I wish you the best and thank you for making the time for me. It is very appreciated. Be well. – CIHAnalytics Oct 14 '20 at 14:35
  • This solution works, but I have to note that it is very slow for me at least (it doesn't allow for batching, and I'm not sure if processing is done all on the GPU) – GrimSqueaker Dec 06 '21 at 12:28
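
On the speed concern in the last comment: a transformers pipeline can also be called with a list of strings and a batch_size argument, which avoids one call per row. The following is a sketch under assumptions, not a benchmarked solution: `add_sentiment_columns` is a hypothetical helper name, the default batch size of 32 is arbitrary, and the dataframe index is assumed to be unique.

```python
import pandas as pd

def add_sentiment_columns(df, column, classifier, batch_size=32):
    """Return a copy of df with 'label' and 'score' columns for `column`."""
    # Classify only non-missing answers; blank survey cells stay NaN.
    mask = df[column].notna()
    texts = df.loc[mask, column].astype(str).tolist()

    # Called with a list, the pipeline returns one {'label', 'score'} dict
    # per string; batch_size groups the forward passes.
    results = classifier(texts, batch_size=batch_size) if texts else []

    out = df.copy()
    idx = df.index[mask]  # assumes a unique index so alignment is unambiguous
    out["label"] = pd.Series([r["label"] for r in results], index=idx, dtype=object)
    out["score"] = pd.Series([r["score"] for r in results], index=idx, dtype=float)
    return out
```

With the classifier from the question, usage would look like df = add_sentiment_columns(df, "Q72", classifier). To run on a GPU, the pipeline can be created with device=0, assuming a CUDA device is available; whether batching helps in practice depends on the hardware and the length of the texts.
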