I have a dataframe containing reviews of a particular product with the columns month and review. I want to perform a type of text analysis on the review column, whereby you can query for a particular keyword and it will return a list of modifiers for that keyword.

import pandas as pd

# The first review uses double quotes because the text itself contains single quotes around 'share'.
df = pd.DataFrame({'month': ['Jan', 'Feb', 'Mar', 'Apr', 'Apr'],
                   'review': ["there should be 'share' button on each item. right now when my wife wants me to buy her something, she has to dictate the item id which is horrendous.", 'always nice but high prices', 'this app currently needs more than 3 gigs of space on my phone. that is ridiculous. guess it has to go. /edit cool, trying again, thanks for the answer.', 'impossible to login in the app, is there any way to get the barcode of the card? if i click the link in the email for the card print thingy it just shows a broken image.', 'i cannot change my location and language preference'],
                   'sentiment': ["positive", "negative", "positive", "negative", "neutral"]})

For example, say the reviews dataset was from the hospitality industry, and you performed sentiment analysis on it. Upon checking the most frequent words in the positive and negative reviews, you got this:

Positive: hotel, location, staff, view, room, breakfast

Negative: hotel, staff, room, breakfast, window, bed, Wi-Fi
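A minimal sketch of how such per-sentiment word lists could be produced, assuming each review already has a sentiment label. The function name `top_words_by_sentiment` and the tiny stop-word list are illustrative assumptions, not part of the original analysis.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stop-word list (an assumption; a real analysis
# would use a fuller list, e.g. the one shipped with an NLP library).
STOP_WORDS = {"the", "a", "an", "and", "but", "was", "is", "were",
              "to", "of", "in", "it", "very"}


def top_words_by_sentiment(reviews, sentiments, n=5):
    """Return the n most frequent non-stop-words for each sentiment label."""
    counts = defaultdict(Counter)
    for text, label in zip(reviews, sentiments):
        # Lowercase and keep alphabetic tokens, allowing forms like "wi-fi".
        words = re.findall(r"[a-z][a-z'-]*", text.lower())
        counts[label].update(w for w in words if w not in STOP_WORDS)
    return {label: [word for word, _ in counter.most_common(n)]
            for label, counter in counts.items()}
```

With the dataframe above, this could be called as `top_words_by_sentiment(df['review'], df['sentiment'])`.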

You wanted to go deeper into the analysis and uncover exactly what it was about these objects that was – or was not – working as customers expected. For example, why were windows such a prominent aspect of negative reviews?

So, you set out to create a syntactic dependency tree, which connects all terms in the input text according to their syntactic relations. Then, you queried this tree to pinpoint precisely what it was about a given keyword (for example, "room" or "location") that customers did or did not especially like. (This is where I need help: I don't know how to implement this in code.)
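One possible starting point for querying the dependency tree is sketched below. It is not a complete solution: it handles only two patterns, attributive modifiers ("the slow wifi") and predicate adjectives after a copula ("the wifi was slow", "the wifi was not fast"). The function names are hypothetical, and the dependency labels (`amod`, `compound`, `acomp`, `neg`, `nsubj`) are the ones used by spaCy's English models, so they are assumptions that may not hold for other models or languages.

```python
from collections import Counter

# Labels as used by spaCy's English models (assumption: other models differ).
DIRECT_MODIFIER_DEPS = {"amod", "compound"}  # "the slow wifi", "hotel room"
SUBJECT_DEPS = {"nsubj", "nsubjpass"}


def extract_modifiers(doc, keyword):
    """Return modifiers of `keyword` found in one parsed document.

    `doc` is any iterable of spaCy-like tokens that expose `lemma_`,
    `dep_`, `head` and `children`.
    """
    keyword = keyword.lower()
    modifiers = []
    for token in doc:
        if token.lemma_.lower() != keyword:
            continue
        # Case 1: modifiers attached directly to the keyword.
        for child in token.children:
            if child.dep_ in DIRECT_MODIFIER_DEPS:
                modifiers.append(child.lemma_.lower())
        # Case 2: the keyword is the subject and an adjective complements
        # the verb, e.g. "the wifi was (not) slow".
        if token.dep_ in SUBJECT_DEPS:
            siblings = list(token.head.children)
            negated = any(s.dep_ == "neg" for s in siblings)
            for s in siblings:
                if s.dep_ == "acomp":
                    word = s.lemma_.lower()
                    modifiers.append(f"not {word}" if negated else word)
    return modifiers


def modifier_counts(texts, keyword, model="en_core_web_sm"):
    """Parse every text with spaCy and count the keyword's modifiers."""
    import spacy  # third-party; the model must be downloaded beforehand

    nlp = spacy.load(model)
    counts = Counter()
    for doc in nlp.pipe(texts):
        counts.update(extract_modifiers(doc, keyword))
    return counts
```

With the dataframe above, `modifier_counts(df['review'], 'room')` would return a `Counter` of modifiers, provided spaCy is installed and the model has been fetched with `python -m spacy download en_core_web_sm`. Harder cases, such as modifiers in a separate clause or attached through a pronoun, are not covered by these two rules.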

I want a resulting list of modifiers so I can create word clouds to visualize the frequency of each modifier for the given keyword, such as the word cloud below, for the keyword "room":

[Image: word cloud of modifiers for the keyword "room"]
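Once a list of modifiers for a keyword is available, however it was obtained, turning it into a word cloud is mostly a matter of counting. The sketch below assumes the third-party `wordcloud` package is installed; the function names, the image size and the output path are illustrative choices.

```python
from collections import Counter


def modifier_frequencies(modifiers):
    """Count how often each modifier occurs, ignoring case and blank entries."""
    cleaned = (m.strip().lower() for m in modifiers)
    return dict(Counter(m for m in cleaned if m))


def save_word_cloud(frequencies, path="modifiers.png"):
    """Render the frequencies as a word cloud image and save it to `path`."""
    from wordcloud import WordCloud  # third-party package, imported lazily

    # generate_from_frequencies needs at least one word in the dictionary.
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)
    return path
```

For example, `save_word_cloud(modifier_frequencies(['clean', 'small', 'clean']))` would draw "clean" larger than "small".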

Honestly, I don't even know where to start. I'm currently experimenting with spaCy's dependency parsing to see how it works and what it returns, and in the meantime I'm asking for help here.

hatice
  • Have a look at the following resources which should help put you on the right track: the [spaCy Dependency Parser](https://spacy.io/usage/linguistic-features#dependency-parse) documentation for navigating the parse tree; [list of dependency labels](https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md) to understand what sort of labels you should be looking for; and [this question and its answers](https://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy) for ways to visualise the parse tree to make analysis easier – Kyle F Hartzenberg Jan 08 '23 at 01:55
  • It's not easy because you'll have to deal with many syntactic cases: 'the wifi was slow', 'the slow wifi', 'the wifi was not fast', 'the wifi was fine in my opinion, but my husband thought it was too slow', etc. It depends on how precisely you want this to be done, but in general it's quite complex. FYI, I wouldn't use the word 'generate', because it has a different sense in NLP; I think you mean 'extracting modifiers'. – Erwan Jan 08 '23 at 17:09

0 Answers