
Edit: please focus the answer only on the example below, no broad scenarios.

OK. I have read about word clouds, but I was wondering how I can represent the words that most frequently occur together in a string variable, as in the example below:

Var_x
wireless problems, migration to competitor
dissatisfied customers, technicians visits scheduled
call waiting, technicians visits
bad customer experience, wireless problems

So what I want is for the pairs "wireless problems" and "technicians visits" to be represented in the cloud. How can this be done?

muni
  • Make an [ngram](http://stackoverflow.com/a/26655378/4667934) using one of the various libraries or Counter – LinkBerest Sep 07 '16 at 11:30
  • Why the downvote? – muni Sep 07 '16 at 14:49
  • Your question is broad, so I assume that is why (I did not downvote it). Or it may be because, if you look at the sklearn option I linked to and then check its documentation, you'll see you can set the N of the n-gram, i.e. set it to bigrams in your case. – LinkBerest Sep 07 '16 at 14:58
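In the spirit of the `Counter` suggestion in the comments, here is a minimal stdlib-only sketch of word n-gram counting with a configurable N. It assumes each row of `Var_x` is a separate string and that words are runs of `\w` characters; the helper name `ngram_counts` is hypothetical. Setting `n=2` gives bigrams.

```python
import re
from collections import Counter
from typing import Iterable


def ngram_counts(rows: Iterable[str], n: int = 2) -> Counter:
    """Count word n-grams within each row (hypothetical helper).

    N-grams are built per row, so no n-gram spans two rows.
    """
    counts: Counter = Counter()
    for row in rows:
        tokens = re.findall(r"\w+", row)
        # zip over n shifted views of the token list yields consecutive n-tuples
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


var_x = [
    "wireless problems, migration to competitor",
    "dissatisfied customers, technicians visits scheduled",
    "call waiting, technicians visits",
    "bad customer experience, wireless problems",
]

bigrams = ngram_counts(var_x, n=2)
print(bigrams[("wireless", "problems")], bigrams[("technicians", "visits")])  # 2 2
```

Because n-grams are built per row, a pair like ('scheduled', 'call') never appears, but pairs that cross a comma inside one row, such as ('problems', 'migration'), still do.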

1 Answer


This code produces a frequency distribution of adjacent word pairs (bigrams) that can be used as the underlying data for a word cloud:

from nltk import bigrams, FreqDist
from nltk.tokenize import RegexpTokenizer
from operator import itemgetter

# The four rows of Var_x joined into one newline-separated string
sent = ('wireless problems, migration to competitor\n'
        'dissatisfied customers, technicians visits scheduled\n'
        'call waiting, technicians visits\n'
        'bad customer experience, wireless problems')

# Split into words, dropping punctuation and whitespace
tokenizer = RegexpTokenizer(r'\w+')
sent_words = tokenizer.tokenize(sent)
# Count every pair of adjacent words
freq_dist = FreqDist(bigrams(sent_words))

# Print the pairs from most to least frequent
for k, v in sorted(freq_dist.items(), key=itemgetter(1), reverse=True):
    print(k, v)

Output

('technicians', 'visits') 2
('wireless', 'problems') 2
('dissatisfied', 'customers') 1
('bad', 'customer') 1
('scheduled', 'call') 1
('competitor', 'dissatisfied') 1
('migration', 'to') 1
('to', 'competitor') 1
('visits', 'scheduled') 1
('call', 'waiting') 1
('problems', 'migration') 1
('waiting', 'technicians') 1
('customers', 'technicians') 1
('customer', 'experience') 1
('experience', 'wireless') 1
('visits', 'bad') 1
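Some of the pairs above, such as ('scheduled', 'call') or ('problems', 'migration'), appear only because the four rows are tokenized as a single stream, so bigrams cross row and comma boundaries. A minimal sketch that avoids this, assuming each comma-separated item is an independent phrase, splits on newlines and commas before pairing words. It swaps in `re.findall(r'\w+', ...)` for `RegexpTokenizer` and `collections.Counter` for `FreqDist` so it runs without NLTK; `phrase_bigrams` is a hypothetical name.

```python
import re
from collections import Counter


def phrase_bigrams(text: str) -> Counter:
    """Count adjacent word pairs inside each comma-separated phrase (hypothetical helper)."""
    counts: Counter = Counter()
    for line in text.splitlines():
        for phrase in line.split(","):
            words = re.findall(r"\w+", phrase)
            # Pair each word with its successor within the same phrase only
            counts.update(zip(words, words[1:]))
    return counts


sent = (
    "wireless problems, migration to competitor\n"
    "dissatisfied customers, technicians visits scheduled\n"
    "call waiting, technicians visits\n"
    "bad customer experience, wireless problems"
)

for pair, count in phrase_bigrams(sent).most_common():
    print(pair, count)
```

With this input only ('wireless', 'problems') and ('technicians', 'visits') occur more than once, so `most_common(2)` returns exactly those two pairs.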
Craig Burgler
  • Thanks, how can I plot such data? – muni Sep 07 '16 at 13:51
  • Have a look at this blog post from four years ago: http://peekaboo-vision.blogspot.com/2012/11/a-wordcloud-in-python.html. The corresponding updated code is on GitHub here: https://github.com/amueller/word_cloud. This library is by far the most popular Python word cloud library on GitHub, but there are others if it doesn't suit your needs. – Craig Burgler Sep 07 '16 at 14:11
  • Also check out this post for some pretty amazing word clouds made with `word_cloud`: http://minimaxir.com/2016/05/wordclouds/ – Craig Burgler Sep 07 '16 at 14:18
  • Thanks, I already went through it. But it generates a cloud of single words; I want the cloud based on the two-word frequencies in the output you created. – muni Sep 07 '16 at 14:22
  • Can you join each two-word tuple into a single string and run those through `word_cloud`? – Craig Burgler Sep 07 '16 at 14:34
  • OK, I am able to combine the words, but how do I give the DataFrame column as input to wordcloud? – muni Sep 07 '16 at 15:30
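Regarding the last two comments, a minimal sketch of the "join each tuple" idea: it builds a plain dict keyed by the joined pair (e.g. `'technicians visits'`) from any iterable of strings, such as a DataFrame column. The helper name `pair_frequencies`, its `min_count` cut-off, and the DataFrame name `df` with column `Var_x` are assumptions for illustration; pairs are taken within each comma-separated phrase.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def pair_frequencies(rows: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map 'word1 word2' to its count across rows (hypothetical helper).

    Pairs are taken within each comma-separated phrase; pairs seen fewer
    than min_count times are dropped.
    """
    counts: Counter = Counter()
    for row in rows:
        for phrase in row.split(","):
            words = re.findall(r"\w+", phrase)
            counts.update(zip(words, words[1:]))
    # Join each (word1, word2) tuple into one space-separated key
    return {" ".join(pair): c for pair, c in counts.items() if c >= min_count}


rows = [
    "wireless problems, migration to competitor",
    "dissatisfied customers, technicians visits scheduled",
    "call waiting, technicians visits",
    "bad customer experience, wireless problems",
]
print(pair_frequencies(rows, min_count=2))
# {'wireless problems': 2, 'technicians visits': 2}
```

Iterating a pandas Series yields its values, so `pair_frequencies(df["Var_x"])` should work as-is, assuming the column has no missing values (`NaN` is a float and has no `split`). The resulting dict can then be passed to `WordCloud().generate_from_frequencies(...)` from the `word_cloud` package; that method uses the keys as given instead of re-tokenizing them, so each two-word key stays together in the cloud.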