Wordcloud in R using a different feature

Question

Using the Description feature from the Online Retail dataset, I created a word cloud:

```r
library(tm)         # Corpus, tm_map, removeWords, stopwords
library(SnowballC)  # required by stemDocument
library(wordcloud)

descCorpus <- Corpus(VectorSource(without_weird$Description))
descCorpus <- tm_map(descCorpus, removePunctuation)
descCorpus <- tm_map(descCorpus, removeWords, c('the', 'this', stopwords('english')))
descCorpus <- tm_map(descCorpus, stemDocument)
wordcloud(descCorpus, max.words = 100, random.order = FALSE)
```

However, I want the size of each word to be determined by sales amount instead of frequency: the higher the sales, the bigger the word.

Reproducible example:

```r
description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")

sales <- c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)

df <- data.frame(description, sales)
```

Where is the information about sales coming in? It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so possible solutions can be tested. — MrFlick, Sep 08 '17 at 14:18
Well, you just have to set the `freq` argument as the sales vector, possibly after some transformation (`10^` or `log`) depending on the ranges. Then set the `scale` right. — agenis, Sep 08 '17 at 14:40

ekstroem · Accepted Answer · 2017-09-08T22:54:49.600

Here's an example using the wonderful wordcloud2 package.

Using your small example data we get

```r
description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")
sales <- c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)
df <- data.frame(description, sales)
```

The `wordcloud2` function needs the variables to be named `word` and `freq`, so we rename them. The descriptions are pretty long, so I scale the overall size down with the `size` argument.

```r
library(dplyr)
library(wordcloud2)

df %>%
  rename(word = description, freq = sales) %>%
  wordcloud2(size = .1)
```

This produces the following (and it's an interactive htmlwidget on top!)
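As a comment on the question points out, the classic wordcloud package used in the question can do the same thing if the sales vector is passed as `freq`. Here is a minimal sketch on a few rows of the example data. The `scale` values are guesses chosen so the long descriptions fit on the page, and will likely need tuning for real data:

```r
library(wordcloud)

# A few rows of the example data, so the snippet runs on its own
df <- data.frame(
  description = c("HANGING HEART JAR T-LIGHT HOLDER",
                  "VICTORIAN SEWING BOX LARGE",
                  "RABBIT NIGHT LIGHT",
                  "WHITE SOAP RACK WITH 2 BOTTLES"),
  sales = c(24.96, 11.40, 20.40, 27.04),
  stringsAsFactors = FALSE
)

# freq drives the text size, so passing sales makes high-sales items larger.
# min.freq = 0 keeps low-sales items (the default min.freq of 3 would drop
# anything with sales below 3).
wordcloud(words = df$description, freq = df$sales,
          scale = c(1.5, 0.4), min.freq = 0, random.order = FALSE)
```

If the sales values span several orders of magnitude, applying `log()` to them before passing them as `freq` may give a more readable cloud.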

With your original data I get something like this (I'm not exactly sure this is the data wrangling you were after; `indata` is the Excel file read into R):

```r
# Sum Quantity per Description so each item appears once in the cloud
indata %>%
  group_by(Description) %>%
  tally(Quantity) %>%
  rename(freq = n, word = Description) %>%
  wordcloud2(size = 1, minSize = 3)
```

which looks like this

Update: And if you want to count words and show them I'd use tidytext:

```r
library(tidytext)

indata %>%
  unnest_tokens(word, Description, token = "words") %>%
  group_by(word) %>%
  tally(Quantity) %>%
  rename(freq = n) %>%
  ungroup() %>%
  wordcloud2(minSize = 5)
```

with this result

You'd probably need to jump through some hoops to remove the numbers and stopwords, as you already hint at in the OP.
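The number and stopword removal could look roughly like the sketch below. It uses the small example data from the question and sums `sales` per word. The `stop_words` table ships with tidytext; the regular expression for numbers and the `size` value are my own assumptions and may need adjusting for the full dataset:

```r
library(dplyr)
library(tidytext)
library(wordcloud2)

description <- c("36 PENCILS TUBE RED RETROSPOT", "HANGING HEART JAR T-LIGHT HOLDER",
                 "VICTORIAN SEWING BOX LARGE", "CINAMMON SET OF 9 T-LIGHTS",
                 "ZINC T-LIGHT HOLDER STARS SMALL", "T-LIGHT HOLDER",
                 "RABBIT NIGHT LIGHT", "WHITE SOAP RACK WITH 2 BOTTLES",
                 "BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE",
                 "STRAWBERRY CERAMIC TRINKET POT")
sales <- c(4.56, 24.96, 11.40, 15.00, 17.85, 10.50, 20.40, 27.04, 20.40, 15.00, 13.00)
df <- data.frame(description, sales, stringsAsFactors = FALSE)

word_sales <- df %>%
  # one row per word (lower-cased); each word keeps the sales of its row
  unnest_tokens(word, description, token = "words") %>%
  # drop tokens that consist only of digits
  filter(!grepl("^[0-9]+$", word)) %>%
  # drop English stopwords using tidytext's stop_words table
  anti_join(stop_words, by = "word") %>%
  # total sales per word, stored in the freq column
  count(word, wt = sales, name = "freq", sort = TRUE)

# Build the widget; print `cloud` to display it
cloud <- wordcloud2(word_sales, size = 0.5)
```

Note that each word inherits the full sales figure of every description it appears in, so words shared by many items add up quickly.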

Are you asking to see a solution with words? — ekstroem, Sep 08 '17 at 22:51
