Using the Description column from the Online Retail dataset, I created a word cloud.

library(tm)         # Corpus, tm_map, removeWords, stopwords, stemDocument
library(wordcloud)  # wordcloud

descCorpus <- Corpus(VectorSource(without_weird$Description))
descCorpus <- tm_map(descCorpus, removePunctuation)
descCorpus <- tm_map(descCorpus, removeWords, c('the', 'this', stopwords('english')))
descCorpus <- tm_map(descCorpus, stemDocument)
wordcloud(descCorpus, max.words = 100, random.order = FALSE)

However, I want word size to be determined by sales amount rather than frequency: the higher the sales, the bigger the word.

Reproducible example:

description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")

sales <-c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)

df <- data.frame(description, sales)
dank
  • Where is the information about sales coming in? It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so possible solutions can be tested. – MrFlick Sep 08 '17 at 14:18
  • Well, you just have to set the `freq` argument as the sales vector, possibly after some transformation (`10^` or `log`) depending on the ranges. then set the `scale` right – agenis Sep 08 '17 at 14:40
  • @MrFlick I just added a reproducible example – dank Sep 08 '17 at 17:52
  • @agenis does that take into account the word or strings? – dank Sep 08 '17 at 17:59
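
The `freq` suggestion from the comments can be sketched as follows. Passing the full descriptions as `words` would size each whole string by its sales, so this sketch instead splits the descriptions into single words and sums the sales per word. It uses only a small subset of the example data, and `tokens` and `word_sales` are illustrative names, not part of any package. A word that occurs twice in one description would have that row's sales counted twice.

```r
# Small subset of the example data from the question
df <- data.frame(
  description = c("T-LIGHT HOLDER", "RABBIT NIGHT LIGHT",
                  "ZINC T-LIGHT HOLDER STARS SMALL"),
  sales = c(10.50, 20.40, 17.85),
  stringsAsFactors = FALSE
)

# Split each description on whitespace: one character vector per row
tokens <- strsplit(df$description, "\\s+")

# Repeat each row's sales once per token, then sum the sales per word
word_sales <- tapply(rep(df$sales, lengths(tokens)), unlist(tokens), sum)

# Draw the cloud only if the wordcloud package is installed;
# min.freq = 0 keeps every word regardless of its sales total
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(word_sales),
                       freq = as.numeric(word_sales),
                       min.freq = 0, random.order = FALSE)
}
```

Depending on the range of the sales totals, a transformation such as `log` on `freq` and a different `scale` may give a more readable cloud, as the comment suggests.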

1 Answer

Here's an example using the wonderful wordcloud2 package.

Using your small example data we get

description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")    
sales <-c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)    
df <- data.frame(description, sales)

The wordcloud2 function expects a data frame with a word column and a freq column, so we rename the variables accordingly. The descriptions are pretty long, so I scale the overall size down with the size argument.

library(dplyr)
library(wordcloud2)
df %>% rename(word=description, freq=sales) %>% wordcloud2(size=.1)

This produces the following (and it's an interactive htmlwidget on top!)

[image: word cloud of the example descriptions]

With your original data I get something like this (I'm not sure this is exactly the data wrangling you were after; indata is the data frame read from the Excel file):

# Sum Quantity per description; count(wt = ) stores the total in column n
indata %>% count(Description, wt = Quantity) %>% 
           rename(freq=n, word=Description) %>% 
           wordcloud2(size=1, minSize=3)

which looks like this

[image: word cloud of the original data]

Update: If you want to split the descriptions into individual words and show those instead, I'd use tidytext:

library(tidytext)
indata %>% unnest_tokens(word, Description, token="words") %>%
           group_by(word) %>% tally(Quantity) %>%
           rename(freq=n) %>% ungroup() %>%
           wordcloud2(minSize=5)

with this result

[image: word cloud of individual words]

You'd probably need to jump through a few hoops to remove the numbers and stopwords, as you already hint at in the OP.
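
One way to do that cleanup is sketched below on a subset of the small example data, weighting each word by sales rather than Quantity. It assumes the `stop_words` table that ships with tidytext is an acceptable stopword list; that list is fairly broad and may also drop words you want to keep, so check the result. The name `word_sales` is only illustrative.

```r
library(dplyr)
library(tidytext)

# Subset of the example data from the question
df <- data.frame(
  description = c("36 PENCILS TUBE RED RETROSPOT", "CINAMMON SET OF 9 T-LIGHTS",
                  "WHITE SOAP RACK WITH 2 BOTTLES", "WHITE SKULL HOT WATER BOTTLE"),
  sales = c(4.56, 15.00, 27.04, 15.00),
  stringsAsFactors = FALSE
)

word_sales <- df %>%
  unnest_tokens(word, description) %>%     # one lower-cased word per row
  filter(!grepl("^[0-9]+$", word)) %>%     # drop tokens made up only of digits
  anti_join(stop_words, by = "word") %>%   # drop tidytext's English stopwords
  group_by(word) %>%
  summarise(freq = sum(sales)) %>%         # total sales per remaining word
  ungroup()

# Show the interactive cloud only in an interactive session with wordcloud2 installed
if (interactive() && requireNamespace("wordcloud2", quietly = TRUE)) {
  print(wordcloud2::wordcloud2(as.data.frame(word_sales), size = 0.5))
}
```

Note that the default word tokenizer also splits on punctuation, so "T-LIGHTS" is broken into separate tokens.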

ekstroem