
I work a fair amount with text data split across various grouping variables. I'm thinking of creating a method to make faceted word cloud plots using Ian Fellows' wordcloud package, because I like the way ggplot2 facets by grouping variables. I'm trying to decide how to approach this problem (a faceted word cloud plot).

Is it possible to use Fellows' work as a geom? (I've never made a geom but may learn if this is doable.) Or will ggplot2 not play nicely because one is built on grid and the other on base graphics (wordcloud also uses some C code), or because of some other problem? How difficult would this be? I know that depends on my abilities, but I would like a ballpark answer. Please advise if base graphics would be the more sensible approach. I can imagine doing this with panes from the plotrix package to get the aesthetic feel that ggplot2's faceting gives.

Maybe this is a foolish concept considering the size of word clouds and the way faceting quickly limits the available space.

Tyler Rinker
  • You may find this useful: http://stackoverflow.com/questions/7029906/extending-ggplot2-properly – Ari B. Friedman Jun 28 '12 at 22:55
  • You could modify the `wordcloud` function to (invisibly) return the position, orientation, size and colour of the words and then use that with `geom_text`. For a cleaner solution, you would probably need to wrap the call to `wordcloud` in a `stat_wordcloud` function. – Vincent Zoonekynd Jun 29 '12 at 00:08
  • Except that ggplot2 uses grid for plotting and the word sizes are calculated using base graphics. – Ian Fellows Jun 29 '12 at 00:52
  • You might find something useful in Jesse Bridgewater's blogs: [here](http://bridgewater.wordpress.com/2012/04/16/word-cloud-alternatives/) and [here](http://bridgewater.wordpress.com/2012/04/18/a-word-cloud-where-the-x-and-y-axes-mean-something/) – Sandy Muspratt Jun 29 '12 at 02:07
  • ggplot2 tries to avoid allowing users to easily create bad visualisations. I wonder if having an easily accessible word-cloud creator is actually a good thing? I mean, in print, word clouds don't really offer much. On the 'web, they're mostly useful as a way of exploring tags, so ggplot2's static output (without links) would be pretty useless in anything other than an aesthetic sense. – naught101 Oct 15 '12 at 02:27
  • I would disagree with your analysis of word clouds. They present a ton of information without much loss. By sizing words proportionally you can compare across clouds. Additionally, you can use colors to represent themes. Just because word clouds have thus far been utilized in rudimentary ways should not preclude them from serious analysis. For a demonstration of improved word cloud use see: http://trinkerrstuff.wordpress.com/2012/10/04/presidential-debates-with-qdap-beta/ ggplot2 is not about bad plots but is about good investigation of data... – Tyler Rinker Oct 15 '12 at 02:56
  • ...For discourse analysis and text mining a word cloud can be a very useful text exploration tool and, I believe, if it were properly developed, a useful information display tool as well. – Tyler Rinker Oct 15 '12 at 02:57
  • It is now time to dream: https://github.com/lepennec/ggwordcloud – Tyler Rinker Oct 31 '18 at 21:14

2 Answers


This may be a pipe dream, and it certainly isn't easy to re-use the wordcloud code:

  1. As Ian Fellows points out in a comment, the wordcloud code calculates word sizes and positions in base graphics.
  2. A geom-aware modification of the code needs to be aware of facets.

In terms of making it work, a framework for designing a solution might be:

  1. Rewrite wordcloud to calculate word sizes in grid graphics, rather than base graphics
  2. Write the results of word size and position to a data frame
  3. Wrap the calculations in a function called stat_wordcloud
  4. Modify geom_text to a new geom_wordcloud
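Steps 2 to 4 can be sketched roughly as follows. This is only an illustration under loud assumptions: `layout_words()` and `layout_by_group()` are hypothetical helpers, not part of wordcloud or ggplot2, and the word boxes are crude estimates (character count times font size) rather than sizes measured by a graphics device. That means the no-overlap guarantee holds only in the layout's own units, not necessarily on the rendered plot, which is exactly the grid versus base measurement problem from step 1.

```r
# Hypothetical helper: place one group's words along a spiral, skipping
# positions whose estimated bounding box overlaps an already placed word.
# Box sizes are ASSUMED (nchar * size), not measured with grid or base graphics.
layout_words <- function(words, freq, max_size = 10, min_size = 3) {
  ord   <- order(freq, decreasing = TRUE)   # biggest words are placed first
  words <- words[ord]
  freq  <- freq[ord]
  size  <- min_size + (max_size - min_size) * freq / max(freq)
  w <- nchar(words) * size * 0.6            # assumed box width
  h <- size * 1.2                           # assumed box height
  x <- numeric(length(words))
  y <- numeric(length(words))
  for (i in seq_along(words)) {
    theta <- 0
    repeat {
      r  <- 0.5 * theta                     # Archimedean spiral
      cx <- r * cos(theta)
      cy <- r * sin(theta)
      placed  <- seq_len(i - 1)
      overlap <- any(abs(cx - x[placed]) < (w[i] + w[placed]) / 2 &
                     abs(cy - y[placed]) < (h[i] + h[placed]) / 2)
      if (!overlap) break
      theta <- theta + 0.1
    }
    x[i] <- cx
    y[i] <- cy
  }
  data.frame(word = words, freq = freq, size = size,
             x = x, y = y, w = w, h = h, stringsAsFactors = FALSE)
}

# Hypothetical stand-in for a stat_wordcloud: one layout per facet level,
# with the results written to a single data frame (step 2).
layout_by_group <- function(df) {
  pieces <- lapply(split(df, df$group), function(d) {
    out <- layout_words(d$word, d$freq)
    out$group <- d$group[1]
    out
  })
  do.call(rbind, pieces)
}

# Made-up toy data for illustration only.
toy <- data.frame(
  group = rep(c("speaker_a", "speaker_b"), each = 4),
  word  = c("taxes", "jobs", "economy", "plan",
            "health", "jobs", "energy", "schools"),
  freq  = c(12, 9, 7, 3, 10, 8, 5, 2),
  stringsAsFactors = FALSE
)
cloud <- layout_by_group(toy)

# Plain geom_text stands in for a geom_wordcloud (step 4); facets come for free.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cloud, aes(x, y, label = word, size = size)) +
    geom_text() +
    scale_size_identity() +
    facet_wrap(~group) +
    theme_void()
  print(p)
}
```

Because the layout runs once per level of `group`, each facet gets its own cloud. A real implementation would still need to replace the assumed box sizes with measurements taken in grid.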

So, it's a pipe dream, but I'd be keen to use it once you've made it ;-)

Andrie

This is a possible solution using ggplot2 style: https://github.com/lepennec/ggwordcloud
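A minimal sketch of what a faceted cloud might look like with that package, assuming ggwordcloud and ggplot2 are installed. The toy data frame and its column names (`group`, `word`, `freq`) are made up for illustration; `geom_text_wordcloud()` is the package's text geom, and the remaining calls are ordinary ggplot2.

```r
# Made-up toy data for illustration only.
toy <- data.frame(
  group = rep(c("speaker_a", "speaker_b"), each = 4),
  word  = c("taxes", "jobs", "economy", "plan",
            "health", "jobs", "energy", "schools"),
  freq  = c(12, 9, 7, 3, 10, 8, 5, 2),
  stringsAsFactors = FALSE
)

# Only draw the plot if both packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggwordcloud", quietly = TRUE)) {
  library(ggplot2)
  library(ggwordcloud)
  p <- ggplot(toy, aes(label = word, size = freq)) +
    geom_text_wordcloud() +            # word cloud layout done by the geom
    scale_size_area(max_size = 12) +   # text size scales with frequency
    facet_wrap(~group) +               # one cloud per grouping level
    theme_minimal()
  print(p)
}
```

Since the geom behaves like any other ggplot2 layer, faceting by a grouping variable needs nothing beyond the usual `facet_wrap()` call.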

Tyler Rinker