
I'm playing around with drawing bubble charts in R -- the current project is to graph a bubble chart of political donations that has the following characteristics:

x-axis: size of donation, in ranges, e.g. $10-$19, $20-$29, $30-$49, etc.
y-axis: number of donations of that amount
area of bubble: total amount of donations 

I'm not planning anything complex, just something like:

symbols(amount_ranges,amount_occurrences, circles=sums)

The data is pretty granular, so there is a separate entry for each donation, and the entries need to be summed in order to get the values I'm looking for.

For example, the data looks like this (extraneous columns removed):

CTRIB_NAML    CTRIB_NAMF    CTRIB_AMT    FILER_ID
John          Smith         $49          123456789

This is not that complex, but is there a simple way in R to count the number of occurrences of a certain value (for the y-axis)? And to add up the sum of those donations (which is derived from the axes)? Or do I need to create a function that iterates through the data and compiles these numbers separately? Or pre-process the data in some way?

tchaymore

2 Answers


This is easy when you use the ggplot2 package with geom_point.

One of the many benefits of using ggplot is that its built-in statistics mean you often don't have to pre-summarise your data. geom_point in combination with stat_sum is all you need.

Here is the example from ?geom_point, which maps point size to a variable. (Note that mtcars is a built-in dataset that ships with R's datasets package, so it is available without ggplot2.)

See the ggplot website and geom_point for more detail.

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(size = qsec))


Andrie

You can use ddply from package plyr here. If your original data.frame was called dfr, then something close to this should work:

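library(plyr)
# One row per distinct amount: the amount, its total (sm), its mean (mn) and the number of donations (n)
result <- ddply(dfr, .(CTRIB_AMT), function(partialdfr) {
  data.frame(amt = partialdfr$CTRIB_AMT[1], sm = sum(partialdfr$CTRIB_AMT),
             mn = mean(partialdfr$CTRIB_AMT), n = nrow(partialdfr))
})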

In fact, a base R solution is also rather simple:

vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)

I'm sure more elegant solutions exist.
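
Since the question asks for ranges rather than individual amounts, the same tapply idea can be applied after binning with cut(). The following is only a sketch under stated assumptions: the sample values and the break points are made up for illustration, and only the column name CTRIB_AMT comes from the question. Note that symbols() treats circles as radii, so the sketch takes a square root to make the bubble area (not the radius) proportional to the total.

```r
# Hypothetical sample data; only the column name comes from the question
dfr <- data.frame(CTRIB_AMT = c(10, 15, 25, 25, 49, 49, 49, 120))

# Assumed range boundaries: [10,20), [20,30), [30,50), [50,100), [100,Inf)
breaks <- c(10, 20, 30, 50, 100, Inf)

# Assign each donation to a range; drop ranges that contain no donations
bins <- droplevels(cut(dfr$CTRIB_AMT, breaks = breaks, right = FALSE))

counts <- tapply(dfr$CTRIB_AMT, bins, length)  # number of donations per range
sums   <- tapply(dfr$CTRIB_AMT, bins, sum)     # total donated per range

# circles are radii, so use sqrt(sums / pi) to make area proportional to sums
symbols(seq_along(counts), counts, circles = sqrt(sums / pi), inches = 0.5,
        xaxt = "n", xlab = "Donation range", ylab = "Number of donations")
axis(1, at = seq_along(counts), labels = names(counts))
```

With real data, the break points would need to match whatever ranges you want on the x-axis.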

Nick Sabbe
  • Maybe I should add this as a separate question, but how should I convert the CTRIB_AMT column to a numeric column, so the `sums` and `counts` can calculate properly. I tried `as.numeric(as.character(sub("$",'',contribs$CTRIB_AMT)))` from [here](http://stackoverflow.com/questions/7299991/how-can-i-convert-a-factor-column-that-contains-decimal-numbers-to-numeric) but that didn't work. Any thoughts? – tchaymore Sep 07 '11 at 16:08
  • It should be pretty close. What problems do you get? – Nick Sabbe Sep 07 '11 at 16:33
  • Oh, I see, you need: `as.numeric(as.character(sub("$","",contribs$CTRIB_AMT, fixed=TRUE)))` because `$` is a special character in regular expressions (see `?sub`) – Nick Sabbe Sep 07 '11 at 16:36
  • Great. That replaced the '$' character but still the column is a factor instead of numeric column. I've got to keep working on this, but if you have any ideas... – tchaymore Sep 07 '11 at 17:00
  • Sigh. `contribs$CTRIB_AMT<-as.numeric(as.character(sub("$","",contribs$CTRIB_AMT, fixed=TRUE)))`. Hope that works for you. – Nick Sabbe Sep 07 '11 at 21:44
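
To illustrate the conversion discussed in the comments above, here is a minimal sketch. The factor values are hypothetical, and the handling of thousands separators is an added assumption that the original data may not need. It shows why the unescaped `"$"` pattern removes nothing, and that the converted values only take effect once they are stored (for example by assigning them back to the column).

```r
# Hypothetical input mimicking a CTRIB_AMT column that was read in as a factor
amt <- factor(c("$49", "$10", "$1,250"))

# Without fixed = TRUE, "$" is the regex end-of-string anchor, so nothing is removed
unchanged <- sub("$", "", as.character(amt))

# Strip dollar signs and (assumed) thousands separators, then convert to numeric
cleaned <- as.numeric(gsub("[$,]", "", as.character(amt)))

# The result must be stored to have any effect, e.g.
# contribs$CTRIB_AMT <- cleaned
```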