I'm analyzing a batch of data in R for which I have already plotted the population density. I would also like to generate a value density plot. For example:

      dog.breed    weight.lbs
[1]   Labrador     63
[2]   Maltese      6
[3]   Dalmatian    55
[4]   Poodle       51
[5]   Maltese      4
[6]   Dalmatian    48
[7]   Poodle       56

The standard density plot will count the # of occurrences for each breed and then output a nice curve, as such:

      dog.breed    x
[1]   Labrador     1
[2]   Maltese      2
[3]   Dalmatian    2
[4]   Poodle       2

However what I am trying to obtain is a similarly smooth curve tracing the sum of the weights for each breed, as such:

      dog.breed    x
[1]   Labrador     63
[2]   Maltese      10
[3]   Dalmatian    103
[4]   Poodle       107

I can do this by establishing a series of points, as in the final example, and then fitting a curve, but that's messy. I was hoping someone knew of a clean package that could do the heavy lifting.

Thanks for the help.

Some Clarification:

How about another example. Suppose I have 50 stores, and for every patron I know how much they spend each time they come to the store. A density plot of the patron population over the stores would reveal how many people are attending each store. I'm looking for the equivalent plot, but for how much all people are spending at each store. Does that make sense?

mindless.panda
dittle
  • I can't tell what you're trying to do at all. Maybe you could provide a complete, reproducible example that demonstrates what you've done so far? – joran Jul 25 '12 at 21:44
  • Do you want to sum `weight.lbs` by unique `dog.breed` versus getting the frequency distribution of `dog.breed`? – mindless.panda Jul 25 '12 at 22:03
  • You should always try to distill your question down to the main essence, e.g. "How to sum one column based on unique values in another column". In this case, the details about population and weights aren't quite as important. You should also try to make your question reproducible. Check [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info on asking great R questions (which tend to get great answers!) – mindless.panda Jul 25 '12 at 22:39
  • I'm good with that. Just use aggregate(). I'm wondering if there is a standard function for plotting this information that is similar to density(), calculating the outputs as described above rather than yielding a frequency distribution. But yes, I'll modify the question if I don't get some answers soon. Thanks. – dittle Jul 25 '12 at 22:49

1 Answer

If you are using base R, you want to look at aggregate:

data <- read.table(text="dog.breed    weight.lbs
Labrador     63
Maltese      6
Dalmatian    55
Poodle       51
Maltese      4
Dalmatian    48
Poodle       56", header=TRUE)

aggregate(. ~ dog.breed, data=data, sum)

#  dog.breed weight.lbs
#1 Dalmatian        103
#2  Labrador         63
#3   Maltese         10
#4    Poodle        107
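
As a hedged aside, the same per-breed totals can also be computed with `tapply()` or `xtabs()`, both part of base R. The sketch below rebuilds the toy data under the illustrative name `dogs` (not a name from the question) so that it runs on its own:

```r
# Same toy data as in the question, rebuilt so this block is self-contained.
# The name `dogs` is only illustrative.
dogs <- data.frame(
  dog.breed  = c("Labrador", "Maltese", "Dalmatian", "Poodle",
                 "Maltese", "Dalmatian", "Poodle"),
  weight.lbs = c(63, 6, 55, 51, 4, 48, 56)
)

# tapply() splits weight.lbs by breed and sums each group.
# The result is named by breed, in alphabetical order.
sums <- tapply(dogs$weight.lbs, dogs$dog.breed, sum)

# xtabs() computes the same totals from a formula and returns a table.
sums.tab <- xtabs(weight.lbs ~ dog.breed, data = dogs)

print(sums)
```

Which of these to use is mostly a matter of the shape you want back: `aggregate()` returns a data frame, while `tapply()` and `xtabs()` return named totals that can be indexed by breed.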

If you are looking for a way to plot directly from the data without having to do anything, ggplot is your friend:

library(ggplot2)
ggplot(data, aes(x=dog.breed, y=weight.lbs)) +
  geom_bar(stat="identity")

ggplot(data, aes(x=dog.breed)) +
  geom_bar(aes(weight=weight.lbs))

The first graph plots multiple y values for each x; because geom_bar defaults to position = "stack", the bars for each breed are stacked, giving the sum over each x. The second graph works because geom_bar's default stat counts the rows at each x (stat_bin when this was written; stat_count in current ggplot2), and the weight aesthetic makes it sum weight.lbs instead of counting each row as 1. Both produce equivalent output:

[plot: bar chart of summed weight.lbs for each dog.breed]
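
If ggplot2 is not available, a roughly equivalent chart can be sketched with base graphics. This is only one possible approach, assuming the same toy data (rebuilt here under the illustrative name `dogs`); it sums the weights with `xtabs()` and passes the totals to `barplot()`:

```r
# Toy data from the question, rebuilt so this block is self-contained.
# The name `dogs` is only illustrative.
dogs <- data.frame(
  dog.breed  = c("Labrador", "Maltese", "Dalmatian", "Poodle",
                 "Maltese", "Dalmatian", "Poodle"),
  weight.lbs = c(63, 6, 55, 51, 4, 48, 56)
)

# Sum weight.lbs within each breed; the result is a 1-d table
# whose names are the breeds.
totals <- xtabs(weight.lbs ~ dog.breed, data = dogs)

# Draw one bar per breed with height equal to the summed weight.
# barplot() returns the x positions of the bar midpoints.
mids <- barplot(totals, xlab = "dog.breed", ylab = "total weight.lbs")
```

The bar heights here are the summed weights per breed rather than row counts, which is the same quantity the ggplot2 calls display.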

mindless.panda