
I am using RStudio on Ubuntu, with up-to-date R and ggplot2.

I am trying to create a histogram in ggplot2 and split the data by group.

I need the plot's y axis to show the relative frequency of each bin within the subgroup produced by the facet grid.

For example, if I have two entries in the data:

a group
1 1
2 2

I need to use facet_grid to split by group, and then show that a has one bar at 1 that accounts for 100% of the observations in group 1, and likewise for group 2.

I found that the usual way to do this is (..count..)/sum(..count..), but sum(..count..) totals the counts over the entire data frame rather than within each facet, so it gives me unwanted results.

I can't find good documentation on advanced use of ..count... The closest I found were these two questions:

question about special ggplot variables

another question about ..count..

Neither they nor the docs cover this comprehensively.

This is the example code i am using

df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1,5),rep(2,5)))
library(ggplot2)
p <- ggplot(df) +
  geom_histogram(aes(x = a, y = (..count..) / sum(..count..))) +
  facet_grid(group ~ .)

You can see that the highest value on the y axis is 0.1. I would like it to show each bin's share within its own group instead (for example, that 100% of the 1 values are in group 1).

edit:

Thanks to Jimbou for the answer and the reference to a well-built workaround that suits discrete data. Please note that my real problem involves continuous data and bins that span more than one value. Furthermore, there is no proper documentation on how to do this with ..count.., so I believe it is important to find a direct solution rather than rely on a workaround.

thebeancounter

4 Answers

2

Here is a dplyr solution.

df %>% group_by(group) %>% mutate(n = n(), prop = n / sum(n))
shayaa
  • this looks promising, could you please provide with more details? – thebeancounter Jul 25 '16 at 08:56
  • well, if you want to learn dplyr, there is an excellent vignette. Basically, the `%>%` is a piping operator which can be interpreted as "and then". First group the data frame by group, and then add a column which counts the occurrences within the group, then calculate another column which takes this within group counts and calculates the proportion by dividing by the total. – shayaa Jul 25 '16 at 09:19
  • Please provide a more detailed answer so that I can accept it. But first: this is great for working with discrete variables, but again, how does it help us with ggplot and making a y axis for the relative frequency of each bin in the subgroup made by facet grid? – thebeancounter Jul 25 '16 at 09:27
  • Well, in this case, mutate can take multiple arguments, which are multiple columns which i will add to your original matrix. The first one calculates the number of times that a particular group has been present, appends it to df and calls the variable n. Then it takes this count, sums it across the whole group, and divides the two numbers. It calls this value prop, and appends it to the matrix. – shayaa Jul 25 '16 at 09:32
  • we need, in the end, to be able to say, for example, that number 1.0 - 2.0 were 30% of the group 1 etc. – thebeancounter Jul 25 '16 at 09:36
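
The binned proportions asked for in the comments above could be computed along these lines. This is only a sketch, not the answerer's code: it assumes the question's example `df`, and the bin edges (`seq(0, 10, by = 2)`) and the column names `bin` and `prop` are made up for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1, 5), rep(2, 5)))

binned <- df %>%
  # Assign each value of `a` to a bin; the break points are an arbitrary choice
  mutate(bin = cut(a, breaks = seq(0, 10, by = 2))) %>%
  # Count rows per (group, bin) combination
  count(group, bin) %>%
  # Divide each bin count by the total of its own group
  group_by(group) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

print(binned)
```

Because `sum(n)` is evaluated per group, the `prop` values add up to 1 within each group, which is the "x% of group 1" reading described above.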
2

After a lot of experimenting, and thanks to the good pointers from everyone, I found that combining Jimbou's and shayaa's answers with a little extra code works well.

t <- data %>% group_by(group, member, v_rate) %>% tally %>% mutate(f = n/sum(n))

This groups the data by group, member, and v_rate, counts the rows in each combination, and divides each count by the total for its (group, member) pair, giving the relative frequency within that pair.

Then we build the histogram with ggplot2 and pass those values as the histogram's weight aesthetic; otherwise the precomputed frequencies would go unused:

 p <- ggplot(t, aes(x = v_rate, weight = f)) + geom_histogram() + facet_grid(group ~ member)

that works great.
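
For anyone who wants to try this end to end, here is a self-contained sketch. The data frame below is hypothetical (random values standing in for the real `data`), and `bins = 10` is an arbitrary choice; only the column names come from the code above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the real `data`
set.seed(1)
data <- data.frame(
  group  = rep(c("g1", "g2"), each = 50),
  member = rep(c("m1", "m2"), times = 50),
  v_rate = round(runif(100, 0, 10), 1)
)

t <- data %>%
  group_by(group, member, v_rate) %>%
  tally() %>%              # n per (group, member, v_rate); still grouped by group, member
  mutate(f = n / sum(n))   # share of each v_rate within its (group, member) pair

# Each bar's height is the sum of the weights f that fall into that bin
p <- ggplot(t, aes(x = v_rate, weight = f)) +
  geom_histogram(bins = 10) +
  facet_grid(group ~ member)

# Check: bar heights add up to 1 in every panel
built <- ggplot_build(p)$data[[1]]
panel_totals <- tapply(built$count, built$PANEL, sum)
```

Since the weights in each (group, member) pair sum to 1, the bars in each facet also sum to 1 no matter how the bins are chosen.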

thebeancounter
1

You can try:

First, calculate the size of each group using ave:

df$gr_l <- ave(df$a, df$group, FUN = function(x) length(x))

Then get the proportion of each value of a within its group using by:

df$gr_prop <- c(by(df, df$group + df$a, FUN = function(x) length(x$a)/unique(x$gr_l) ))

Plot the data.

ggplot(df, aes(x=a, y=gr_prop)) + 
      geom_bar(stat="identity",position='dodge') + 
      facet_grid(group ~ .)

The question is similar to this and that question using ddply or an internal ggplot solution.
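
One caveat: `df$group + df$a` is a numeric sum, so two different (group, a) pairs with the same sum would be merged into one cell. A possible variant that avoids this is sketched below. It is an assumption on my part rather than part of the original answer, it passes both columns to `ave` as separate grouping variables, and the column name `gr_n` is made up.

```r
# Example data from the question
df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1, 5), rep(2, 5)))

# Size of each group
df$gr_l <- ave(df$a, df$group, FUN = length)

# Number of rows sharing the same (group, a) pair
df$gr_n <- ave(df$a, df$group, df$a, FUN = length)

# Proportion of each value of a within its group
df$gr_prop <- df$gr_n / df$gr_l
```

The resulting `gr_prop` column can be plotted with the same `geom_bar(stat = "identity")` call.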

Roman
  • The question is not a duplicate. Your answer refers to discrete data, and it will not work for continuous data. I know that I can work around the problem by splitting the data, then summing and regrouping, but the main idea was to understand how ..count.. works, because there is no proper documentation for such cases. Therefore this is an important separate question. – thebeancounter Jul 25 '16 at 08:45
  • I think you would just use ..density.. in that case. – shayaa Jul 25 '16 at 08:51
  • @shayaa again, same problem, it will check the density vs all the data, i need it to calculate it inside the group that was separated by facet grid – thebeancounter Jul 25 '16 at 08:53
  • @jimbou - also, when using continuous data, and bins, unique will distort the values, if choosing this solution, i feel that using unique is problematic. – thebeancounter Jul 25 '16 at 08:54
  • @captainshai as the group length `gr_l` is calculated only over the `df$group` vector, every interaction of `df$group + df$a` has the same value for each group element. Thus `unique` is no problem. I would do the binning independently from `ggplot` using `.bincode`. – Roman Jul 25 '16 at 09:04
0

Try ..density.. instead. It gives the mass within each panel, whereas the expression as currently written divides each local count by the overall, all-encompassing count.
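
A sketch of how the ..density.. suggestion could be turned into per-facet relative frequencies, assuming a recent ggplot2 where `after_stat()` is the current spelling of the `..name..` notation. The histogram stat computes density separately for each panel and also exposes the bin `width`, so `density * width` should be each bin's share of its own panel. The choice of `bins = 5` and the axis label are arbitrary.

```r
library(ggplot2)

# Example data from the question
df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1, 5), rep(2, 5)))

p <- ggplot(df, aes(x = a)) +
  # density is count / (panel total * bin width), so multiplying by
  # width leaves count / panel total
  geom_histogram(aes(y = after_stat(density * width)), bins = 5) +
  facet_grid(group ~ .) +
  ylab("share of group")

# Check: bar heights add up to 1 in every panel
built <- ggplot_build(p)$data[[1]]
panel_totals <- tapply(built$y, built$PANEL, sum)
```

This keeps the binning inside ggplot2, so it applies to continuous data without pre-aggregating the data frame.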