My data frame looks like this:

plant  distance
one      1
one      3
one      2
one      3
one      7
one      4
one      6
one      8
one      9
two      1
two      6
two      4
two      8
two      5
two      3
three ……

I want to split the distances within each plant level into groups by a fixed interval (for instance, interval = 3):

plant  distance group
  one      1    1
  one      3    1
  one      2    1
  one      3    1
  one      7    3
  one      4    2
  one      6    2
  one      8    3
  one      9    3
  two      1    1
  two      6    2
  two      4    2
  two      8    3
  two      5    2
  two      3    1
  three ……
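
One way to produce the `group` column above is plain arithmetic: for a fixed interval width, `ceiling(distance / interval)` maps 1 to 3 onto group 1, 4 to 6 onto group 2, and so on. This is only a minimal sketch. It assumes the data frame is called `my_data` (the name used in the comments below), uses just the rows shown for plants `one` and `two`, and assumes all distances are positive; `interval` is an illustrative variable name.

```r
# Sample rows from the question (plants "one" and "two" only).
my_data <- data.frame(
  plant = rep(c("one", "two"), times = c(9, 6)),
  distance = c(1, 3, 2, 3, 7, 4, 6, 8, 9,
               1, 6, 4, 8, 5, 3)
)

interval <- 3  # assumed bin width

# Distances in (0, 3] fall in group 1, (3, 6] in group 2, (6, 9] in group 3.
my_data$group <- ceiling(my_data$distance / interval)

my_data
```

Because the group number is computed directly from each distance, no break points are needed and no value can fall outside the bins.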

Then I want to compute the percentage of each group within each plant:

plant group percentage
one     1     0.44
one     2     0.22
one     3     0.33
two     1     0.33
two     2     0.50
two     3     0.17
three ……
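
The per-plant percentages in the table above could be computed in base R with `table()` and `prop.table()`. A minimal sketch, assuming the `group` column already exists and using only the rows for plants `one` and `two`; the names `my_data`, `counts` and `shares` are illustrative.

```r
# Plant and group columns as shown in the question (plants "one" and "two" only).
my_data <- data.frame(
  plant = rep(c("one", "two"), times = c(9, 6)),
  group = c(1, 1, 1, 1, 3, 2, 2, 3, 3,
            1, 2, 2, 3, 2, 1)
)

# Count rows per plant/group, then divide each row of the table by its row total.
counts <- table(my_data$plant, my_data$group)
shares <- prop.table(counts, margin = 1)

round(shares, 2)
```

With `margin = 1` the proportions are taken within each plant, so every row of `shares` sums to 1.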

Finally, I want to plot the percentage of each group within each level, similar to this: [image]

I do not know how to split each level by interval. Sorry for my English! Thank you for your help!

just_rookie
  • What exactly do you want? What have you tried? For example, you could split your data frame by using `split(my_data$distance, my_data$plant)`, you could group it by `cut(my_data$distance, breaks = seq(min(my_data$distance), max(my_data$distance), 3))` etc. – lukeA Mar 24 '15 at 08:05
  • @lukeA Thank you for your reply, I have tried many times and I have updated the post. – just_rookie Mar 24 '15 at 08:31

1 Answer

Here's one way to do it using dplyr:

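library(dplyr)
library(ggplot2)

my_data %>%
  # Bin distance into intervals of width 3; labels = FALSE returns the bin number
  mutate(group = factor(cut(distance, seq(0, max(distance), 3), labels = FALSE))) %>%
  group_by(plant, group) %>%
  # Count rows per plant/group, then convert the counts to within-plant proportions
  summarise(percentage = n()) %>%
  mutate(percentage = percentage / sum(percentage)) %>%
  ggplot(aes(x = plant, y = percentage, fill = group)) +
  geom_bar(stat = "identity", position = "stack")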

lukeA
  • Thank you for your solution; it is efficient. However, I got an error: `ggplot2 doesn't know how to deal with data of class uneval`. Could you explain a bit more about `%>%`, `group_by`, and `mutate`? – just_rookie Mar 24 '15 at 09:44
  • `mutate` is `my_data$group <- factor(cut(my_data$distance, seq(0, max(my_data$distance), 3), F))`. `group_by` groups the data set for `summarise`. `%>%` passes my_data from operation to operation, if you want to put it like that. You'll find plenty of info by googling. I'm using dplyr_0.4.0 and ggplot2_1.0.0 - the code works here with your example data. – lukeA Mar 24 '15 at 10:03
  • We used `%>%` to chain the operations; how can I break the chain and save the plot? – just_rookie Mar 24 '15 at 12:31
  • If you want to save the resulting data frame and the plot separately: `dat <- my_data %>% mutate(group = factor(cut(distance, seq(0, max(distance), 3), F))) %>% group_by(plant, group) %>% summarise(percentage = n()) %>% mutate(percentage = percentage / sum(percentage)); p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) + geom_bar(stat = "identity", position = "stack"); p` – lukeA Mar 24 '15 at 13:06
  • I found a small problem: when I used your solution to group the data, the last group was `NA`. For instance, if we divide the values 1 to 10 by `interval=3` into four groups, i.e. `group 1(1 2 3)`, `group 2(4 5 6)`, `group 3(7 8 9)` and `group 4(10)`, then `group 4` was `NA`, because the length of `group 4` was less than the `interval=3`. How can I fix this? Thank you very much! – just_rookie Mar 31 '15 at 09:48
  • Please post a new question using a minimal [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lukeA Mar 31 '15 at 09:52
  • I have posted a new question; see [here](http://stackoverflow.com/questions/29368010/split-data-and-create-stacked-percent-barplot-in-r-post2). – just_rookie Mar 31 '15 at 11:53
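
The `NA` group described in the comments can be reproduced whenever `max(distance)` is not a multiple of the interval: `seq(0, max(distance), 3)` then stops short of the maximum, and `cut()` returns `NA` for any value outside its breaks. The sketch below shows one possible fix, not necessarily the one given in the follow-up question: extend the breaks up to the next multiple of the interval. The names `x`, `interval`, `short_breaks` and `full_breaks` are illustrative.

```r
x <- 1:10
interval <- 3

# Breaks 0, 3, 6, 9 do not cover 10, so the last value becomes NA.
short_breaks <- seq(0, max(x), by = interval)
short_groups <- cut(x, short_breaks, labels = FALSE)
short_groups

# Round the upper limit up to the next multiple of the interval: 0, 3, 6, 9, 12.
full_breaks <- seq(0, ceiling(max(x) / interval) * interval, by = interval)
groups <- cut(x, full_breaks, labels = FALSE)
groups
```

With the extended breaks, the value 10 lands in a fourth bin, (9, 12], instead of being dropped.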