
I simply want to plot a bar chart like the one in the following figure, using the Orange dataset.

Any help will be appreciated.

[Figure: the desired bar chart]

fagoz
  • See `?cut` for the first part: https://stackoverflow.com/questions/13559076/convert-continuous-numeric-values-to-discrete-categories-defined-by-intervals – user20650 Mar 11 '18 at 22:01
  • Next time you ask a question, it would be good practice to show what you have tried so far. – www Mar 11 '18 at 22:27
  • I also noticed that you have not accepted any answer, although you have asked three questions so far and all of them received some nice answers. Please see this link to learn how to accept an answer (https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) and consider accepting the answer that you find most helpful for your question. – www Mar 11 '18 at 22:30
  • Thanks @www. It doesn't allow me to mark it as the answer. – fagoz Mar 11 '18 at 22:33
  • You are probably confusing "upvote" with "accept the answer". You cannot upvote any posts yet because your reputation is still low, but you can accept the answer you think is most relevant to your question. Please review the link I shared with you. – www Mar 11 '18 at 22:37

4 Answers


The idea of my code is to use case_when to create the ageGroup column first, then summarise the data so that only the maximum circumference remains for each combination of Tree and ageGroup, and finally convert ageGroup to a factor. The factor levels ("Young", "Adult", "Old") determine the order of the categories on the axis of the bar chart.
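As a quick sanity check of the binning step, here is a small sketch (assuming dplyr is installed) that runs the seven distinct measurement ages in `Orange` through the same thresholds used in this answer: up to 250 days is Young, over 250 and up to 900 is Adult, over 900 is Old.

```r
library(dplyr)

# Distinct measurement ages (days) shared by every tree in Orange
ages <- data.frame(age = sort(unique(datasets::Orange$age)))

# Same case_when() thresholds as in the answer
binned <- ages %>%
  mutate(ageGroup = case_when(
    age <= 250             ~ "Young",
    age > 250 & age <= 900 ~ "Adult",
    age > 900              ~ "Old"
  ))

print(binned)
```

Only the first measurement (age 118) is Young, two (484 and 664) are Adult and the remaining four are Old, so before summarising each tree contributes one, two and four rows to the three groups.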

We can then plot the data using ggplot2. Notice that geom_col is a shortcut for geom_bar(stat = "identity"), so there is no need to set stat explicitly. scale_fill_brewer applies a ColorBrewer palette directly, which is quite handy.
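To illustrate the geom_col shortcut, this sketch builds the same chart both ways from a tiny hypothetical data frame (`df`, not part of the question) and compares the bar heights ggplot2 computes with `layer_data()`.

```r
library(ggplot2)

# Hypothetical data: one pre-computed value per bar
df <- data.frame(group = c("a", "b", "c"), value = c(3, 5, 2))

# geom_col() draws the y values as they are (stat_identity) ...
p_col <- ggplot(df, aes(group, value)) + geom_col()

# ... which geom_bar() only does when its default stat_count() is overridden
p_bar <- ggplot(df, aes(group, value)) + geom_bar(stat = "identity")

# Bar heights as computed by ggplot2 for each plot
heights_col <- layer_data(p_col)$y
heights_bar <- layer_data(p_bar)$y
print(heights_col)
```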

data("Orange")

library(dplyr)
library(ggplot2)

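Orange2 <- Orange %>%
  # Bin the age (in days) into three groups
  mutate(ageGroup = case_when(
    age <= 250                 ~"Young",
    age > 250 & age <= 900     ~"Adult",
    age > 900                  ~"Old"
  )) %>%
  # Keep only the largest circumference per tree and age group
  group_by(Tree, ageGroup) %>%
  summarise(circumference = max(circumference)) %>%
  ungroup() %>%
  # The factor levels set the order of the categories on the axis
  mutate(ageGroup = factor(ageGroup, levels = c("Young", "Adult", "Old"))) %>%
  arrange(ageGroup, Tree)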


ggplot(Orange2, aes(x = ageGroup, y = circumference, fill = Tree)) +
  geom_col(position = position_dodge()) +
  scale_x_discrete(name = "Age Group") +
  scale_y_continuous(name = "Circumference") +
  coord_flip() +
  scale_fill_brewer(type = "qual", palette = "Paired") +
  theme_bw() +
  ggtitle("Growth of Orange Trees")

[Figure: resulting bar chart, "Growth of Orange Trees"]

www

As requested, with the same colors, labels and axes:

library(tidyverse)
color_palette <- c("#a5cde2", "#1e78b5",  "#b0dd89", "#33a02b", "#f99a98")
Orange %>% 
  mutate(AgeGroup=ifelse(age<250, "young", ifelse(age>900, "old", "adult"))) %>% 
  group_by(Tree, AgeGroup) %>%
  summarise(circumference = max(circumference)) %>%
  ggplot(aes(AgeGroup, circumference, fill=Tree)) +
  geom_bar(position = "dodge", stat="identity") +
  scale_x_discrete(limits=c("young","adult", "old")) +
  coord_flip() +
  scale_fill_manual(values = color_palette) +
  theme_bw()

[Figure: resulting bar chart]
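The hand-typed hex codes look close to the first five colours of ColorBrewer's qualitative "Paired" palette, the one the first answer selects through scale_fill_brewer. Assuming the RColorBrewer package is installed, a sketch for pulling the exact values instead of typing them:

```r
library(RColorBrewer)

# First five colours of ColorBrewer's qualitative "Paired" palette
paired5 <- brewer.pal(5, "Paired")
print(paired5)

# These could replace the hand-typed codes, e.g.
# scale_fill_manual(values = paired5)
```

Because `Tree` is an ordered factor (levels 3 < 1 < 5 < 2 < 4), scale_fill_manual assigns the colours in that level order rather than in numeric order of the tree IDs.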

Stephan
  • I have a question. When a data frame with multiple values per group is passed to `ggplot`, does `ggplot` simply plot the maximum of each group? I ask because in your answer it seems you did not summarise the `Orange` data frame for the maximum of each group, but your plot still shows the maximum. It would be good to learn whether this behavior is true. Thanks. – www Mar 11 '18 at 22:14
  • You mean the maximum value of each group when there are multiple values, right? In my opinion, it is still a good practice to summarize the data to have only one number per group when creating a bar-chart. But good to learn this is the default behavior of `ggplot2`. Thanks. – www Mar 11 '18 at 22:19
  • You are right: when more than one value appears per AgeGroup for the same tree, the max (the latest measurement) should be taken. Fixed that, thanks. – Stephan Mar 11 '18 at 22:33
  • Thanks for the update. Could you clarify whether `ggplot` takes the "maximum" or the "latest observation" as the value when multiple values are provided? – www Mar 11 '18 at 22:38
  • If you run the code up to the summarise call (without the trailing %>%), you see the data that is passed to ggplot. We summarise with `max`, which is presumably the latest observation (the largest circumference). – Stephan Mar 11 '18 at 22:41
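To probe the question raised in these comments, here is a small sketch (assuming ggplot2 3.3.0 or later, where stat_summary takes `fun`) that counts the rows ggplot2 actually draws, via `layer_data()`. With `stat = "identity"` nothing is aggregated: all 35 measurements become bars, and bars sharing the same age group and tree are drawn on top of each other, so the tallest one (the maximum) is what remains visible. An explicit summary instead yields one bar per tree and age group.

```r
library(ggplot2)

# Bin age into right-closed intervals, as in the answers
orange <- datasets::Orange
orange$ageGrp <- cut(orange$age, c(0, 250, 900, Inf), c("Young", "Adult", "Old"))

# stat = "identity" (geom_col): every input row becomes its own bar
p_raw <- ggplot(orange, aes(ageGrp, circumference, fill = Tree)) +
  geom_col(position = position_dodge())

# stat_summary(fun = max): one bar per Tree x ageGrp combination
p_max <- ggplot(orange, aes(ageGrp, circumference, fill = Tree)) +
  stat_summary(geom = "bar", fun = max, position = position_dodge())

n_raw <- nrow(layer_data(p_raw))  # bars drawn without aggregation
n_max <- nrow(layer_data(p_max))  # bars drawn after taking the maximum
print(c(raw = n_raw, max = n_max))
```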

For variety, here is an answer without dplyr.

Use cut to discretise the age variable:

Orange$ageGrp <- with(Orange, cut(age, c(0, 250, 900, Inf),
                                  c("Young", "Adult", "Old")))
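cut builds right-closed intervals by default (`right = TRUE`), so the breaks above give (0, 250], (250, 900] and (900, Inf], the same boundaries as the case_when version in the first answer. A quick check on values sitting exactly at and just past each break:

```r
# Right-closed intervals: (0, 250] Young, (250, 900] Adult, (900, Inf] Old
grp <- cut(c(250, 251, 900, 901), c(0, 250, 900, Inf),
           c("Young", "Adult", "Old"))
print(as.character(grp))
```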

position_dodge() places the bars next to each other, and setting fun = max selects the maximum circumference (ggplot2 versions before 3.3.0 used the now-deprecated fun.y = max).

library(ggplot2)

# One bar per Tree and ageGrp, with height max(circumference)
ggplot(Orange, aes(x = ageGrp, y = circumference, fill = Tree)) +
  stat_summary(geom = "bar", fun = max, position = position_dodge()) +
  coord_flip()

Or, using geom_bar directly:

ggplot(Orange, aes(x = ageGrp, y = circumference, fill = Tree)) +
  geom_bar(stat = "summary", fun = max, position = position_dodge()) +
  coord_flip()
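Staying dplyr-free, the values these summary bars represent can also be inspected with base R's aggregate. This is a sketch that recreates the ageGrp column on a copy of the data so it runs on its own:

```r
# Copy of the data with the same age groups as above
orange <- datasets::Orange
orange$ageGrp <- cut(orange$age, c(0, 250, 900, Inf), c("Young", "Adult", "Old"))

# Maximum circumference per Tree x ageGrp: one row per summary bar
maxima <- aggregate(circumference ~ Tree + ageGrp, data = orange, FUN = max)
print(maxima)
```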
user20650

You could assign the groups based on age by using mutate and if_else:

library("tidyverse")
data(Orange)

Orange %>%
  # Over 900 days is Old, over 250 is Adult, everything else is Young
  mutate(age_group = if_else(age > 900, "Old",
                             if_else(age > 250, "Adult", "Young"))) %>%
  ggplot(aes(age_group, circumference, fill = Tree)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  # Fix the order of the categories on the axis
  scale_x_discrete(limits = c("Young", "Adult", "Old")) +
  coord_flip()

[Figure: age group vs circumference of orange trees]

nadizan