I want to create a histogram for my data table `dt`, grouped by `acquiYear`, where the y-axis shows `nrOrders` and the x-axis shows the month. My data table looks like this:

structure(list(acquiYear = c("2014", "2014", "2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014", "2015", "2015", 
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015", 
"2015", "2015", "2016", "2016", "2016", "2016", "2016", "2016", 
"2016", "2016", "2016", "2016", "2016", "2016", "2017", "2017", 
"2017", "2017", "2017", "2017", "2017", "2017", "2017", "2017", 
"2017", "2017", "2018", "2018", "2018", "2018", "2018", "2018", 
"2018", "2018", "2018", "2018", "2018", "2018"), month = structure(c(1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("Jan", 
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", 
"Nov", "Dec"), class = "factor"), nrOrders = c(0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 2, 0, 2, 4, 5, 3, 7, 3, 5, 4, 3, 7, 8, 7, 2, 24, 
16, 33, 9, 27, 16, 10, 27, 9, 31, 35, 11, 11, 25, 15, 18, 19, 
19, 8, 27, 34, 43, 51, 0, 11, 2, 0, 0, 0, 0, 0, 4, 5, 1, 0), 
    ), row.names = c(NA, -60L), class = c("data.table", 
"data.frame"))

For each month I need one bar per acquiYear, and for each acquiYear a density line across the months. The colors for the years should be c("#00943C", "#4A52A0", "#FDC300", "#6F6F6F", "#EC4C24"). How can I fix this?

MikiK
  • What have you tried so far? – Allan Cameron Aug 28 '23 at 12:05
  • ``ggplot2::ggplot(data = dt, aes(x = month, y = nrOrders, color = acquiYear)) + geom_histogram(aes(color = acquiYear)) + geom_point(aes(color = acquiYear)) + xlab("Month") + ylab("Nr. of Orders") + ggtitle(paste("Delivery year 2018")) + theme_classic() + theme(plot.title = element_text(face = "bold", hjust = 0.5)) + theme(axis.title = element_text(face = "bold")) + scale_color_manual(values = c("#00943C", "#4A52A0", "#FDC300", "#6F6F6F", "#EC4C24"))`` This here gives me an error. – MikiK Aug 28 '23 at 12:09
  • The error message is: Error in `geom_histogram()`: ! Problem while computing stat. i Error occurred in the 1st layer. Caused by error in `setup_params()`: ! `stat_bin()` must only have an x or y aesthetic. – MikiK Aug 28 '23 at 12:11
  • I thought this would be similar to constructing a line chart with multiple lines per year, but it isn't. – MikiK Aug 28 '23 at 12:12
  • Please try sample data that you give us ... this is a parsing error, `argument 4 is empty`. – r2evans Aug 28 '23 at 12:19
  • I don't understand how you expect the *histogram* to look: typically the x-axis is the continuous variable (e.g., `nrOrders`), and the y-axis is the frequency or count of the variables. Are you hoping for a grouped barplot? – r2evans Aug 28 '23 at 12:22
  • Sorry, which argument is empty? The y-axis in my case is the number of orders, which is a count. I have the following columns: ``acquiYear``, ``month`` and ``nrOrders``. Yeah, I probably got it mixed up. A grouped bar chart is the better term for it. – MikiK Aug 28 '23 at 12:25

2 Answers

The problem is that what you are describing is not a histogram. A histogram is a way to show the distribution of a single continuous variable. Typically, the range of this variable is shown along the x axis, and the axis is split into fixed-width bins. A bar is constructed for each bin where the height of the bar on the y axis shows the count or proportion of observations that lie within that bin.
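
As a minimal sketch of what a histogram in this sense looks like, the block below bins the twelve monthly `nrOrders` values for 2017 from the question. The object name `orders_2017` and the choices `binwidth = 10` and `boundary = 0` are assumptions made for illustration only.

```r
library(ggplot2)

# nrOrders for acquiYear 2017, copied from the question's data
# (orders_2017 is a hypothetical name used only in this sketch)
orders_2017 <- data.frame(
  nrOrders = c(11, 11, 25, 15, 18, 19, 19, 8, 27, 34, 43, 51)
)

# Only x is mapped: geom_histogram() splits its range into bins
# and counts how many rows fall into each bin
p_hist <- ggplot(orders_2017, aes(x = nrOrders)) +
  geom_histogram(binwidth = 10, boundary = 0,
                 fill = "#6F6F6F", colour = "white") +
  xlab("Nr. of Orders per month") +
  ylab("Number of months")

p_hist
```

Note that the y axis here is the number of months falling into each bin, not the number of orders, which is why mapping both `x` and `y` makes `stat_bin()` fail.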

What you have is observations of three variables: the month, the year and the number of orders. You wish to show the number of orders on the y axis as a function of month, and also display the year as a grouping variable. It therefore appears that you are looking for a dodged bar chart. Perhaps something like this:

library(ggplot2)

# Dodged bar chart: one bar per month and acquisition year
# (dt is the data table from the question)
ggplot(dt, aes(month, nrOrders, fill = acquiYear)) +
  geom_col(position = 'dodge') +
  xlab("Month") +
  ylab("Nr. of Orders") +
  ggtitle("Delivery year 2018") +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme(axis.title = element_text(face = "bold")) +
  scale_fill_manual(values = c("#00943C", "#4A52A0", "#FDC300",
                               "#6F6F6F", "#EC4C24"))
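
One optional refinement, sketched here under assumptions: an unnamed colour vector is matched to the years by position, so a year missing from the data shifts every later colour. Naming the vector pins each year to its colour. The names `year_cols` and `toy` are hypothetical, and `toy` holds only the Jan to Mar values for 2017 and 2018 from the question.

```r
library(ggplot2)

# Hypothetical named palette: each year is tied to one colour by name
year_cols <- c("2014" = "#00943C", "2015" = "#4A52A0", "2016" = "#FDC300",
               "2017" = "#6F6F6F", "2018" = "#EC4C24")

# Small stand-in for dt: Jan-Mar of 2017 and 2018, values from the question
toy <- data.frame(
  acquiYear = rep(c("2017", "2018"), each = 3),
  month     = factor(rep(c("Jan", "Feb", "Mar"), times = 2),
                     levels = month.abb),
  nrOrders  = c(11, 11, 25, 0, 11, 2)
)

# With a named vector, 2017 and 2018 keep their own colours even though
# 2014-2016 are absent from this subset
p_named <- ggplot(toy, aes(month, nrOrders, fill = acquiYear)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = year_cols)

p_named
```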

Similarly, adding a density curve for each year doesn't make any sense here. A density curve shows the density of measurements of a single variable over a continuous range (a bit like a smoothed histogram), whereas you have equally-spaced measurements that are already fully described by the bars.
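
To illustrate the "smoothed histogram" idea, here is a rough sketch that overlays a kernel density estimate on a histogram of a single variable. It reuses the 2017 `nrOrders` values from the question; the name `orders_2017` and the bin settings are assumptions for illustration.

```r
library(ggplot2)

# nrOrders for acquiYear 2017, copied from the question's data
# (orders_2017 is a hypothetical name used only in this sketch)
orders_2017 <- data.frame(
  nrOrders = c(11, 11, 25, 15, 18, 19, 19, 8, 27, 34, 43, 51)
)

# Histogram rescaled to density, plus a kernel density estimate of the
# same single variable; neither layer uses month or year
p_dens <- ggplot(orders_2017, aes(x = nrOrders)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 10, boundary = 0,
                 fill = "#6F6F6F", colour = "white", alpha = 0.5) +
  geom_density(colour = "#EC4C24") +
  xlab("Nr. of Orders per month") +
  ylab("Density")

p_dens
```

Both layers describe how the monthly order counts are distributed, which is a different question from how the counts change from month to month.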

You could add a smooth curve for each of the years, but the plot is already complex and the curves would not add any information; in fact, they would obscure the data that your plot already shows:

library(ggplot2)  # stat_xspline() comes from the ggalt package, which must be installed

# month is converted to its integer code (1-12) so the spline has a continuous x axis
ggplot(dt, aes(as.numeric(month), nrOrders, fill = acquiYear)) +
  geom_col(position = 'dodge') +
  ggalt::stat_xspline(geom = 'area', spline_shape = -0.4, alpha = 0.3) +
  xlab("Month") +
  ylab("Nr. of Orders") +
  ggtitle("Delivery year 2018") +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme(axis.title = element_text(face = "bold")) +
  scale_fill_manual(values = c("#00943C", "#4A52A0", "#FDC300",
                               "#6F6F6F", "#EC4C24")) +
  scale_x_continuous(breaks = 1:12, labels = month.abb)

If you really want to do this, you may find that faceting gives a clearer picture:

library(ggplot2)  # stat_xspline() comes from the ggalt package, which must be installed

# One panel per acquisition year, stacked vertically
ggplot(dt, aes(as.numeric(month), nrOrders, fill = acquiYear)) +
  geom_col(position = 'dodge', width = 0.5) +
  ggalt::stat_xspline(geom = 'area', spline_shape = -0.4, alpha = 0.5) +
  xlab("Month") +
  ylab("Nr. of Orders") +
  ggtitle("Delivery year 2018") +
  facet_wrap(. ~ acquiYear, ncol = 1) +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme(axis.title = element_text(face = "bold")) +
  scale_fill_manual(values = c("#00943C", "#4A52A0", "#FDC300",
                               "#6F6F6F", "#EC4C24")) +
  scale_x_continuous(breaks = 1:12, labels = month.abb)

Allan Cameron

Would the following help?

library(ggplot2)

## keep months in calendar order and make sure nrOrders is numeric
ggplot(data = dt, aes(x = factor(month, levels = month.abb),
                      y = as.numeric(nrOrders), fill = acquiYear)) +
  geom_bar(stat = 'identity', position = 'dodge', linewidth = 1) +  ## create bars
  geom_point(position = position_dodge(width = 0.9)) +  ## dodge points so they sit on their bars
  xlab("Month") + ylab("Nr. of Orders") + ggtitle("Delivery year 2018") +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme(axis.title = element_text(face = "bold")) +
  ## the mapped aesthetic is fill, so the manual scale must be a fill scale
  scale_fill_manual(values = c("#00943C", "#4A52A0", "#FDC300", "#6F6F6F", "#EC4C24"))
Matt B
  • Remember that `geom_bar(stat = 'identity')` is just a long way of writing `geom_col`. The `geom_col` option has been around for many years now but there are obviously still a lot of outdated teaching sources that use the old `geom_bar(stat = 'identity')` syntax. – Allan Cameron Aug 28 '23 at 13:12
  • Thank you for your comment. This is very informative. – Matt B Aug 28 '23 at 13:22
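
The equivalence mentioned in the comment above can be checked with a small sketch that builds the same bars both ways and compares the computed bar heights. The data frame `toy` is a hypothetical three-row stand-in using the Jan to Mar 2017 values from the question.

```r
library(ggplot2)

# Hypothetical three-row stand-in for dt (Jan-Mar 2017 from the question)
toy <- data.frame(
  month    = factor(c("Jan", "Feb", "Mar"), levels = month.abb),
  nrOrders = c(11, 11, 25)
)

# Same chart written the long way and the short way
p_bar <- ggplot(toy, aes(month, nrOrders)) + geom_bar(stat = "identity")
p_col <- ggplot(toy, aes(month, nrOrders)) + geom_col()

# Extract the data ggplot2 computes for the bar layer of each plot
bar_data <- ggplot_build(p_bar)$data[[1]]
col_data <- ggplot_build(p_col)$data[[1]]

# Compare the bar heights produced by the two spellings
same_heights <- isTRUE(all.equal(bar_data$ymax, col_data$ymax))
same_heights
```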