
I have a dataset with two categorical variables, Year and Category, and two continuous variables, TotalSales and AverageCount.

    Year    Category      TotalSales    AverageCount
1   2013    Beverages      102074.29    22190.06
2   2013    Condiments      55277.56    14173.73
3   2013    Confections     36415.75    12138.58
4   2013    Dairy Products  30337.39    24400.00
5   2013    Seafood         53019.98    27905.25
6   2014    Beverages       81338.06    35400.00
7   2014    Condiments      55948.82    19981.72
8   2014    Confections     44478.36    24710.00
9   2014    Dairy Products  84412.36    32466.00
10  2014    Seafood         65544.19    14565.37

In MS Excel, I can easily build a pivot chart from this table, with Year and Category in the Axis field and TotalSales and AverageCount in the Σ Values field.
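For reference, the pivot-table half of this can be reproduced in base R with `xtabs()`, which sums the left-hand variable within each Category by Year cell. This is a minimal sketch, assuming the goal is just to see the same cross-tab that Excel builds before charting it; the names `sales_pivot` and `count_pivot` are illustrative, and the data is rebuilt with `data.frame()` so the block runs on its own.

```r
# Sample data from the question, rebuilt with data.frame() for brevity
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  Category = factor(rep(c("Beverages", "Condiments", "Confections",
                          "Dairy Products", "Seafood"), times = 2)),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19),
  AverageCount = c(22190.06, 14173.73, 12138.58, 24400, 27905.25,
                   35400, 19981.72, 24710, 32466, 14565.37)
)

# xtabs() sums the left-hand side within each Category x Year cell,
# much like Excel's default "Sum of ..." in the Values area
sales_pivot <- xtabs(TotalSales ~ Category + Year, data = dat)
count_pivot <- xtabs(AverageCount ~ Category + Year, data = dat)
print(sales_pivot)
print(count_pivot)
```

Since each Category/Year pair appears only once here, every cell simply holds the original value.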

Using R, how do I draw a graph like the one shown in the image, where the categorical variables appear as multiple layers in the same graph?

[Image: Excel pivot chart of TotalSales and AverageCount, with Year and Category on the axis]

P.S. One option I can see is to split the data frame into two separate data frames (one for 2013 and one for 2014 in this case) and draw two graphs on a single plot, arranged in multiple rows, to get a similar effect. But is there a way to draw it as shown above?
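The split-and-stack workaround from the P.S. might look roughly like this in base graphics. It is only a sketch: `split()` gives one data frame per Year, `par(mfrow = ...)` stacks the panels in rows, and `barplot(..., beside = TRUE)` draws the two measures as grouped bars. Titles, colours and legend placement are left at their defaults, and the names `parts`, `d` and `m` are illustrative.

```r
# Sample data from the question, rebuilt with data.frame() for brevity
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  Category = factor(rep(c("Beverages", "Condiments", "Confections",
                          "Dairy Products", "Seafood"), times = 2)),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19),
  AverageCount = c(22190.06, 14173.73, 12138.58, 24400, 27905.25,
                   35400, 19981.72, 24710, 32466, 14565.37)
)

# One data frame per Year, named "2013" and "2014"
parts <- split(dat, dat$Year)

# Arrange one plot per Year in rows on the same device
old_par <- par(mfrow = c(length(parts), 1))
for (yr in names(parts)) {
  d <- parts[[yr]]
  # With beside = TRUE, the rows of m (the two measures) become
  # side-by-side bars within each column (category)
  m <- t(as.matrix(d[, c("TotalSales", "AverageCount")]))
  colnames(m) <- as.character(d$Category)
  barplot(m, beside = TRUE, main = yr, legend.text = rownames(m))
}
# Restore the previous graphics settings
par(old_par)
```

The drawback is that each panel gets its own y-axis scale, which makes the two years harder to compare than a single faceted plot.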


Sample data used above:

dat <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L), Category = structure(c(1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("Beverages", "Condiments", 
"Confections", "Dairy Products", "Seafood"), class = "factor"), 
    TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98, 
    81338.06, 55948.82, 44478.36, 84412.36, 65544.19), AverageCount = c(22190.06, 
    14173.73, 12138.58, 24400, 27905.25, 35400, 19981.72, 24710, 
    32466, 14565.37)), .Names = c("Year", "Category", "TotalSales", 
"AverageCount"), class = "data.frame", row.names = c(NA, -10L
))
sunitprasad1
    Perhaps you can `reshape2::melt` the two variables into long format first? (Not tested.) – talat May 04 '15 at 06:33

1 Answer


You first need to reshape your data into long format, as @EDi showed you in one of your earlier questions (ggplot : Multi variable (multiple continuous variable) plotting) and as @docendo discimus suggested in the comments.

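library(reshape2)
# Stack TotalSales and AverageCount into a single "value" column, with a
# "variable" column recording which measure each value came from
dat_l <- melt(dat, id.vars = c("Year", "Category"))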
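To make it clearer what `melt()` does here, the same long layout can be built by hand in base R. This is a sketch, not reshape2's implementation: it should give the same columns (`Year`, `Category`, `variable`, `value`) and row order as the `melt()` call, with all TotalSales rows first and then all AverageCount rows. The helper vector `measures` is an illustrative name.

```r
# Sample data from the question, rebuilt with data.frame() for brevity
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  Category = factor(rep(c("Beverages", "Condiments", "Confections",
                          "Dairy Products", "Seafood"), times = 2)),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19),
  AverageCount = c(22190.06, 14173.73, 12138.58, 24400, 27905.25,
                   35400, 19981.72, 24710, 32466, 14565.37)
)

# Columns to stack; Year and Category act as the id columns
measures <- c("TotalSales", "AverageCount")
dat_l <- data.frame(
  # id columns are repeated once per measure
  Year = rep(dat$Year, times = length(measures)),
  Category = rep(dat$Category, times = length(measures)),
  # "variable" records which measure each value came from
  variable = factor(rep(measures, each = nrow(dat)), levels = measures),
  # "value" stacks the measure columns one after another
  value = unlist(dat[measures], use.names = FALSE)
)
head(dat_l)
```

Having one `value` column plus a `variable` column is what lets ggplot2 map the measure to `fill` and dodge the bars.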

Then you can use faceting like so:

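library(ggplot2)
# Dodged bars: one bar per measure (variable) within each Category
p <- ggplot(data = dat_l, aes(x = Category, y = value, group = variable, fill = variable))
p <- p + geom_bar(stat = "identity", width = 0.5, position = "dodge")
# One panel per Year, side by side
p <- p + facet_grid(. ~ Year)
p <- p + theme_bw()
# Rotate the category labels so they don't overlap
p <- p + theme(axis.text.x = element_text(angle = 90))
p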

[Image: the resulting dodged bar chart, one panel per Year]

If you specifically want the figure to look more like Excel's, the answer here describes some strategies that might help: How do I plot charts with nested categories axes?
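One possible strategy, not necessarily the one in the linked answer, is to move the Year facet strips below the category labels so they read like a second level of the x-axis. A sketch, assuming ggplot2 2.2.0 or later (for `geom_col()` and the `strip.placement` theme setting); the long data is rebuilt by hand so the block runs without reshape2.

```r
library(ggplot2)

# Sample data from the question, rebuilt with data.frame() for brevity
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  Category = factor(rep(c("Beverages", "Condiments", "Confections",
                          "Dairy Products", "Seafood"), times = 2)),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19),
  AverageCount = c(22190.06, 14173.73, 12138.58, 24400, 27905.25,
                   35400, 19981.72, 24710, 32466, 14565.37)
)
# Long format, same layout as reshape2::melt(dat, id.vars = c("Year", "Category"))
measures <- c("TotalSales", "AverageCount")
dat_l <- data.frame(
  Year = rep(dat$Year, times = length(measures)),
  Category = rep(dat$Category, times = length(measures)),
  variable = factor(rep(measures, each = nrow(dat)), levels = measures),
  value = unlist(dat[measures], use.names = FALSE)
)

p <- ggplot(dat_l, aes(x = Category, y = value, fill = variable)) +
  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  # switch = "x" puts the Year strips at the bottom of the panels
  facet_grid(. ~ Year, switch = "x") +
  theme_bw() +
  theme(
    # Draw the strips outside the axis, below the category labels
    strip.placement = "outside",
    strip.background = element_blank(),
    # Butt the panels together so the years read as one axis
    panel.spacing = grid::unit(0, "lines"),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) +
  labs(x = NULL, y = NULL, fill = NULL)
p
```

The result is still two facets; the theme settings only make them look like a single chart with a nested Year/Category axis.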

Your original data in an easier-to-paste format:

dat <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L), Category = structure(c(1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("Beverages", "Condiments", 
"Confections", "Dairy Products", "Seafood"), class = "factor"), 
    TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98, 
    81338.06, 55948.82, 44478.36, 84412.36, 65544.19), AverageCount = c(22190.06, 
    14173.73, 12138.58, 24400, 27905.25, 35400, 19981.72, 24710, 
    32466, 14565.37)), .Names = c("Year", "Category", "TotalSales", 
"AverageCount"), class = "data.frame", row.names = c(NA, -10L
))
LJW
  • Spot on! It helps. Second, what improvements can I make to get a presentable output if I have one more categorical variable? (Although I feel I should ask that as a new question.) – sunitprasad1 May 04 '15 at 11:44
  • Glad it helped @sunitprasad1. When you say "one more categorical variable" which variable are you thinking of? Year, category of product, or type of value? For any of those if you add the extra levels to your data frame they should be added to the plot with the same code. I could be misunderstanding the question though. – LJW May 05 '15 at 02:24
  • I tried it in the meantime. The table I'm using has `Year, Quarter, Product and Category` and `Sales, Count`. I wanted to know how I should format, restructure, or arrange the data so the chart stays readable, so I added `Quarter` to this code to see how it goes. The resulting graph, though informative, is highly confusing. I think I should explore the link you provided in your answer further, and I'll post a new question if I get stuck. – sunitprasad1 May 05 '15 at 05:57
  • @sunitprasad1, Sounds like you might want to facet by 2 variables if you are using year and quarter. Check out the Cookbook for R for some good examples of faceting with 2 variables http://www.cookbook-r.com/Graphs/Facets_(ggplot2)/. – LJW May 05 '15 at 23:19
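As a sketch of faceting by two variables using only the data in this question, the measure (`variable`) can go on the rows and `Year` on the columns, with `scales = "free_y"` because TotalSales and AverageCount have different magnitudes. With the `Quarter` column mentioned in the comments (not present in the sample data), the same pattern would presumably become something like `facet_grid(Quarter ~ Year)`; that is an untested assumption.

```r
library(ggplot2)

# Sample data from the question, rebuilt with data.frame() for brevity
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  Category = factor(rep(c("Beverages", "Condiments", "Confections",
                          "Dairy Products", "Seafood"), times = 2)),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19),
  AverageCount = c(22190.06, 14173.73, 12138.58, 24400, 27905.25,
                   35400, 19981.72, 24710, 32466, 14565.37)
)
# Long format, same layout as reshape2::melt(dat, id.vars = c("Year", "Category"))
measures <- c("TotalSales", "AverageCount")
dat_l <- data.frame(
  Year = rep(dat$Year, times = length(measures)),
  Category = rep(dat$Category, times = length(measures)),
  variable = factor(rep(measures, each = nrow(dat)), levels = measures),
  value = unlist(dat[measures], use.names = FALSE)
)

p <- ggplot(dat_l, aes(x = Category, y = value, fill = variable)) +
  geom_col(width = 0.5) +
  # Rows: one per measure; columns: one per Year.
  # free_y gives each row of panels its own y range.
  facet_grid(variable ~ Year, scales = "free_y") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        # The row strips already name the measure, so drop the legend
        legend.position = "none")
p
```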