I'm trying to create a stacked area graph with R using the package ggplot2 with the below data:

> dput(ec.admin1.ma.tall[1:20,])
structure(list(date = structure(c(18346, 18347, 18348, 18349, 
18350, 18351, 18352, 18353, 18362, 18363, 18364, 18365, 18366, 
18367, 18354, 18374, 18375, 18376, 18379, 18380), class = "Date"), 
locations = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("azuay_newcase_avg", 
"bolivar_newcase_avg", "canar_newcase_avg", "carchi_newcase_avg", 
"chimborazo_newcase_avg", "cotopaxi_newcase_avg", "eloro_newcase_avg", 
"esmeraldas_newcase_avg", "galapagos_newcase_avg", "guayas_newcase_avg", 
"imbabura_newcase_avg", "loja_newcase_avg", "losrios_newcase_avg", 
"manabi_newcase_avg", "moronasant_newcase_avg", "napo_newcase_avg", 
"orellana_newcase_avg", "pastaza_newcase_avg", "pichincha_newcase_avg", 
"santaelena_newcase_avg", "stodom_newcase_avg", "sucumbios_newcase_avg", 
"tungurahua_newcase_avg", "zamchin_newcase_avg"), class = "factor"), 
newcases_ma = c(NA, NA, NA, 5.85714285714286, 8.14285714285714, 
13.1428571428571, 12.8571428571429, 16.2857142857143, 15.2857142857143, 
16.1428571428571, 14.2857142857143, 12.5714285714286, 18, 
19.2857142857143, 39.2857142857143, 38.7142857142857, 53.2857142857143, 
53, 52.4285714285714, 46)), row.names = c(NA, 20L), class = "data.frame")
> ec.admin1.ma.tall$locations <- factor(ec.admin1.ma.tall$locations)
> ec.admin1.ma.tall$date <- as.Date(ec.admin1.ma.tall$date, "%m/%d/%Y") 
> ggplot(ec.admin1.ma.tall, aes(x = date, y = newcases_ma, fill = locations, group = 
  locations)) + geom_area()

The image I get from this code is:

[Figure: stacked area graph plotting number of new cases by region]

However, after plotting the individual regions separately, I don't believe my stacked plot is accurate. The code for the individual-region plot is below:

ggplot(ec.admin1.ma.tall, aes(x = date, y = newcases_ma, fill = locations)) + 
geom_col() +
labs(title = "Moving 7-Day Average for New Cases in Admin 1 Regions - Ecuador",
   x = "Date", y = "7-Day Moving Average, New Cases") +
theme(axis.text.x = element_text(angle = 90, size = rel(0.5), vjust = 0.5, hjust=1)) +
facet_wrap(~locations, nrow = 6, scales = "free") 

[Figure: bar graph of new cases over time, split by individual regions]

As you can see from the y-axes of the individual region plots, none of the values go above 2000, and few go above 1000 cases. Would anyone know why there is a discrepancy between the individual regions' data and the stacked area graph?

lhn24

1 Answer

I just took a quick look, but the plots seem reasonable to me and the code looks OK too. Check out the "guayas" small multiples plot. The peak values early on reach about 1500, which is about the vertical size of the large green section of your stacked area plot. No single region goes over 2000, but the stacked height of guayas plus the other regions certainly goes over 2000 at that point on the x-axis.
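To make the point concrete, here is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from the question). It shows that the top of a stacked area is the per-date sum of the regions, so it can exceed every individual region's maximum while each band still has its true height.

```r
library(ggplot2)

# Toy data (hypothetical values, not the asker's data): two regions, three dates.
toy <- data.frame(
  date = rep(as.Date("2020-04-01") + 0:2, times = 2),
  locations = rep(c("guayas", "pichincha"), each = 3),
  newcases_ma = c(1500, 1200, 900, 600, 700, 800)
)

# Largest value in any single region (what the faceted plot shows).
max_single <- max(toy$newcases_ma)

# Height of the top of the stack on each date (the outline geom_area() draws).
stack_top <- tapply(toy$newcases_ma, toy$date, sum)

# Each band keeps its own height; only the outline is cumulative.
p <- ggplot(toy, aes(x = date, y = newcases_ma, fill = locations)) +
  geom_area()
```

Here `max_single` is 1500, while `stack_top` is 2100 on the first date, which mirrors the situation in the question: no region exceeds 2000, but the stack does.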

Vincent
  • Hi! I'm actually not trying to take the sum of the plots though, I just want to show a comparison of the true values for each region through a stacked area plot. Is there a way to make this change instead of showing sums on this graph? – lhn24 Nov 01 '20 at 19:42
  • Well, the current graph *does* allow you to make a comparison of the true values of each region. In the `geom_area` plot, you see that the green area is much bigger than the blue area. Can you show us an example of the graph you want? – Vincent Nov 01 '20 at 19:45
  • What I'm saying is that the distance between my two arrows on the left is exactly equal to the height of the bar on the single guayas plot. There is no "sum" being done here. The histograms are just stacked on top of each other, which is exactly what you asked for. – Vincent Nov 01 '20 at 19:48
  • Ah I see, would there be a way to order the histograms in a way that orders the values from least to greatest, where the lower values are positioned towards the bottom of the stack and the greater values are towards the top? – lhn24 Nov 01 '20 at 23:52
  • Yes, you need to convert your `locations` variable to a factor and make sure that the factor levels are in the order you want to display them. You'll find a ton of tutorials here or elsewhere online with keywords such as "ggplot2 order factor" – Vincent Nov 02 '20 at 01:04
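
The factor-ordering suggestion in the last comment could be sketched roughly as follows, again on hypothetical toy data. It uses base R's `reorder()` to sort the levels of `locations` by total cases. As far as I understand, ggplot2's default stacking puts the first factor level at the top of the stack, so ordering levels by descending total should place the largest region on top and the smallest at the bottom; if your version stacks the other way, drop the minus sign or try `position_stack(reverse = TRUE)`. Note that a single level order cannot follow rankings that change from date to date, so ordering by total is only a compromise. The second plot is an alternative for comparing true values without any stacking.

```r
library(ggplot2)

# Toy data (hypothetical values): three regions, three dates.
toy <- data.frame(
  date = rep(as.Date("2020-04-01") + 0:2, times = 3),
  locations = rep(c("azuay", "guayas", "pichincha"), each = 3),
  newcases_ma = c(10, 20, 30, 1500, 1200, 900, 600, 700, 800)
)

# Order the factor levels by descending total (NA values are ignored).
toy$locations <- reorder(
  factor(toy$locations), toy$newcases_ma,
  FUN = function(v) -sum(v, na.rm = TRUE)
)
levels(toy$locations)  # "guayas" "pichincha" "azuay"

# Stacked version: band order follows the factor levels.
p_stacked <- ggplot(toy, aes(x = date, y = newcases_ma, fill = locations)) +
  geom_area()

# Unstacked alternative: every area starts at zero, drawn semi-transparent.
p_overlay <- ggplot(toy, aes(x = date, y = newcases_ma, fill = locations)) +
  geom_area(position = "identity", alpha = 0.4)
```

With 24 regions the overlaid version may get crowded, in which case the faceted plot from the question is probably the clearer comparison.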