
I am trying to show different growing season lengths by displaying crop planting and harvest dates in multiple regions.

My final goal is a graph that looks like this:

[image: example of the desired plot]

which was taken from an answer to this question. Note that the dates are in Julian days (day of year).
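Because every value in this question is a day-of-year number, it can help to check how those numbers map to calendar dates. The minimal base-R sketch below assumes the 2020 calendar, which is a leap year. That is the same year the answer uses for its month ticks, and it matches the day numbers in the data (for example, 245 = 1 September). The helper names `doy_to_date` and `date_to_doy` are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: day-of-year -> calendar date (assumes the 2020 calendar)
doy_to_date <- function(doy, year = 2020) {
  as.Date(doy - 1, origin = paste0(year, "-01-01"))
}

# Hypothetical helper: calendar date -> day-of-year ("%j" is zero-padded text)
date_to_doy <- function(date) as.integer(format(date, "%j"))

doy_to_date(c(1, 245, 336))         # 2020-01-01, 2020-09-01, 2020-12-01
date_to_doy(as.Date("2020-09-01"))  # 245
```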

My first attempt to reproduce a similar plot is:

library(data.table)
library(ggplot2)

mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"

# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=TRUE))

# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")

# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) + 
  geom_bar(stat="identity") +
  facet_wrap(~Region, nrow=3) +
  coord_flip() +
  theme_bw(base_size=18) +
  scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
                                 "Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")

[image: output of the stacked geom_bar code above]

However, there are a few issues with this plot:

  1. Because the bars are stacked, the values on the x-axis are summed and end up too high: they fall outside the 1-365 range that represents day of year.

  2. I need to combine Planting.Begin and Planting.End into the same color, and do the same for Harvest.Begin and Harvest.End.

  3. Also, a "void" (i.e., a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
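You can confirm issue 1 directly. Stacking places the four values of each bar end to end, so each bar's length is their sum. The sketch below re-enters the same six rows as `mydat` as a plain base-R data frame, so it runs without data.table. The name `mydat_df` is only for this illustration.

```r
# Same six rows as `mydat` above, typed in as a plain data frame
mydat_df <- data.frame(
  Region = rep(c("Center-West", "South", "Southeast"), each = 2),
  Crop = rep(c("Soybean", "Corn"), times = 3),
  Planting.Begin = c(245, 245, 245, 183, 275, 214),
  Planting.End   = c(275, 336,   1, 336, 336, 336),
  Harvest.Begin  = c(  1,  32,   1,   1,   1,  32),
  Harvest.End    = c( 92, 153, 122, 153, 122, 122)
)

# A stacked bar's length is the sum of its four segments
stack_total <- rowSums(mydat_df[, c("Planting.Begin", "Planting.End",
                                    "Harvest.Begin", "Harvest.End")])

# Every total exceeds 365 (the smallest is South/Soybean at 369)
cbind(mydat_df[c("Region", "Crop")], stack_total)
```

The values are not meant to be added up at all. Each Begin/End pair marks the start and end of an interval on the same day-of-year axis, which is why stacked bars cannot represent them.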

Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).

Any hints on how to create such a graph?

thiagoveloso
    Hi OP - can you share the dataset, `m` please? If it is not too large (<50 rows), you can share directly in your question (preferred) via typing `dput(m)` into your console and pasting the output of that function (should start with `structure(...`) into your question, formatted as code. If `m` is too large, then I would recommend sending maybe one of the `Region`s in your dataset. – chemdork123 Aug 26 '20 at 22:03
  • @chemdork123 thanks for your comment, but... the data is all in the question. `m` is just a 'melted' version of `mydat`. Are you unable to run the code? – thiagoveloso Aug 26 '20 at 22:06
  • Oh I see... you have `"\t"` delimited data. – chemdork123 Aug 26 '20 at 22:07

1 Answer


I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:

library(dplyr)  # provides the %>% pipe

plotdata <- mydat %>% 
  dplyr::mutate(Crop = factor(Crop)) %>% 
  tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>% 
  # split e.g. "Planting.Begin" into Type = "Planting" and Event = "Begin"
  tidyr::separate(period, c("Type","Event")) %>% 
  tidyr::pivot_wider(names_from=Event, values_from=value)


#    Region      Crop    Type     Begin   End
#    <chr>       <fct>   <chr>    <int> <int>
#  1 Center-West Soybean Planting   245   275
#  2 Center-West Soybean Harvest      1    92
#  3 Center-West Corn    Planting   245   336
#  4 Center-West Corn    Harvest     32   153
#  5 South       Soybean Planting   245     1
#  ...

We've used tidyr to reshape the data so that there is one row per rectangle we want to draw, and we've also made Crop a factor. We can then plot it like this:

ggplot(plotdata) + 
  # one rectangle per row: x spans Begin..End, y spans +/- 0.45 around the crop's factor code
  aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) + 
  geom_rect(color="black") + 
  facet_wrap(~Region, nrow=3) + 
  theme_bw(base_size=18) +
  # put the crop names back on the numeric y axis
  scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))

[image: output of the geom_rect code above]

The slightly messy part is that we want a discrete y scale, but geom_rect needs numeric positions. Because Crop is now a factor, we use its underlying integer codes to compute the ymin and ymax positions. We then replace the y-axis breaks with the names of the factor's levels.
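One caveat with this data is that the South/Soybean planting period runs from day 245 to day 1, so it crosses the new year. With `xmin=245, xmax=1`, geom_rect would shade days 1 to 245 instead. The sketch below splits any row with End < Begin into two pieces: one from Begin to the end of the year, and one from day 1 to End. It assumes a 366-day year to match the leap-year day numbers, and it computes ymin/ymax from the factor codes as described above. `split_wrapped` is a hypothetical helper. For brevity, the sketch uses only the first five rows of the reshaped output shown earlier.

```r
# Hypothetical helper: split periods that cross the new year (End < Begin)
# into [Begin, year_end] and [1, End]; year_end = 366 assumes a leap year
split_wrapped <- function(d, year_end = 366) {
  wraps <- d$End < d$Begin
  after_new_year <- d[wraps, ]                           # copies of wrapping rows, original End kept
  after_new_year$Begin <- rep(1, nrow(after_new_year))   # second piece starts on day 1
  d$End[wraps] <- year_end                               # first piece runs to the end of the year
  out <- rbind(d, after_new_year)
  rownames(out) <- NULL
  out
}

# First five rows of the reshaped output above, with Crop as a factor
plotdata_head <- data.frame(
  Region = c("Center-West", "Center-West", "Center-West", "Center-West", "South"),
  Crop   = factor(c("Soybean", "Soybean", "Corn", "Corn", "Soybean")),
  Type   = c("Planting", "Harvest", "Planting", "Harvest", "Planting"),
  Begin  = c(245, 1, 245, 32, 245),
  End    = c(275, 92, 336, 153, 1)
)

rects <- split_wrapped(plotdata_head)
# Factor codes (Corn = 1, Soybean = 2) give each crop its own band on y
rects$ymin <- as.numeric(rects$Crop) - 0.45
rects$ymax <- as.numeric(rects$Crop) + 0.45
rects
```

You can pass the result to the same `ggplot()` call as above, mapping `ymin` and `ymax` directly instead of computing them inside `aes()`. Here the piece after the new year runs from day 1 to day 1, so it has zero width and appears only as an outline.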

If you also want month names on the x axis, you could do something like:

dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"), by="month")
# then add this scale to your plot, e.g. to the one drawn above via last_plot()
last_plot() + 
  scale_x_continuous(breaks=lubridate::yday(dateticks),
                     labels=as.character(lubridate::month(dateticks, label=TRUE, abbr=TRUE)))
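If lubridate isn't available, base R's `format()` can compute the same breaks and labels. The sketch below does that. Note that `%b` month abbreviations follow the current locale. The variable names `month_breaks` and `month_labels` are only for illustration.

```r
# First day of each month in 2020 (a leap year, matching the data's day numbers)
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"), by = "month")

# "%j" gives the day of year as zero-padded text; "%b" the abbreviated month name
month_breaks <- as.integer(format(dateticks, "%j"))
month_labels <- format(dateticks, "%b")

month_breaks  # 1 32 61 92 122 153 183 214 245 275 306 336
```

You can then add them with `scale_x_continuous(breaks = month_breaks, labels = month_labels)`.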
MrFlick