
I'm struggling with formatting the x-axis for my plot since I'm working with dates and I hope someone can help me out!

Here is my data:

Sample_date <- c("2019-02-19", "2019-02-19", "2019-02-19", "2019-02-27",  "2019-02-27", "2019-02-27", "2019-02-28", "2019-02-28", "2019-02-28", "2019-03-07", "2019-03-07", "2019-03-07", "2019-03-21", "2019-03-21", "2019-03-21", "2019-04-23", "2019-04-23", "2019-04-23", "2019-04-24", "2019-04-24", "2019-04-24")
Taxon <- c("D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A")
Value <- as.numeric(c("560", "9752", "8712", "1080", "16400", "18513", "640", "4267", "6534", "2320", "14660", "14697", "1520", "8094", "32670", "5040", "859796", "9801", "6560", "803232", "0"))
df <- data.frame(Sample_date, Taxon, Value)

df$Sample_date <- format(as.Date(df$Sample_date, format = "%Y-%m-%d"), "%d/%m")
title <- "Distribution 2019"

This is just a simplified excerpt of my dataset, which spans the whole of 2019. I would like the x-axis to show the specific sampling dates rather than a general scale of months or weeks. Because some of the dates are very close together, the x-axis labels overlap, and rotating them by 90° does not help because they would still overlap (the original dataset is big). So I would like to alternate the height of the dates on the x-axis by pasting a newline in front of every other date.

I thought I had found the help I needed in these two posts (1 and 2), but applying them to dates raised a new problem: the formatting of my dates got lost.

I used the following code to make my plot:

ggplot(df, aes(x = as.Date(Sample_date, format = "%d/%m"), y = Value, fill = Taxon)) + 
  geom_area(stat = "identity", position = "stack") +
  scale_x_date(
    labels = function(dates) {
      fixedLabels <- c()
      for (l in 1:length(dates)) {
        fixedLabels[l] <- paste0(ifelse(l %% 2 == 0, '', '\n'), dates[l])
      }
      return(fixedLabels)
    },
    breaks = as.Date(df$Sample_date, format = "%d/%m")
  ) +
  theme(
    axis.text = element_text(size = 18), 
    plot.title = element_text(hjust = 0, size = 18), 
    legend.text = element_text(size = 18), 
    legend.key.size = unit(1.5, "cm"), 
    legend.position = "bottom", 
    legend.title = element_blank(), 
    axis.title.y = element_text(size = 18), 
    axis.ticks.x = element_blank(), 
    panel.grid.minor.x = element_blank(), 
    plot.margin = unit(c(0.2, 2, -0.2, 0), "cm")
  ) +
  ggtitle(title) +
  scale_y_continuous(labels = function(x) format(x, big.mark = " ", scientific = FALSE)) + 
  labs(x = "", y = "cells/liter")

And this is what I got: plot

The dates appear twice since I included breaks in the scale_x_date. So, my question is: How do I remove one set of dates and display my dates on the x-axis as 21/03 instead of 2020-03-21?

I have searched the web for help, but cannot work it out by myself. I would very much appreciate any hints!

wylierose

2 Answers


You can improve your plot by passing only the unique dates as the x-axis breaks and using the guide argument to offset the labels (the n.dodge value within guide_axis() sets the number of offset rows). Keep Sample_date as a full Date and format it only in the labels function, so the year is not lost. I don't think you'll be able to avoid tinkering with the text size and date format if you have a lot of breaks, but try to use an abridged format if possible.

df <- data.frame(Sample_date, Taxon, Value)

df$Sample_date <- as.Date(df$Sample_date)

library(ggplot2)

ggplot(df, aes(x = Sample_date, y = Value, fill = Taxon)) + 
  geom_area(stat = "identity", position = "stack") +
  scale_x_date(breaks = unique(df$Sample_date),
               guide = guide_axis(n.dodge = 2),
               labels = function(x) format(x, "%d %b %y")) +
  theme(
    axis.text = element_text(size = 18), 
    plot.title = element_text(hjust = 0, size = 18), 
    legend.text = element_text(size = 18), 
    legend.key.size = unit(1.5, "cm"), 
    legend.position = "bottom", 
    legend.title = element_blank(), 
    axis.title.y = element_text(size = 18), 
    axis.ticks.x = element_blank(), 
    panel.grid.minor.x = element_blank(), 
    plot.margin = unit(c(0.2, 2, -0.2, 0), "cm")
  ) +
  ggtitle("Distribution 2019") +
  scale_y_continuous(labels = function(x) format(x, big.mark = " ", scientific = FALSE)) + 
  labs(x = "", y = "cells/liter")

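If guide_axis() is not available (as far as I know it needs ggplot2 3.3.0 or later), the newline approach from the question can still be made to work. The following is only a minimal sketch under a few assumptions: stagger_labels is a hypothetical helper name, not part of ggplot2, and the small data frame is a cut-down stand-in for the question's data. The idea is to keep the column as a real Date, pass the unique dates as breaks, and do the "%d/%m" formatting inside the labels function, pushing every second label down a line.

```r
# Hypothetical helper (not part of ggplot2): format Date breaks as "dd/mm"
# and prefix every second label with a newline so it drops one line lower.
stagger_labels <- function(dates, fmt = "%d/%m") {
  labs <- format(dates, fmt)
  lower <- seq_along(labs) %% 2 == 0   # 2nd, 4th, ... label
  labs[lower] <- paste0("\n", labs[lower])
  labs
}

# Cut-down stand-in for the question's data; Sample_date stays a full Date.
df <- data.frame(
  Sample_date = as.Date(rep(c("2019-02-19", "2019-02-27", "2019-02-28"), each = 2)),
  Taxon = rep(c("D", "K"), times = 3),
  Value = c(560, 9752, 1080, 16400, 640, 4267)
)

# Only build the plot when ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Sample_date, y = Value, fill = Taxon)) +
    geom_area(stat = "identity", position = "stack") +
    scale_x_date(breaks = unique(df$Sample_date), labels = stagger_labels)
  print(p)
}
```

Because the breaks are unique, each date is labelled once, and because the formatting happens at label time the axis positions still reflect the true 2019 dates. With many closely spaced dates two rows may not be enough, in which case n.dodge with a higher value is probably the easier route.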

Ritchie Sacramento

Try this. The basic idea is to drop the date axis and recode the dates as a factor:

library(ggplot2)

Sample_date <- c("2019-02-19", "2019-02-19", "2019-02-19", "2019-02-27",  "2019-02-27", "2019-02-27", "2019-02-28", "2019-02-28", "2019-02-28", "2019-03-07", "2019-03-07", "2019-03-07", "2019-03-21", "2019-03-21", "2019-03-21", "2019-04-23", "2019-04-23", "2019-04-23", "2019-04-24", "2019-04-24", "2019-04-24")
Taxon <- c("D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A", "D", "K", "A")
Value <- as.numeric(c("560", "9752", "8712", "1080", "16400", "18513", "640", "4267", "6534", "2320", "14660", "14697", "1520", "8094", "32670", "5040", "859796", "9801", "6560", "803232", "0"))
df <- data.frame(Sample_date, Taxon, Value)

df$Sample_date <- as.Date(df$Sample_date, format = "%Y-%m-%d")
title <- "Distribution 2019"

# Add date labels as a factor, ordered by the underlying dates
df$date2 <- format(df$Sample_date, "%d/%m")
df$date2 <- forcats::fct_reorder(df$date2, df$Sample_date)

# Also add group = Taxon so the areas connect across factor levels
ggplot(df, aes(x = date2, y = Value, fill = Taxon, group = Taxon)) + 
  geom_area(stat = "identity", position = "stack") +
  theme(
    axis.text = element_text(size = 18), 
    plot.title = element_text(hjust = 0, size = 18), 
    legend.text = element_text(size = 18), 
    legend.key.size = unit(1.5, "cm"), 
    legend.position = "bottom", 
    legend.title = element_blank(), 
    axis.title.y = element_text(size = 18), 
    axis.ticks.x = element_blank(), 
    panel.grid.minor.x = element_blank(), 
    plot.margin = unit(c(0.2, 2, -0.2, 0), "cm")
  ) +
  ggtitle(title) +
  scale_y_continuous(labels = function(x) format(x, big.mark = " ", scientific = FALSE)) + 
  labs(x = "", y = "cells/liter")

Created on 2020-03-26 by the reprex package (v0.3.0)

stefan
  • Thanks for your reply, but of course the dates should be spaced at the correct distances from each other, otherwise it's not usable for me. The above solution by H 1 was what I was looking for. – wylierose Mar 26 '20 at 12:22