
I am having a problem aligning my plots using facet_wrap(). I have multiple years of data, but I am subsetting to display only two years. For some reason the upper plot is shifted to the left and the lower plot is shifted to the right (see attached snapshot). My dataset is too large to post here, so it can be downloaded from the link below if someone is willing to help me align the plots. https://login.filesanywhere.com/fs/v.aspx?v=8c6b678a5c61707ab0ae

Here is the snapshot:

[image: snapshot of the two misaligned panels]

This is what I have tried:

library(writexl)
library(readxl)
library(tidyverse)
library(lubridate)
mydat <- read_excel("all.xlsx", sheet="Sheet1")  

#subset 2 years
start <- 1998
end <- 1999
a <- dplyr::filter(mydat, year %in% start:end)

ggplot(a, aes(date, Salinity, color = Box)) +
    geom_line(size = .8) +
    theme(legend.position = 'none') +
    scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0, 0.5)) +
    labs(x = "", y = "test") +
    facet_wrap(~year, ncol = 1)

I will be subsetting up to 10 years in the future. I am also wondering what the best way is to subset multiple years, using dplyr or base R. Thanks in advance.

I get the following error after trying your suggestion:

# subset years
start <- 1998
end <- 2000

a <- mydat %>%
    dplyr::filter(year %in% start:end) %>% # if you need to subset the years
    mutate(date = as.Date(gsub("\\d{4}", "0000", date)))

ggplot(a, aes(date, Salinity, color = Box)) +
    geom_line(size = .8) +
    scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0, 0.5)) +
    labs(x = "", y = "test") +
    facet_wrap(~year, ncol = 1, scales = "free_x")

Error in charToDate(x) : 
  character string is not in a standard unambiguous format

Maybe 'date' doesn't like the zeros? Question: Does it work for you using the dataset from the link?

Salvador
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 08 '21 at 08:14
  • That's probably because X-axis is showing monthly values of two different years. You can try changing the scale of X-axis in the facets: `facet_wrap(~ year, ncol = 1, scales = "free_x")` – Zaw Jul 08 '21 at 08:27
  • Please do not provide unnecessarily large files or data structures, especially not one that requires us to download from a random place. For all we know, it may contain a virus. Please keep it small and simple enough to use. That said, @Zaw's suggestion works, but if one year contains data from fewer months than another year, the scale would be off. An alternative is to change the dates so that they all have the same year within the `date`, while `year` (which you don't change) is retained for the `facet_wrap`. – LC-datascientist Jul 08 '21 at 08:46
  • Understood. I couldn't make a smaller subset to show my point. I tried with about 300 records but nothing was being displayed, hence, I provided a link to the data. On a different note, Thanks to @Zaw for the suggestion. I am trying facet_grid() instead of facet_wrap but the scales ignore the `scales= "free" or scales="free_y"` arguments. Any idea why? – Salvador Jul 08 '21 at 08:52
  • Just change the year "0000" to "0001" and it will work. – LC-datascientist Jul 11 '21 at 00:18

1 Answer


The plot from the OP appears shifted because the date column contains dates from different years. ggplot therefore places those dates chronologically along one shared x-axis that spans both years, so each panel only fills its own year's part of the axis.

A solution is to change the year in every date in the date column to the same value (e.g., "1111"), while the actual year information is retained in the year column. The values in the year column are then used to facet the plot.

# example data
mydat <- data.frame(
    date = as.Date(
        c("1998-10-1", "1998-11-1", "1999-10-1", "1999-11-1", "1999-12-1")), 
    value = 1:5, 
    year = c(1998, 1998, 1999, 1999, 1999))

mydat
#        date value year
#1 1998-10-01     1 1998
#2 1998-11-01     2 1998
#3 1999-10-01     3 1999
#4 1999-11-01     4 1999
#5 1999-12-01     5 1999

# solution: give every date the same year
library(ggplot2)

mydat$date <- as.Date(gsub("\\d{4}", "1111", mydat$date))

ggplot(mydat, aes(date, value)) + 
    geom_line(size=.8) + 
    scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) + 
    labs(x="",y= "test") +
    facet_wrap(~year, ncol = 1)

[image: plot produced by the example code]

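In gsub("\\d{4}", "1111", mydat$date), a regular expression is used for the pattern. It finds four consecutive digits and substitutes them with "1111". Because the dates are written as YYYY-MM-DD, the year is the only run of four digits, so the month and day are left untouched.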
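
A minimal sketch of what that substitution does, using two hypothetical dates that are not from the OP's dataset. gsub() converts the Date values to character strings such as "1998-10-01" before matching, so the pattern only ever hits the year. Anchoring the pattern with ^ is an optional safeguard added here, not part of the code above.

```r
# Hypothetical example dates (not from the OP's dataset)
dates <- as.Date(c("1998-10-01", "1999-11-01"))

# gsub() coerces the dates to character ("YYYY-MM-DD") before matching;
# "^" restricts the match to the leading four digits, i.e. the year
same_year <- as.Date(gsub("^\\d{4}", "1111", dates))

# month and day are unchanged; only the year differs from the input
format(same_year, "%m-%d")
```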

EDIT

If you don't want to overwrite the date column in the original data, you can use the pipe operator (%>%) and other functions from the dplyr package:

library(ggplot2)
library(dplyr) # or library(tidyverse) works, too

# subset 2 years
start <- 1998
end <- 1999

mydat %>%
    filter(year %in% start:end) %>% # if you need to subset the years
    mutate(date = as.Date(gsub("\\d{4}", "1111", date))) %>% 
    ggplot(aes(date, value)) + 
    geom_line(size = 0.8) + 
    scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0,0.5)) + 
    labs(x = "", y = "test") +
    facet_wrap(~year, ncol = 1)
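
On the side question of how to subset several years: the filter() call above and base R select the same rows. A rough base R sketch, assuming a data frame with a numeric year column like the example data (mydat_demo, a_in, and a_range are hypothetical names used only here):

```r
# Hypothetical stand-in for the example data
mydat_demo <- data.frame(
    year  = c(1997, 1998, 1998, 1999, 2000),
    value = 1:5)

start <- 1998
end   <- 1999

# base R equivalent of dplyr::filter(mydat_demo, year %in% start:end)
a_in <- mydat_demo[mydat_demo$year %in% start:end, ]

# a range comparison selects the same rows and also works for non-integer years
a_range <- subset(mydat_demo, year >= start & year <= end)
```

Either form extends to a 10-year window by changing start and end.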

UPDATE NOTE

I noticed that the year 0000 (or just 0) and any year of 9999 or later do not work with the OP's dataset. The year may have to be an integer between 1 and 9998. As long as the arbitrary year is the same throughout the date column, the plot will work as intended.
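
One more thing to watch for when picking the arbitrary year, offered as a sketch under assumptions rather than something tested on the OP's data: if the real dates include 29 February, a non-leap placeholder year such as 1111 has no matching day. Choosing a leap year (2000 here, an arbitrary choice) avoids that, and format() can swap the year without a regular expression.

```r
# Hypothetical dates, including a leap day
dates <- as.Date(c("1999-10-01", "2000-02-29", "2000-12-31"))

# Rebuild each date with the placeholder year 2000 (a leap year),
# keeping the original month and day
common <- as.Date(format(dates, "2000-%m-%d"))

# Check that no date was lost in the conversion
anyNA(common)
```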

[image]

LC-datascientist
  • I am trying it but getting an error message.`Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments In addition: Warning message: In data.matrix(data) : NAs introduced by coercion` I will continue to troubleshoot... – Salvador Jul 08 '21 at 14:24
  • @Salvador, do you know which part of the code is getting you this match error? I think it could be `filter(year %in% start:end)` because it uses `%in%` for matching. Try `dplyr::filter(year %in% start:end)` because you may have another `filter` function from another package or try subsetting it first and save the subset as another data frame, like you did in your code. Let me know if it works. – LC-datascientist Jul 08 '21 at 15:56
  • @LC-datascientist-- I edited my post above. Still get an error but it is different now. – Salvador Jul 08 '21 at 18:50
  • @Salvador, You're right. Seems like the date doesn't like "0000" (and "9999") for the year in your dataset, but it works in my example here. I don't know why. Anyway, for the purpose of the figure, it doesn't matter what year you choose as long as they are all the same. – LC-datascientist Jul 11 '21 at 00:17
  • @LC-datascientist-- I wonder what data you used to make that plot because in your example you are plotting date and value. Changing from '0000' to '1111' on my dataset still gives me this error `Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments In addition: Warning message: In data.matrix(data) : NAs introduced by coercion` Can you try the dataset from the link to see if you can re-create my error message? – Salvador Jul 12 '21 at 01:07
  • @LC-datascientist-- After playing with it for a bit it is working now. Really appreciate your help with this. I will use this trick in the future for other datasets. – Salvador Jul 12 '21 at 01:45
  • @LC-datascientist-- After taking a closer look at the graphic you created, I noticed that you used facet_wrap and the bottom graph has the months along the x axis but the upper graph doesn't have any month labels. I can only get this behaviour with facet_grid. How do you hide the month labels for the upper graph using facet_wrap? Also, how do I change the month labels to run from Oct to Sept? – Salvador Jul 12 '21 at 02:15
  • @Salvador, I later tried with your large dataset and used your code (with modification on the year in `date` as described in the solution). `facet_wrap` hides axis labels that are between graph panels by default. The snapshot in your question shows that, too. If your new plots are showing the extra axis labels unintentionally, try restarting your R session, or see which R package that you're using may be affecting it (if you've loaded other R packages not shown in the question). If it still shows, you can post another question about axis labels showing in all of your `facet_wrap()` panels. – LC-datascientist Jul 12 '21 at 03:32
  • In my short example figure (first plot), the months on the x-axis are between Oct and Dec because I only provided the range of dates (after fixing the year) within those months in the example. If you want to change the dates to start from Oct and end at Sept (12 months later), you can add one year in `date` for all months from Jan to Sept; thus, chronologically, ggplot will be plotting, e.g., Oct 1111 to Sept 1112. If you want to restrict the x-axis scale to only a few months, you can set `scale_x_date(limits = c(min, max))`, where `min` and `max` are the min/max dates. – LC-datascientist Jul 12 '21 at 03:55
  • I am not sure why facet_wrap is showing all the months between panels on my plots. I have multiple years, and all my panels are showing the months along the x axis, which takes up space, and it is annoying to see the repeated x axis all over. If I don't use your suggestion '1111' for date, my x axis runs from Oct to Sept, but if I use it then my x axis labels run from Jan to Dec. I wonder why '1111' changes the order of the months. – Salvador Jul 12 '21 at 04:12
  • @LC-datascientist-- I would post a question with my full code and dataset but it is really large and people here don't like that. If I post a small dataset I can't recreate what I am trying to accomplish. – Salvador Jul 12 '21 at 04:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/234753/discussion-between-lc-datascientist-and-salvador). – LC-datascientist Jul 12 '21 at 04:38
  • Sure, I apologize. – Salvador Jul 12 '21 at 04:42
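
The Oct-to-Sept ordering raised in the comments above can be sketched as follows. This is only an illustration under assumptions: the data are made up, the placeholder years 1999 and 2000 are arbitrary (2000 is chosen because it is a leap year), and water_year and plot_date are hypothetical column names.

```r
# Hypothetical monthly data covering Oct 1997 to Sep 1999
dat <- data.frame(
    date = seq(as.Date("1997-10-01"), as.Date("1999-09-01"), by = "month"))
dat$value <- seq_along(dat$date)

m <- as.integer(format(dat$date, "%m"))
y <- as.integer(format(dat$date, "%Y"))

# Oct-Dec belong to the water year that ends in the following September
dat$water_year <- ifelse(m >= 10, y + 1, y)

# Oct-Dec get placeholder year 1999, Jan-Sep get 2000, so every
# panel shares one chronological Oct-to-Sep axis
placeholder <- ifelse(m >= 10, "1999", "2000")
dat$plot_date <- as.Date(paste0(placeholder, format(dat$date, "-%m-%d")))
```

Plotting aes(plot_date, value) with facet_wrap(~water_year, ncol = 1) and the same scale_x_date() settings as in the answer should then give one panel per water year, each running from Oct to Sep.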