6

I am fairly new to R, and have extensively searched StackOverflow for a solution to this problem but am coming up short. I am currently plotting estimated birth dates for sun bears, and I want my plot to only have the months April 2014 - March 2015 on the x-axis. Most of my data fits in this fine, but I have a value that has an error bar that starts in March and ends in May (Figure 1). When I plot this, the error bar either disappears, or stretches across the entire plot (Figure 2). I would like to wrap it around so that when the error bar exits on the right hand side of the plot, it appears on the left hand side where April 2014 is. The specific year doesn't matter (these values are from lots of different years), but specifying year arbitrarily has been the easiest way I've found to get them all on one plot that has a single year on the x-axis. Any help would be greatly appreciated!

This is what I would like the final figure to look like (Figure 3).

My code is the following:

##Import Excel Data
require(xlsx)
require(modeest)
require(ggplot2)
require(ggpubr)
library(scales)

BirthDates300 <- read.xlsx("C:/Users/ZackA/OneDrive - Old Dominion University/frombox/ODU/Sun Bear Weight/Data/data_zd.xlsx", 7)

#Combine Day Month and Year into Date
BirthDates300$MinDate<-as.Date(with(BirthDates300,paste(MinYear,MinMonth,MinDay,sep="-")),"%Y-%m-%d")
BirthDates300$MeanDate<-as.Date(with(BirthDates300,paste(MeanYear,MeanMonth,MeanDay,sep="-")),"%Y-%m-%d")
BirthDates300$MaxDate<-as.Date(with(BirthDates300,paste(MaxYear,MaxMonth,MaxDay,sep="-")),"%Y-%m-%d")
BirthDates300$IndDate<-as.Date(with(BirthDates300,paste(IndYear,IndMonth,IndDay,sep="-")),"%Y-%m-%d")

#Remove unnecessary row 17
BirthDates300 <- BirthDates300[-c(17), ]

#Plotting Range of Birth Dates 300
BirthDatesRange300 <- ggplot()+
  geom_errorbar(data=BirthDates300, mapping=aes(x=MeanDate, xmin=MinDate, xmax=MaxDate, y=CRN), 
                width=0.4, size=1, color="black") +
  geom_point(data=BirthDates300, mapping=aes(x=MeanDate, y=CRN, shape=Sex,), size=4,) +
  geom_point(data=BirthDates300, mapping=aes(x=IndDate, y=CRN, shape=Sex), color="grey", size=4,)+
  labs(title="Sun Bear Estimated Birth Date", subtitle="Assuming 300g at birth")+
  scale_x_date(date_labels="%b",date_breaks  ="1 month",
               limits = as.Date(c('2014-03-25','2015-03-01')))+ 
  scale_y_discrete(limits= c("060-2004", "157-2012", "158-2012", "167-2012", "169-2013", 
                             "202-2017", "207-2019", " ", "002-1999", "058-2004", "073-2006", 
                             "076-2006", "077-2005", "080-2006", "081-2006", "083-2006",
                             "088-2006", "091-2006", "107-2007", "150-2010", "152-2011",
                             "159-2011", "161-2012", "163-2012", "171-2013", "172-2013",
                             "180-2014", "181-2014", "183-2014", "186-2015", "187-2015",
                             "193-2016", "196-2016", "204-2018"))+
  theme(plot.title = element_text(size=16, face="bold", hjust = 0.5),
        plot.subtitle=element_text(size=10, hjust=0.5),
        axis.ticks.y = element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.x = element_text(face="bold", color="black", size=13, vjust=-0.01),
        axis.text.y = element_text(face="bold", color="black", size=10, angle=0),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black"),
        panel.border = element_rect(colour = "black", fill=NA, size=0.5),
        legend.title = element_text(size=15),
        plot.margin = margin(10, 10, 15, 10))+
  geom_hline(yintercept=" ", linetype='dotted', col = 'grey', size=1.5)

BirthDatesRange300
  • 2
    Hi! Can you share your dataset so we can try to replicate your issue? My guess is that by using `scale_x_date` you are dropping the points outside of the range (this is the standard way this function works). If you want to constrain the x range without dropping data outside this range, you need to use a different function, e.g. `coord_cartesian` with some special parameters or `tidyquant::coord_dates` (which is a package I just found online). – kikoralston Jul 01 '21 at 15:23
  • 1
    Hi Confused, welcome to Stack Overflow. This question would be substantially improved with 1) a reproducible sample of your data and 2) annotation of how you would like the final result to look. For 1, please [edit] your question with the output of `dput(BirthDates300)` or `dput(head(BirthDates300))` if your data is very large. See [How to make a great R reproducible example](https://stackoverflow.com/a/5963610/) for more. – Ian Campbell Jul 01 '21 at 15:25
  • Kikoralston and Ian, Thanks for the responses! I have edited my post to include my data using dput(). I hope this format is helpful, please let me know if there is anything else I can do to make it more clear! – ConfusedPhDStudent Jul 01 '21 at 15:34
  • I also just added how I would like the final plot to appear (what is circled in red) – ConfusedPhDStudent Jul 01 '21 at 15:39
  • 2
    @ConfusedPhDStudent I think this will be extremely challenging. I'll remind myself to put a bounty on the question in a couple days. – Ian Campbell Jul 01 '21 at 15:53
  • 4
    The "hacky" approach is to duplicate your dataset twice, once adding a year and once subtracting a year, so you have three copies in series. Then when you plot all of them, you can zoom in to just the desired subset, but still have the artifacts from the wrap-over appear. – Brian Jul 01 '21 at 16:06
  • 2
    @Brian I suspect that is as close as the question author will get here. I can post an answer implementing that if you aren't planning to. – Ian Campbell Jul 01 '21 at 16:08
  • 1
    @IanCampbell Please do! – Brian Jul 01 '21 at 16:09
  • 1
    Why did you delete the data from the question? If it's a confidentiality issue, you can make dummy data that still replicates the situation – camille Jul 08 '21 at 15:20
  • @ConfusedPhDStudent, I'm curious if you have any feedback on my answer. Thanks! – Skaqqs Jul 16 '21 at 19:08

2 Answers2

5

A somewhat hacky approach recommended by @Brian in the comments is to duplicate all of the data into the other year.

Here's how you might do that with tidyr::complete and fill:

library(tidyverse)
TargetMinDate <- as.Date("2014-03-25")
TargetMaxDate <- as.Date("2015-03-01")

BirthDates300 %>%
  group_by(CRN, MeanYear) %>% #Group by individual and year
  complete(MeanYear = c(2014,2015)) %>% #Complete the year
  group_by(CRN) %>% #go back to grouping only by individual
  fill(MeanYear:Sex, .direction = "updown") %>% #Fill the other variables from the original row
  mutate(across(contains("Date"), #Fill in the missing date variables
                ~ case_when(is.na(.) & #is NA from the complete operation
                            (MinDate[!is.na(MinDate)][1] < TargetMinDate | #MinDate is less than the target MinDate
                            MaxDate[!is.na(MaxDate)][1] > TargetMaxDate) #or MaxDate is greater than the target MaxDate
                            ~ as.Date(str_replace(.[!is.na(.)][1],"20\\d{2}",as.character(MeanYear))), #Then replace the year with the year we filled earlier
                            TRUE ~ .))) -> BirthDates300Duplicate #Otherwise, no change, then assign to variable

From here we just need to move the xlimit into coord_cartesian and set clip = off:

ggplot()+
  geom_errorbar(data=BirthDates300Duplicate, mapping=aes(x=MeanDate, xmin=MinDate, xmax=MaxDate, y=CRN), 
                width=0.4, size=1, color="black") +
  geom_point(data=BirthDates300Duplicate, mapping=aes(x=MeanDate, y=CRN, shape=Sex,), size=4,) +
  geom_point(data=BirthDates300Duplicate, mapping=aes(x=IndDate, y=CRN, shape=Sex), color="grey", size=4,)+
  labs(title="Sun Bear Estimated Birth Date", subtitle="Assuming 300g at birth")+
  coord_cartesian(xlim = as.Date(c(TargetMinDate,TargetMaxDate))) +
  scale_x_date(date_labels="%b",date_breaks  ="1 month")+ 
  scale_y_discrete(limits= c("060-2004", "157-2012", "158-2012", "167-2012", "169-2013", 
                             "202-2017", "207-2019", " ", "002-1999", "058-2004", "073-2006", 
                             "076-2006", "077-2005", "080-2006", "081-2006", "083-2006",
                             "088-2006", "091-2006", "107-2007", "150-2010", "152-2011",
                             "159-2011", "161-2012", "163-2012", "171-2013", "172-2013",
                             "180-2014", "181-2014", "183-2014", "186-2015", "187-2015",
                             "193-2016", "196-2016", "204-2018"))+
  theme(plot.title = element_text(size=16, face="bold", hjust = 0.5),
        plot.subtitle=element_text(size=10, hjust=0.5),
        axis.ticks.y = element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.x = element_text(face="bold", color="black", size=13, vjust=-0.01),
        axis.text.y = element_text(face="bold", color="black", size=10, angle=0),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black"),
        panel.border = element_rect(colour = "black", fill=NA, size=0.5),
        legend.title = element_text(size=15),
        plot.margin = margin(10, 10, 15, 10))+
  geom_hline(yintercept=" ", linetype='dotted', col = 'grey', size=1.5)  

enter image description here

Like I suspect you would be, I am interested in a more robust approach.

Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
  • Thank you so much Brian and Ian!! That's exactly what I want my figure to look like. I am working through some novice R user issues right now trying to get all the functions to work in your first set of code, but I'm sure I'll get it to work eventually. I really appreciate the help!! – ConfusedPhDStudent Jul 01 '21 at 16:59
0

Here's another approach but using base graphics.

The key here is to use xaxs = "i" in plot(), which removes the default 6% buffer between the first and last x-axis ticks and the plotting area. I plotted your data as Julian day on the x-axis and placed the month labels at about the 15th date of each month.

I can edit to show how to shift dates on the x-axis to map to Apr - Mar if you like.

# Get fields of interest
foi <- c("CRN", "Sex", "MinDate", "MaxDate", "MeanDate", "IndDate")
dat <- original.data[!is.na(original.data$MinDate), foi]

# Format fields
dat[,3:6] <- lapply(dat[,3:6], as.Date, format = "%Y-%m-%d")
dat$start <- as.numeric(format(dat$MinDate, "%j"))
dat$end <- as.numeric(format(dat$MaxDate, "%j"))
dat$mid <- as.numeric(format(dat$MeanDate, "%j"))
dat$ind <- as.numeric(format(dat$IndDate, "%j"))
dat$CRN <- paste(dat$CRN)
dat$Sex <- paste(dat$Sex)

# Order CRN to match example in question
crn.order <- c("060-2004", "157-2012", "158-2012", "167-2012", "169-2013", 
               "202-2017", "207-2019", "002-1999", "058-2004", "073-2006", 
               "076-2006", "077-2005", "080-2006", "081-2006", "083-2006",
               "088-2006", "091-2006", "107-2007", "150-2010", "152-2011",
               "159-2011", "161-2012", "163-2012", "171-2013", "172-2013",
               "180-2014", "181-2014", "183-2014", "186-2015", "187-2015",
               "193-2016", "196-2016", "204-2018")
dat <- dat[order(match(dat$CRN,crn.order)),]
dat$bear_id <- 1:nrow(dat)

# Make timeseries; assume 366 julian days per year
ts <- data.frame(matrix(NA, nrow = 366, ncol = nrow(dat)))
names(ts) <- 1:32
jdays <- 1:366
for(i in 1:nrow(dat)){
  if(dat[i,"start"] < dat[i,"end"]){
    birthmos <- jdays[jdays >= dat[i,"start"] & jdays <= dat[i,"end"]]
  } else if(dat[i, "start"] > dat[i, "end"]){
    jdays1 <- jdays[jdays >= dat[i, "start"]]
    jdays2 <- jdays[jdays <= dat[i, "end"]]
    birthmos <- c(jdays1, jdays2)
  } else if(dat[i, "start"] == dat[i, "end"]){
    birthmos <- dat[i,"start"]
  }
  ts[,i] <- ifelse(jdays %in% birthmos, i, NA)
}

mat <- as.matrix(ts)
mat

# Plot settings
shapes <- ifelse(dat$Sex == "F", 16,
                 ifelse(dat$Sex == "M", 17,
                        ifelse(dat$Sex == "NA", 15, 1)))
month.abb <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sept", "Oct", "Nov", "Dec")
# 15th date of each month in julian day
jday.15 <- c(15, 46, 74, 105, 135, 166, 196, 227, 258, 288, 319, 349) 

par(mar = c(5,5,5,8))
plot(1, type = "n", xlim = c(1, 366), ylim = c(1, 32),
     yaxt = "n", xaxt = "n", xaxs = "i", ann = FALSE)
matlines(y = mat, x = 1:nrow(mat), lty = 1, col = "black", lwd = 2)
points(x = dat$mid, y = dat$bear_id, pch = shapes, col = "black", cex = 1.5)
points(x = dat$ind, y = dat$bear_id, pch = shapes, col = "grey40", cex = 1.5)
lines(y = c(4.5,4.5), x = c(0, 366), col = "grey40", lty = 2)
axis(1, at = jday.15, labels = month.abb, font = 2)
axis(2, at = 1:32, labels = dat$CRN, las = 2,
     cex.axis = 0.8, lwd.ticks = 0, font = 2)
title(main = "Sun Bear Estimated Birth Date")
mtext(side=3, at = 366/2, line=0.5, cex=0.8, "Assuming 300g at birth")
legend(380, 25, title = "Sex", legend = c("F", "M", "NA"), col = "grey40",
       bty = "n", pch = c(16, 17, 15), xpd = TRUE, cex = 1)

bear wrap

Skaqqs
  • 4,010
  • 1
  • 7
  • 21
  • You may access the example data by reviewing the question [revision history](https://stackoverflow.com/posts/68212245/revisions). – Ian Campbell Jul 08 '21 at 19:52