I have the following df:

Id   a_min_date   a_max_date   b_min_date   b_max_date   c_min_date   c_max_date   d_min_date   d_max_date
1    2014-01-01   2014-01-10   2014-01-05   2014-01-15   NA           NA           2014-02-20   2014-05-01
2    2014-02-01   2014-02-10   NA           NA           2015-02-20   2015-03-01   NA           NA

I have added the intervals of each group (a, b, c, d) by Id. First, I converted the start and end dates to lubridate intervals. I want to plot the intervals and, where two consecutive groups do not overlap, calculate the time difference in days between the end of one group and the start of the next. I tried the IRanges package and converted the dates into integers (as used here (link)), but it does not work for me.

library(IRanges)
library(ggplot2)

ir <- IRanges(start = as.integer(as.Date(df$a_min_date)), end = as.integer(as.Date(df$a_max_date)))
bins <- disjointBins(IRanges(start(ir), end(ir) + 1))
dat <- cbind(as.data.frame(ir), bin = bins)

ggplot(dat) + 
  geom_rect(aes(xmin = start, xmax = end,
                ymin = bin, ymax = bin + 0.9)) +
  theme_bw()

I got this error for my original df:

Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  solving row 1: range cannot be determined from the supplied arguments (too many NAs)

Does someone have another solution using other packages?

A1976
  • Could you please provide a sample of the output? – akash87 Mar 21 '19 at 19:13
  • The figure that ggplot2 created was empty. I have no real output. I just want to plot these intervals and see if there is any overlap. – A1976 Mar 21 '19 at 19:21

1 Answer

To my knowledge, IRanges is the best package out there to solve this problem. IRanges needs defined range values (in this case dates) to compare and does not handle missing values (NAs).

To solve this problem, I would remove all rows that have an NA in the interval columns you pass to IRanges before doing the analysis.

df <- df[complete.cases(df[, c("a_min_date", "a_max_date")]), ]

For an explanation and other ways to remove NAs, see "Remove rows with all or some NAs (missing values) in data.frame".
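A minimal sketch of that filtering step, using a data frame reconstructed from the sample in the question (only groups a and c are kept for brevity). The names `c_cols` and `df_c` are made up for this illustration.

```r
# Sample data reconstructed from the question (groups a and c only)
df <- data.frame(
  Id = c(1, 2),
  a_min_date = as.Date(c("2014-01-01", "2014-02-01")),
  a_max_date = as.Date(c("2014-01-10", "2014-02-10")),
  c_min_date = as.Date(c(NA, "2015-02-20")),
  c_max_date = as.Date(c(NA, "2015-03-01"))
)

# Keep only the rows where both ends of the group c interval are known
c_cols <- c("c_min_date", "c_max_date")
df_c <- df[complete.cases(df[, c_cols]), ]

# Only Id 2 has a complete group c interval
df_c$Id
```

Filtering per group like this, rather than on the whole data frame at once, avoids dropping an Id entirely just because one of its groups is missing.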

If this does not fix the problem, you could convert the dates into integers. It is important that the dates are in year-month-day format, so that the resulting integers sort in chronological order and give correct intervals.

Example:

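date_str <- "2006-06-26"

# Split the date on "-" into year, month and day
splitted <- unlist(strsplit(date_str, "-"))
# [1] "2006" "06"   "26"

# Paste the parts back together without a separator
result <- paste(splitted, collapse = "")
# [1] "20060626"

# Convert the string to an integer
as.integer(result)
# [1] 20060626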
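If the goal is the gap in days between groups, one possible alternative to concatenating the date parts is `as.integer()` on a `Date`, which returns the number of days since 1970-01-01, so differences between those integers are day counts. The sketch below uses the Id 1 row from the question and assumes the last column of the sample is `d_max_date`. It also assumes the intervals are already sorted by start date. The names `iv`, `gap_days`, and `gaps` are invented for this illustration.

```r
# Intervals for Id 1 from the question (group c is NA there and is left out)
iv <- data.frame(
  group = c("a", "b", "d"),
  start = as.Date(c("2014-01-01", "2014-01-05", "2014-02-20")),
  end   = as.Date(c("2014-01-10", "2014-01-15", "2014-05-01"))
)

# as.integer() on a Date gives days since 1970-01-01
start_day <- as.integer(iv$start)
end_day   <- as.integer(iv$end)

# Start of each following group minus end of the group before it
n <- nrow(iv)
gap_days <- start_day[-1] - end_day[-n]

# Report a gap only where the next group starts after the previous one ends;
# NA marks pairs that overlap or touch
gaps <- data.frame(
  from = iv$group[-n],
  to = iv$group[-1],
  gap_days = ifelse(gap_days > 0, gap_days, NA)
)
gaps
```

Here groups a and b overlap, so their gap is NA, while the gap between the end of b and the start of d is 36 days. Concatenated year-month-day integers keep the chronological order, but their differences are not day counts.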
scs
  • I have excluded the NAs, but it did not help. Is there any other method using dplyr? – A1976 Mar 21 '19 at 23:44
  • I would simply convert the strings to integers for analysis (see example in my modified answer). – scs Mar 22 '19 at 08:54