
Long-time lurker, first-time poster, and still a bumbling R beginner.
I would like some help "jittering" months in R (jitter may not be the best description).

The full data set I am working with has 10,000 rows x 30 columns. It contains 40 sites, with start dates ranging from 1986 to 2012, and monthly samples were collected at each site up to Dec 2015. Some dates (samples) are missing, and these simply have no rows in the data set. Therefore, for any given site, a year may or may not have 12 months (samples).

Below is an example data set (wq). The dates I am after are shown in the req.date data frame, which has sequential months within each site:

wq <- data.frame(site = c(rep("A", 5), rep("B", 5)), 
             date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012", 
              "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016", 
              "01/07/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y"), 
             year = c(rep("2012", 5), rep("2016", 5)),
             month = c(6, 7, 7, 9, 10, 4, 5, 7 , 7, 8))


req.date <- data.frame(req.date = 
                     as.Date(c("23/06/2012", "01/07/2012", "26/08/2012", 
                     "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016", 
                     "01/06/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y"))
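Because the month and year helper columns are optional, the month sequence for each site can also be read straight from the date. A minimal base R sketch, assuming the site and date columns of wq shown above (the name month_seq is just illustrative):

```r
# example data from the question (site and date only)
wq <- data.frame(site = c(rep("A", 5), rep("B", 5)),
                 date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012",
                                  "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                                  "01/07/2016", "30/07/2016", "05/08/2016"),
                                format = "%d/%m/%Y"))

# month number (1-12) of every sample, split into one vector per site
month_seq <- split(as.integer(format(wq$date, "%m")), wq$site)
month_seq
# $A: 6 7 7 9 10
# $B: 4 5 7 7 8
```

The repeated 7 in each site is the duplicated month that needs moving.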

I created the month and year columns so people could understand my question; they are not needed in my final data set.

What I would like to know is how to "jitter" the month part of wq$date (by +/- one month) where a month is duplicated. I am only interested in adjusting the months; I am not so concerned about the exact day.

I found this add.Month function (Add a month to a Date), but would appreciate help with a function that adjusts wq$date while taking into account the months in the rows above and below the duplicated date.

I have identified the duplicated date(s), grouped by site, year and month:

wq$dup <- duplicated(wq[ ,c(1,3,4)])
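The same flag can be built without the helper year and month columns by taking the year-month straight from the date. A sketch, assuming wq has the site and date columns shown above (ym is a hypothetical name):

```r
# example data from the question (site and date only)
wq <- data.frame(site = c(rep("A", 5), rep("B", 5)),
                 date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012",
                                  "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                                  "01/07/2016", "30/07/2016", "05/08/2016"),
                                format = "%d/%m/%Y"))

# year-month ("YYYY-MM") of each sample, taken straight from the date
ym <- format(wq$date, "%Y-%m")

# TRUE for the second (and any later) sample of a year-month within a site
wq$dup <- duplicated(data.frame(wq$site, ym))
which(wq$dup)   # 3 9
```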

But now I am unsure how to write a function for the last step. Here is my attempt with my really poor R coding skills (and I apologise for my lack of skill here!):

# use wq$month to make it easier to make comparisons
wq$new.date <- wq$date
for (i in which(wq$dup)) {
  gap <- wq$month[i] - wq$month[i - 2]
  if (gap == 1) {
    # if the duplicate month is 1 month more than the month value located 2 rows up,
    # then the duplicate month needs +1 month
    wq$new.date[i] <- seq(wq$date[i], by = "1 month", length.out = 2)[2]
  } else if (gap == 2) {
    # if the duplicate month is 2 months more than the month value located 2 rows up,
    # then the month above the duplicate month needs -1 month
    wq$new.date[i - 1] <- seq(wq$date[i - 1], by = "-1 month", length.out = 2)[2]
  }
  # otherwise the date is left unchanged
}

Any help would be greatly appreciated!

Updated:

I need to identify the duplicate months (which I have done), then look at the sequence of months within the year to determine whether the duplicate month needs to be adjusted (by +/- 1 month) to complete the month sequence for that particular year. For example, using site A from the data frame above: the duplicated month is July (01/07/2012 and 26/07/2012). The month sequence for site A is currently (6, 7, 7, 9, 10); the correct sequence should be (6, 7, 8, 9, 10). For site B the duplicated month is also July (01/07/2016 and 30/07/2016). The month sequence for site B is currently (4, 5, 7, 7, 8); the correct sequence should be (4, 5, 6, 7, 8). I need a function to correct the month sequences.
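One way to express this rule is sketched below. It is not the accepted answer's method, and the names shift_month and fix_months are hypothetical. It assumes at most two samples share a year-month within a site, and it works per site. For each duplicated pair it moves the later sample forward one month if that month is empty for the site; otherwise it moves the earlier sample back one month if that month is empty. shift_month clamps the day to the end of the target month, because seq() on dates rolls 31 Jan + 1 month over into March.

```r
# example data from the question (site and date only) and the wanted result
wq <- data.frame(site = c(rep("A", 5), rep("B", 5)),
                 date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012",
                                  "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                                  "01/07/2016", "30/07/2016", "05/08/2016"),
                                format = "%d/%m/%Y"))
req.date <- data.frame(req.date =
                         as.Date(c("23/06/2012", "01/07/2012", "26/08/2012",
                                   "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                                   "01/06/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y"))

# hypothetical helper: shift dates by n months, clamping the day to the
# last valid day of the target month (e.g. 31 Jan + 1 month -> 29 Feb 2012)
shift_month <- function(date, n) {
  lt <- as.POSIXlt(date)
  idx <- (lt$year + 1900) * 12 + lt$mon + n        # months counted from year 0
  first_of <- function(k) as.Date(sprintf("%d-%02d-01", as.integer(k %/% 12),
                                          as.integer(k %% 12 + 1)))
  last_day <- as.integer(format(first_of(idx + 1) - 1, "%d"))
  first_of(idx) + pmin(lt$mday, last_day) - 1
}

# hypothetical function: resolve pairs of consecutive samples that share a
# year-month within a site, moving whichever one has an empty neighbouring month
fix_months <- function(df) {
  df <- df[order(df$site, df$date), ]
  for (s in unique(df$site)) {
    rows <- which(df$site == s)
    for (k in seq_along(rows)[-1]) {
      ym <- format(df$date[rows], "%Y-%m")           # recomputed after every move
      if (ym[k] != ym[k - 1]) next
      later <- rows[k]
      earlier <- rows[k - 1]
      if (!format(shift_month(df$date[later], 1), "%Y-%m") %in% ym) {
        df$date[later] <- shift_month(df$date[later], 1)        # fill the gap after
      } else if (!format(shift_month(df$date[earlier], -1), "%Y-%m") %in% ym) {
        df$date[earlier] <- shift_month(df$date[earlier], -1)   # fill the gap before
      }
      # if both neighbouring months are already sampled, the pair is left as is
    }
  }
  df[order(df$site, df$date), ]
}

out <- fix_months(wq)
split(as.integer(format(out$date, "%m")), out$site)
# $A: 6 7 8 9 10
# $B: 4 5 6 7 8
```

Moving the later sample forward, or the earlier sample back, keeps each site's rows in chronological order. On the example data this reproduces req.date exactly. A third sample in the same month, or a duplicate in December moving into the next January, would need extra handling.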

Evechinus
  • You want a duplicate-detection function on the date (YYYY-MM) with a tolerance of 1 month. Is this what you want? – and-bri Oct 21 '16 at 12:13
  • @and thanks for the reply. I have updated my question; hopefully this clarifies what I am chasing. Thanks – Evechinus Oct 26 '16 at 07:40
  • Okay, but how do you get from the month sequence for site A, currently (6, 7, 7, 8, 9), to the correct sequence (6, 7, 8, 9, 10)? Why 10? That is not a one-month shift; you would be shifting the duplicate by 3 months. And why is it not shifted at the beginning of the sequence, giving (5, 6, 7, 8, 9), which would mean a shift of 2 months? – and-bri Oct 26 '16 at 09:46

1 Answer


This code finds all dates that share the same year and month (duplicates). It then checks whether there is no sample one month before or one month after each duplicate. If such a month is missing, the duplicate date is replaced by the first missing month (the month before is checked first).

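# add n months to a single Date (n may be negative)
add.months <- function(date, n) seq(date, by = paste(n, "months"), length.out = 2)[2]

# year-month ("YYYY-MM") of every sample, and the rows whose year-month repeats
months <- substr(wq$date, 1, 7)
pos <- which(duplicated(months))

for (z in pos) {
  # the same day one month before and one month after the duplicate
  neighbour <- format(c(add.months(wq$date[z], -1),
                        add.months(wq$date[z], 1)),
                      format = "%Y-%m-%d")

  # if a neighbouring month has no sample, move the duplicate into it
  # (the month before is tried first)
  empty <- !substr(neighbour, 1, 7) %in% months
  if (any(empty)) {
    wq$date[z] <- as.Date(neighbour[which(empty)[1]])
  }
}

# restore chronological order within each site
wq <- wq[with(wq, order(site, date)), ]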
and-bri
  • Balls, sorry, major typo in my last update. This is how it should be: the duplicated month is 01/07/2012 and 26/07/2012. The month sequence for site A is currently (6, 7, 7, 9, 10). The correct month sequence should be (6, 7, 8, 9, 10). For site B the duplicated month is 01/07/2016 and 30/07/2016. The month sequence for site B is currently (4, 5, 7, 7, 8). The correct month sequence should be (4, 5, 6, 7, 8). – Evechinus Oct 26 '16 at 23:14
  • Your second answer solves the date/month situation for site A, but for site B the month sequence ends up as (4, 5, 7, 6, 8). Really close! Thanks for your help (and patience!) – Evechinus Oct 26 '16 at 23:17
  • By the way, we don't pay attention to the site. If the same month appears at two different sites, this script will treat it as a duplicate. Does this case occur in your data? – and-bri Oct 27 '16 at 10:25
  • Perfect! For a complete answer, the add.months function from your very first answer is also needed for the code above to run. Is this right? Just for anyone else wanting to use your answer. Thank you! – Evechinus Nov 01 '16 at 22:42