
Context

list1 is a list of 3 data.frames, each with 2 date columns. I am trying to find the number of months between date1 and date2 in every row. Test data and my attempted solution with lapply are below. I believe the if statement in the nested lapply is necessary because seq.Date fails if the 'to' date is before the 'from' date.

However, my current implementation gives me the following error:

Error: unexpected '}' in "    }"

According to this detailed response, several things can cause that error message, but I don't think my lapply call contains any of them.

I have previously implemented this in a for loop, but I am trying to learn how to convert for loops to lapply in my R code and how to work with lists.

Reproducible data

set.seed(3)
sim_list = replicate(n = 3,
                     expr = {data.frame(date1 = sample(x = 1:12, size = 10), date2 = sample(x = 1:12, size = 10))},
                     simplify = F)

list1 <- lapply(sim_list, function(x) {
  x[['date1']] = as.Date(paste('01', x[['date1']], '2016', sep = '-'), format = '%d-%m-%Y')
  x[['date2']] = as.Date(paste('01', x[['date2']], '2016', sep = '-'), format = '%d-%m-%Y')
  return(x)
})

Example of expected output

> list1[[1]]
        date1      date2 elapsed_months
1  2016-03-01 2016-07-01              4
2  2016-09-01 2016-06-01              3
3  2016-04-01 2016-11-01              7
4  2016-12-01 2016-10-01              2
5  2016-05-01 2016-12-01              7
6  2016-08-01 2016-09-01              1
7  2016-01-01 2016-01-01              0
8  2016-02-01 2016-04-01              2
9  2016-11-01 2016-05-01              6
10 2016-07-01 2016-08-01              1

The troublesome lapply implementation

lapply(list1, function(x)
  lapply(x, function(y) {
    if (y['date2'] > y['date1'] == T) {
      y['elapsed_months'] = length(seq.Date(from = y['date1'], to = y['date2'], by = 'month')) - 1
    } else {
      y['elapsed_months'] = length(seq.Date(from = y['date2'], to = y['date1'], by = 'month')) - 1
    }
  }))

Thanks for reading!

2 Answers


I was not able to get your reproducible results to work, but I assumed you were looking for something like this.

set.seed(3)
sim_list = replicate(n = 3, expr = {data.frame(date1 = sample(x = 1:12, size = 10), date2 = sample(x = 1:12, size = 10))},
                     simplify = F)  
list1 <- lapply(sim_list, function(x) {
  x['date1'] = as.Date(paste('01', unlist(x['date1']), '2016', sep = '-'), format = '%d-%m-%Y')
  x['date2'] = as.Date(paste('01', unlist(x['date2']), '2016', sep = '-'), format = '%d-%m-%Y')
  return(x)
})



lapply(list1, function(x){
  x['elapsed_months'] <- apply(x, 1,  function(y){
    abs(as.POSIXlt(as.Date(y['date1']))$mon-as.POSIXlt(as.Date(y['date2']))$mon)
  })
  x
})
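
As for the error itself, a likely cause is the condition y['date2'] > y['date1'] == T in the question. R's comparison operators are non-associative, so chaining > and == without parentheses is a syntax error, and the braces that follow then probably fail to parse as well. Separately, the inner lapply(x, ...) walks over the columns of each data.frame rather than its rows, so y['date2'] would not find anything even once the syntax is fixed. Below is a minimal sketch of the original seq idea that goes row by row instead. count_months and add_months are hypothetical helper names, and df is a small stand-in for one element of list1.

```r
# Hypothetical helper: months between two single dates, counted with seq().
count_months <- function(d1, d2) {
  # seq() with a positive 'by' needs from <= to, so order the pair first.
  from <- min(d1, d2)
  to   <- max(d1, d2)
  length(seq(from, to, by = "month")) - 1
}

# Small stand-in for one element of list1.
df <- data.frame(
  date1 = as.Date(c("2016-03-01", "2016-09-01", "2016-01-01")),
  date2 = as.Date(c("2016-07-01", "2016-06-01", "2016-01-01"))
)

# Hypothetical helper: add the elapsed_months column to one data.frame.
add_months <- function(x) {
  x$elapsed_months <- vapply(
    seq_len(nrow(x)),
    function(i) count_months(x$date1[i], x$date2[i]),
    numeric(1)
  )
  x
}

df <- add_months(df)
df$elapsed_months  # 4 3 0

# The same helper works on a whole list of data.frames.
result <- lapply(list(df, df), add_months)
```

With the real data this would be lapply(list1, add_months). It calls seq() once per row, so it is likely to be slow on large lists.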
smanski

We can use difftime to calculate the difference between the two dates in days and then divide it by 30 to get an approximate number of months.

lapply(list1, function(x) cbind(x, elapsed_months = 
         as.numeric(round(abs(difftime(x$date2,x$date1, units = "days")/30)))))

#[[1]]
#        date1      date2 elapsed_months
#1  2016-03-01 2016-07-01         4
#2  2016-09-01 2016-06-01         3
#3  2016-04-01 2016-11-01         7
#4  2016-12-01 2016-10-01         2
#5  2016-05-01 2016-12-01         7
#6  2016-08-01 2016-09-01         1
#7  2016-01-01 2016-01-01         0
#8  2016-02-01 2016-04-01         2
#9  2016-11-01 2016-05-01         6
#10 2016-07-01 2016-08-01         1

#[[2]]
#        date1      date2 elapsed_months
#1  2016-03-01 2016-05-01         2
#2  2016-01-01 2016-12-01        11
#3  2016-02-01 2016-02-01         0
#4  2016-11-01 2016-11-01         0
#5  2016-10-01 2016-03-01         7
#6  2016-06-01 2016-08-01         2
#7  2016-04-01 2016-06-01         2
#8  2016-05-01 2016-10-01         5
#9  2016-12-01 2016-07-01         5
#10 2016-07-01 2016-01-01         6

#[[3]]
#        date1      date2 elapsed_months
#1  2016-04-01 2016-03-01         1
#2  2016-09-01 2016-12-01         3
#3  2016-02-01 2016-09-01         7
#4  2016-06-01 2016-10-01         4
#5  2016-12-01 2016-07-01         5
#6  2016-10-01 2016-08-01         2
#7  2016-01-01 2016-11-01        10
#8  2016-11-01 2016-02-01         9
#9  2016-07-01 2016-01-01         6
#10 2016-03-01 2016-04-01         1
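
Dividing by 30 matches the expected output for this test data, but it is an approximation and the rounding can drift over longer spans. Here is a small check with dates chosen purely for illustration, next to a sketch of a calendar-based count. month_index is a hypothetical helper name, not a base R function.

```r
d1 <- as.Date("2016-01-01")
d2 <- as.Date("2020-01-01")  # 48 calendar months after d1

# Day-based estimate: 1461 days / 30 rounds to 49.
days <- as.numeric(difftime(d2, d1, units = "days"))
approx_months <- round(abs(days) / 30)

# Hypothetical helper: running month index from year and month.
# $year is years since 1900 and $mon is 0-11; the day of month is ignored.
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  lt$year * 12 + lt$mon
}

# Calendar-based count: 48.
exact_months <- abs(month_index(d2) - month_index(d1))
```

month_index is vectorised, so abs(month_index(x$date2) - month_index(x$date1)) could replace the difftime expression inside the lapply call if exact calendar months are needed.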
Ronak Shah
Hi Ronak, I've marked yours as the answer now. Your code is much faster as the list size increases. I've replicated the list 100 times, and there are 500 rows in each data.frame now. The elapsed time for your code is 0.03 s, whereas @smanski's takes 5.59 s. – Prateek Sharma Mar 22 '18 at 13:55