Lapply to determine months between columns in a list (of data.frames)

Question

Context

list1 is a list of 3 data.frames, each with 2 date columns. I am trying to find the number of months between date1 and date2. Test data and my attempted solution with lapply are below. I believe the if statement in the nested lapply is necessary because seq.Date fails if the 'to' date is before the 'from' date.

However, my current implementation gives me the following error:

Error: unexpected '}' in "    }"

This detailed response indicates that several things can cause that error message, but I don't think my lapply call has any of them.

I have previously implemented this in a for loop, but I am trying to learn how to convert for loops to lapply in my R code and how to work with lists.

Reproducible data

set.seed(3)
sim_list = replicate(n = 3,
                     expr = {data.frame(date1 = sample(x = 1:12, size = 10), date2 = sample(x = 1:12, size = 10))},
                     simplify = F)

list1 <- lapply(sim_list, function(x) {
  x[['date1']] = as.Date(paste('01', x[['date1']], '2016', sep = '-'), format = '%d-%m-%Y')
  x[['date2']] = as.Date(paste('01', x[['date2']], '2016', sep = '-'), format = '%d-%m-%Y')
  return(x)
})

Example of expected output

> list1[[1]]
        date1      date2 elapsed_months
1  2016-03-01 2016-07-01              4
2  2016-09-01 2016-06-01              3
3  2016-04-01 2016-11-01              7
4  2016-12-01 2016-10-01              2
5  2016-05-01 2016-12-01              7
6  2016-08-01 2016-09-01              1
7  2016-01-01 2016-01-01              0
8  2016-02-01 2016-04-01              2
9  2016-11-01 2016-05-01              6
10 2016-07-01 2016-08-01              1

The troublesome lapply implementation

lapply(list1, function(x)
  lapply(x, function(y) {
    if (y['date2'] > y['date1'] == T) {
      y['elapsed_months'] = length(seq.Date(from = y['date1'], to = y['date2'], by = 'month')) - 1
    } else {
      y['elapsed_months'] = length(seq.Date(from = y['date2'], to = y['date1'], by = 'month')) - 1
    }
  }))

Thanks for reading!

Also, your reproducible data is not reproducible and throws an error. — Maurits Evers, Mar 22 '18 at 04:08
Possible duplicate of [Number of months between two dates](https://stackoverflow.com/questions/1995933/number-of-months-between-two-dates) — Maurits Evers, Mar 22 '18 at 04:14
@MauritsEvers Thanks for sharing that relevant post. I actually implemented Dominic's approach in my lapply function. My question isn't necessarily how to find the difference between dates, but how to generalize that to more than one data.frame. — Prateek Sharma, Mar 22 '18 at 04:53
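
A note on the nested lapply in the question: lapply over a data.frame visits its columns rather than its rows, so inside the inner function y is a whole Date vector and y['date1'] does not pick out a single cell. As far as I can tell, the chained comparison `> ... == T` is also a likely source of the parse error, since R does not allow comparison operators to be chained. The sketch below illustrates the column behaviour on a small made-up data.frame and shows one row-wise alternative that keeps the seq-based counting. The names df, dfs and months_by_seq are hypothetical, not from the original post.

```r
# Toy data with the same column names as the question (values are made up).
df <- data.frame(
  date1 = as.Date(c("2016-03-01", "2016-09-01", "2016-01-01")),
  date2 = as.Date(c("2016-07-01", "2016-06-01", "2016-01-01"))
)

# Iterating over a data.frame walks its COLUMNS:
# each `y` is a full Date vector, not a row.
col_classes <- vapply(df, function(y) class(y)[1], character(1))

# Hypothetical helper: count month steps between paired dates with seq().
months_by_seq <- function(d1, d2) {
  vapply(seq_along(d1), function(i) {
    from <- min(d1[i], d2[i])  # order the pair so seq() always runs forward
    to   <- max(d1[i], d2[i])
    length(seq(from, to, by = "month")) - 1
  }, numeric(1))
}

# Apply the helper to every data.frame in a list and return the modified copy.
dfs <- list(df, df)
result <- lapply(dfs, function(x) {
  x$elapsed_months <- months_by_seq(x$date1, x$date2)
  x
})
```

Ordering each pair with min and max removes the need for the if/else branch, and returning x from the outer function is what makes the new column show up in the result.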

score 1 · Answer 1 · answered Mar 22 '18 at 04:20

I was not able to get your reproducible results to work, but I assumed you were looking for something like this.

set.seed(3)
sim_list = replicate(n = 3,
                     expr = {data.frame(date1 = sample(x = 1:12, size = 10), date2 = sample(x = 1:12, size = 10))},
                     simplify = FALSE)
list1 <- lapply(sim_list, function(x) {
  x['date1'] = as.Date(paste('01', unlist(x['date1']), '2016', sep = '-'), format = '%d-%m-%Y')
  x['date2'] = as.Date(paste('01', unlist(x['date2']), '2016', sep = '-'), format = '%d-%m-%Y')
  return(x)
})

# For each data.frame, take the absolute difference of the month fields row by row.
# apply() passes each row as a character vector, hence the as.Date() calls.
# Comparing only $mon is valid here because every date falls in the same year.
lapply(list1, function(x){
  x['elapsed_months'] <- apply(x, 1,  function(y){
    abs(as.POSIXlt(as.Date(y['date1']))$mon-as.POSIXlt(as.Date(y['date2']))$mon)
  })
  x
})

Well, this works, and it doesn't even need the nested if statement! — Prateek Sharma, Mar 22 '18 at 05:25
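
The month-field idea above can also be written without the row-wise apply, because as.POSIXlt works on a whole Date vector at once. The following is a minimal sketch under that assumption. It also adds the year field, so pairs of dates that span a year boundary are handled. months_between is a hypothetical helper name, and the dates are made up for illustration.

```r
# Hypothetical helper: whole calendar months between two Date vectors,
# ignoring the day of the month.
months_between <- function(d1, d2) {
  lt1 <- as.POSIXlt(d1)
  lt2 <- as.POSIXlt(d2)
  # $year counts years since 1900 and $mon runs 0-11; both are vectors.
  abs(12 * (lt2$year - lt1$year) + (lt2$mon - lt1$mon))
}

same_year  <- months_between(as.Date("2016-03-01"), as.Date("2016-07-01"))
cross_year <- months_between(as.Date("2016-11-01"), as.Date("2017-02-01"))

# The same helper applied to every data.frame in a (made-up) list.
dfs <- list(
  data.frame(date1 = as.Date(c("2016-09-01", "2016-01-01")),
             date2 = as.Date(c("2016-06-01", "2016-01-01")))
)
dfs <- lapply(dfs, function(x) {
  x$elapsed_months <- months_between(x$date1, x$date2)
  x
})
```

With only the $mon fields, the November 2016 to February 2017 pair would come out as 9 rather than 3, which is why the year term is included.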

score 1 · Accepted Answer · answered Mar 22 '18 at 05:27

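We can use difftime to calculate the difference between the two dates in days and then divide it by 30 to get an approximate number of months. With all dates inside a single year, rounding the result reproduces the expected month counts.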

lapply(list1, function(x) cbind(x, elapsed_months = 
         as.numeric(round(abs(difftime(x$date2,x$date1, units = "days")/30)))))

#[[1]]
#        date1      date2 elapsed_months
#1  2016-03-01 2016-07-01         4
#2  2016-09-01 2016-06-01         3
#3  2016-04-01 2016-11-01         7
#4  2016-12-01 2016-10-01         2
#5  2016-05-01 2016-12-01         7
#6  2016-08-01 2016-09-01         1
#7  2016-01-01 2016-01-01         0
#8  2016-02-01 2016-04-01         2
#9  2016-11-01 2016-05-01         6
#10 2016-07-01 2016-08-01         1

#[[2]]
#        date1      date2 elapsed_months
#1  2016-03-01 2016-05-01         2
#2  2016-01-01 2016-12-01        11
#3  2016-02-01 2016-02-01         0
#4  2016-11-01 2016-11-01         0
#5  2016-10-01 2016-03-01         7
#6  2016-06-01 2016-08-01         2
#7  2016-04-01 2016-06-01         2
#8  2016-05-01 2016-10-01         5
#9  2016-12-01 2016-07-01         5
#10 2016-07-01 2016-01-01         6

#[[3]]
#        date1      date2 elapsed_months
#1  2016-04-01 2016-03-01         1
#2  2016-09-01 2016-12-01         3
#3  2016-02-01 2016-09-01         7
#4  2016-06-01 2016-10-01         4
#5  2016-12-01 2016-07-01         5
#6  2016-10-01 2016-08-01         2
#7  2016-01-01 2016-11-01        10
#8  2016-11-01 2016-02-01         9
#9  2016-07-01 2016-01-01         6
#10 2016-03-01 2016-04-01         1

Hi Ronak, I've marked yours as the answer now. Your code is much faster as the list size increases. I've replicated the list 100 times, and there are 500 rows in each data.frame now. The elapsed time for your code is 0.03 s, whereas @smankski's takes 5.59 s. — Prateek Sharma, Mar 22 '18 at 13:55
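
Dividing by 30 is an approximation that holds up for the single-year test data but can drift over longer spans. The sketch below compares it with an average month length of 365.25 / 12 days on a made-up five-year interval, which is 60 calendar months by inspection. The variable names are illustrative only.

```r
# A five-year span: 2016-01-01 to 2021-01-01 is exactly 60 calendar months.
d1 <- as.Date("2016-01-01")
d2 <- as.Date("2021-01-01")

# Elapsed days as a plain number (1827, including two leap days).
days <- as.numeric(difftime(d2, d1, units = "days"))

# Fixed 30-day months overshoot by one month over this span.
months_div30 <- round(abs(days) / 30)

# An average month length of 365.25 / 12 days stays on target here.
months_avg <- round(abs(days) / (365.25 / 12))
```

Neither divisor is exact in general. If calendar months are what matters, comparing the year and month fields of the two dates avoids the rounding question altogether.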
