R: Aggregating between dates without for loop

Question

I am looking to sum over all rent earned on leases that were active between two dates without using a for loop.

Here is a sample of the lease data
DataFrame1

StartDate     EndDate       MonthlyRental  
2015-07-01    2015-09-30    500
2015-06-01    2015-10-31    600
2015-07-15    2016-01-31    400
2015-08-01    2015-12-31    800

I would like to calculate the rent I would receive for each month, prorated if possible (not essential if that is too difficult). For example:
DataFrame2

Month        RentalIncome
2015-07-31   500+600+(400*15/31)
2015-08-31   500+600+400+800
2015-09-30   500+600+400+800
2015-10-31   600+400+800
2015-11-30   600+400+800
etc.

Does anyone know of a better way of doing this than simply looping through Dataframe2?

Thanks,

Mike

You are currently looping through Dataframe1, not Dataframe 2 (as you wrote). Correct? Please post your current code to transform Dataframe1 to Dataframe2. — CL., Jul 07 '15 at 10:57

David Arenburg · Answer 1 · 2015-07-07T12:01:25.637

Here's a possible data.table solution (with some help from the Hmisc package). This would be a very easy question if there were no partial-month rentals, but that constraint makes it a bit more difficult.

As a side note, I've only assumed half months in StartDate as per your example

library(data.table)
library(Hmisc)  # provides monthDays()

# Converting to valid date classes
Dates <- names(df)[1:2]
setDT(df)[, (Dates) := lapply(.SD, as.Date), .SDcols = Dates]

# Handling half months
df[mday(StartDate) != 1, `:=`(GRP = seq_len(.N), 
                              mDays = mday(StartDate), 
                              StartDate = StartDate - mday(StartDate) + 1L)]

## Converting to long format
res <- df[, .(Month = seq(StartDate, EndDate, by = "month")), 
              by = .(MonthlyRental, GRP, mDays)]

## Dividing not full months by the number of days (that could be modified as per other post)
res[match(na.omit(df$GRP), GRP), MonthlyRental := MonthlyRental*mDays/monthDays(Month)]
res[, .(RentalIncome = sum(MonthlyRental)), keyby = .(year(Month), month(Month))]

#    year month RentalIncome
# 1: 2015     6          600
# 2: 2015     7         1293
# 3: 2015     8         2300
# 4: 2015     9         2300
# 5: 2015    10         1800
# 6: 2015    11         1200
# 7: 2015    12         1200
# 8: 2016     1          400

mra68 · Accepted Answer · 2015-07-29T09:17:59.413

I modified my previous answer a little bit. The matrix "RentPerDay" is not necessary: "colSums(t(countDays)*RentPerDay)" can be replaced by a matrix-vector product. This solution calculates the same rental income as the previous solution.

library(lubridate)

ultimo_day <- function( start, end )
{
  N <- 12*(year(end) - year(start)) + month(end) - month(start) + 1
  d <- start
  day(d) <- 1
  month(d) <- month(d) + (1:N)
  return( d - as.difftime(1,units="days"))
}

countDays <- function( data, d )
{
  return( pmin( pmax( outer( d, data$"StartDate", "-") + 1, 0 ), day(d) ) -
          pmin( pmax( outer( d, data$"EndDate"  , "-"), 0 ), day(d) ) )
}

rentalIncome <- function( data,
                          d = ultimo_day( min(data$StartDate), max(data$EndDate) ) )
{
  return ( data.frame( date   = d,
                       income = ( countDays(data,d) / days_in_month(d) ) %*% data$"MonthlyRental" ) )
}

# -------- Example Data: --------

df1 <- data.frame(
  StartDate     = as.Date(c("2015-07-01", "2015-06-01", "2015-07-15", "2015-08-01", "2014-06-20")),
  EndDate       = as.Date(c("2015-09-30", "2015-10-31", "2016-01-31", "2015-12-31", "2015-07-31")),
  MonthlyRental = c(500, 600, 400, 800, 300)
)

To the example I added one more lease, which is active for more than one year:

> df1
   StartDate    EndDate MonthlyRental
1 2015-07-01 2015-09-30           500
2 2015-06-01 2015-10-31           600
3 2015-07-15 2016-01-31           400
4 2015-08-01 2015-12-31           800
5 2014-06-20 2015-07-31           300

"ultimo_day(start,end)" is the vector of month-end days between "start" and "end", i.e. the days on which rent is paid:

> d <- ultimo_day( min(df1$StartDate), max(df1$EndDate))
> d
 [1] "2014-06-30" "2014-07-31" "2014-08-31" "2014-09-30" "2014-10-31" "2014-11-30" "2014-12-31" "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
[12] "2015-05-31" "2015-06-30" "2015-07-31" "2015-08-31" "2015-09-30" "2015-10-31" "2015-11-30" "2015-12-31" "2016-01-31"
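
For readers without lubridate, the same month-end vector can be sketched in base R. This is only an illustration under the assumption that both arguments are single Date values; the name `month_ends` is hypothetical and not part of the answer.

```r
# Hypothetical base R counterpart of ultimo_day(): last day of every month
# from the month of `start` through the month of `end`.
month_ends <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  # Number of calendar months touched by the interval (inclusive)
  n <- 12 * (e$year - s$year) + (e$mon - s$mon) + 1
  # First day of the starting month
  first <- as.Date(format(start, "%Y-%m-01"))
  # First days of the following n months, minus one day = month ends
  seq(first, by = "month", length.out = n + 1)[-1] - 1
}

d_base <- month_ends(as.Date("2014-06-20"), as.Date("2016-01-31"))
```

Stepping from the first of each month and subtracting one day avoids any special handling of month lengths or leap years.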

The rows of the matrix "countDays" correspond to these ultimo days and therefore to the months:

> countDays(df1,d)
Time differences in days
      [,1] [,2] [,3] [,4] [,5]
 [1,]    0    0    0    0   11
 [2,]    0    0    0    0   31
 [3,]    0    0    0    0   31
 [4,]    0    0    0    0   30
 [5,]    0    0    0    0   31
 [6,]    0    0    0    0   30
 [7,]    0    0    0    0   31
 [8,]    0    0    0    0   31
 [9,]    0    0    0    0   28
[10,]    0    0    0    0   31
[11,]    0    0    0    0   30
[12,]    0    0    0    0   31
[13,]    0   30    0    0   30
[14,]   31   31   17    0   31
[15,]   31   31   31   31    0
[16,]   30   30   30   30    0
[17,]    0   31   31   31    0
[18,]    0    0   30   30    0
[19,]    0    0   31   31    0
[20,]    0    0   31    0    0

Row 1 belongs to June 2014, Row 2 to July 2014,..., Row 20 to January 2016.
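
A single entry of this matrix can be reproduced by hand, which may make the pmin/pmax clamping easier to follow. The sketch below uses base R only; `days_active` is a hypothetical helper name, and it assumes `month_end` is the last day of its month.

```r
# Days of one lease that fall inside one month, using the same clamping idea
# as countDays(). `days_active` is a hypothetical name for illustration.
days_active <- function(month_end, start, end) {
  n <- as.integer(format(month_end, "%d"))          # month length = day number of its last day
  since_start <- as.numeric(month_end - start) + 1  # days from lease start to month end, inclusive
  past_end    <- as.numeric(month_end - end)        # days between lease end and month end
  # Clamp both counts to [0, n]; the difference is the number of active days
  pmin(pmax(since_start, 0), n) - pmin(pmax(past_end, 0), n)
}

# Lease 3 in July 2015 (row 14, column 3): starts on the 15th
jul_lease3 <- days_active(as.Date("2015-07-31"), as.Date("2015-07-15"), as.Date("2016-01-31"))
# Lease 5 in June 2014 (row 1, column 5): starts on the 20th
jun_lease5 <- days_active(as.Date("2014-06-30"), as.Date("2014-06-20"), as.Date("2015-07-31"))
# Lease 5 in August 2015 (row 15, column 5): already ended
aug_lease5 <- days_active(as.Date("2015-08-31"), as.Date("2014-06-20"), as.Date("2015-07-31"))
```

The three values are 17, 11 and 0, matching the corresponding entries of the matrix above.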

"countDays(df1,d) / days_in_month(d)" is again a matrix. The (i,j)-component of this matrix is not the number of days the j-th lease is active in the i-th month, but that number divided by the length of the i-th month, i.e. the fraction of the month during which the lease is active:

> countDays(df1,d) / days_in_month(d)
Time differences in days
      [,1] [,2]      [,3] [,4]      [,5]
 [1,]    0    0 0.0000000    0 0.3666667
 [2,]    0    0 0.0000000    0 1.0000000
 [3,]    0    0 0.0000000    0 1.0000000
 [4,]    0    0 0.0000000    0 1.0000000
 [5,]    0    0 0.0000000    0 1.0000000
 [6,]    0    0 0.0000000    0 1.0000000
 [7,]    0    0 0.0000000    0 1.0000000
 [8,]    0    0 0.0000000    0 1.0000000
 [9,]    0    0 0.0000000    0 1.0000000
[10,]    0    0 0.0000000    0 1.0000000
[11,]    0    0 0.0000000    0 1.0000000
[12,]    0    0 0.0000000    0 1.0000000
[13,]    0    1 0.0000000    0 1.0000000
[14,]    1    1 0.5483871    0 1.0000000
[15,]    1    1 1.0000000    1 0.0000000
[16,]    1    1 1.0000000    1 0.0000000
[17,]    0    1 1.0000000    1 0.0000000
[18,]    0    0 1.0000000    1 0.0000000
[19,]    0    0 1.0000000    1 0.0000000
[20,]    0    0 1.0000000    0 0.0000000

This matrix is multiplied by the vector "df1$MonthlyRental" and the resulting vector is stored as "income" in the data.frame of rental income:

> rentalIncome(df1)
         date   income
1  2014-06-30  110.000
2  2014-07-31  300.000
3  2014-08-31  300.000
4  2014-09-30  300.000
5  2014-10-31  300.000
6  2014-11-30  300.000
7  2014-12-31  300.000
8  2015-01-31  300.000
9  2015-02-28  300.000
10 2015-03-31  300.000
11 2015-04-30  300.000
12 2015-05-31  300.000
13 2015-06-30  900.000
14 2015-07-31 1619.355
15 2015-08-31 2300.000
16 2015-09-30 2300.000
17 2015-10-31 1800.000
18 2015-11-30 1200.000
19 2015-12-31 1200.000
20 2016-01-31  400.000
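
The claim that the matrix-vector product gives the same result as the earlier "colSums" formulation can be checked on a small piece of the data. The sketch below hard-codes rows 14 and 15 of the "countDays" matrix (July and August 2015) as plain numbers, so it needs no packages; the variable names are chosen for illustration only.

```r
# Rows 14 and 15 of countDays(df1, d): active days per lease in Jul and Aug 2015
counts <- rbind(c(31, 31, 17,  0, 31),
                c(31, 31, 31, 31,  0))
days   <- c(31, 31)                     # lengths of July and August
rent   <- c(500, 600, 400, 800, 300)    # df1$MonthlyRental

# New formulation: fraction of each month covered, times monthly rent
via_product <- drop((counts / days) %*% rent)

# Earlier formulation: rent per day times days, summed per month
rent_per_day <- outer(rent, days, "/")
via_colsums  <- colSums(t(counts) * rent_per_day)
```

Both vectors come out as roughly 1619.355 for July and 2300 for August, in line with rows 14 and 15 of the output above.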

score 0 · Answer 3 · answered Jul 07 '15 at 11:39

I'm not sure if this is better than "simply looping through the dataframe" - because I actually do loop through it - but here's a way to produce the desired output.

(The output deviates from the question in July 2015 because rent is to be paid for 17 days in July, not 15.)
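
The 17 days can be verified directly by enumerating the billable days in July for the lease starting on 2015-07-15. This is a small base R check; the variable names are illustrative.

```r
# Days in July 2015 covered by a lease that starts on the 15th (inclusive)
billable <- seq(as.Date("2015-07-15"), as.Date("2015-07-31"), by = "day")
n_days   <- length(billable)        # 31 - 15 + 1

# Prorated July rent for that lease at 400 per month
july_rent <- 400 * n_days / 31
```

Here `n_days` is 17 and `july_rent` is about 219.35, which together with the 500 and 600 from the two full-month leases gives the 1319.355 shown for July below.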

The given intervals are translated into days, the rent per day is calculated and then rents per days are summed by month:

library(zoo)

df1 <- data.frame(
  StartDate = as.Date(c("2015-07-01", "2015-06-01", "2015-07-15", "2015-08-01")),
  EndDate = as.Date(c("2015-09-30", "2015-10-31", "2016-01-31", "2015-12-31")),
  MonthlyRental = c(500, 600, 400, 800)
)

df1LongList <- apply(df1, MARGIN = 1, FUN = function(row) {
  return(data.frame(
    date = seq(from = as.Date(row["StartDate"]), to = as.Date(row["EndDate"]), by = "day"),
    MonthlyRental = as.numeric(row["MonthlyRental"])))
})

df1Long <- do.call("rbind", df1LongList)
df1Long$yearMon <- as.yearmon(df1Long$date)
df1Long$maxDays <- as.numeric(as.Date(df1Long$yearMon, frac = 1) - as.Date(df1Long$yearMon) + 1) # Thanks: http://stackoverflow.com/a/6244503/2706569

df1Long$rental <- df1Long$MonthlyRental / df1Long$maxDays

tapply(X = df1Long$rental, INDEX = df1Long$yearMon, FUN = sum)

# Jun 2015 Jul 2015 Aug 2015 Sep 2015 Okt 2015 Nov 2015 Dez 2015 Jan 2016 
# 600.000 1319.355 2300.000 2300.000 1800.000 1200.000 1200.000  400.000

It does (I think). Have you seen my remark above the code? In other words: I don't consider "half" months, but rather calculate in exact days. — CL., Jul 07 '15 at 11:49
When the contract starts at day 15, you will have to pay for that day, right? 31 - 15 + 1 = 17. — CL., Jul 07 '15 at 11:53

score 0 · Answer 4 · answered Jul 08 '15 at 01:51

I used outer products, 'pmin', and 'pmax' to avoid looping. The partially covered months are the difficult, and therefore interesting, part:

library(lubridate)

df1 <- data.frame(
  StartDate = as.Date(c("2015-07-01", "2015-06-01", "2015-07-15", "2015-08-01")),
  EndDate = as.Date(c("2015-09-30", "2015-10-31", "2016-01-31", "2015-12-31")),
  MonthlyRental = c(500, 600, 400, 800)
)

d <- c( as.Date("2015-07-31"),
        as.Date("2015-08-31"),
        as.Date("2015-09-30"),
        as.Date("2015-10-31"),
        as.Date("2015-11-30"),
        as.Date("2015-12-31"),
        as.Date("2016-01-31"),
        as.Date("2016-02-29")  )

RentPerDay <- outer( df1$"MonthlyRental", days_in_month(d), "/" )

countDays <- pmin( pmax( outer( d, df1$"StartDate", "-") + 1, 0 ), days_in_month(d) ) -
             pmin( pmax( outer( d, df1$"EndDate"  , "-"), 0 ), days_in_month(d) )

rentalIncome <- colSums( t(countDays) * RentPerDay )

The columns of the matrix 't(countDays)' correspond to the rows of 'DataFrame_2', i.e. to the months. The rows correspond to the rows of 'DataFrame_1', i.e. to the sources of rental income. The entry at (i,j) is the number of days in the j-th month, for which the i-th source contributes to the rental income. The matrix 'RentPerDay' has the same structure. The entry at (i,j) is the amount of money coming from the i-th source for one day in the j-th month. Then summation over the j-th column of the elementwise product of these two matrices is the total rental income in the j-th month.

> t(countDays)
Time differences in days
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   31   31   30    0    0    0    0    0
[2,]   31   31   30   31    0    0    0    0
[3,]   17   31   30   31   30   31   31    0
[4,]    0   31   30   31   30   31    0    0
> RentPerDay
          Jul      Aug      Sep      Oct      Nov      Dec      Jan      Feb
[1,] 16.12903 16.12903 16.66667 16.12903 16.66667 16.12903 16.12903 17.24138
[2,] 19.35484 19.35484 20.00000 19.35484 20.00000 19.35484 19.35484 20.68966
[3,] 12.90323 12.90323 13.33333 12.90323 13.33333 12.90323 12.90323 13.79310
[4,] 25.80645 25.80645 26.66667 25.80645 26.66667 25.80645 25.80645 27.58621
> rentalIncome
     Jul      Aug      Sep      Oct      Nov      Dec      Jan      Feb 
1319.355 2300.000 2300.000 1800.000 1200.000 1200.000  400.000    0.000 
>

Thanks for the solution, and sorry for the very late reply; I haven't been in the office. The solution works well, with just one question: it doesn't take years into account (I realize my example didn't illustrate the need for this). Currently, if there is a start date of, say, 20-06-2014 and a corresponding end date of 30-07-2015, the rental amount is prorated for June 2015 as well as June 2014. Is there any way around this? Thanks, appreciate the help! Mike — Mike, Jul 28 '15 at 17:29
The months to which the columns of "RentPerDay" correspond are not just January, February,...,December, but July 2015, August 2015,...,February 2016. If there is another lease, starting on 2014-6-20 and ending on 2015-7-31, the months are June 2014, July 2014,...,February 2016. Think of a spiral instead of a circle. Perhaps the explanation of my solution was ambiguous on this point. In the example for my second solution, the additional lease contributes 110 to June 2014 and 300 to June 2015. — mra68, Jul 29 '15 at 02:16
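
The "spiral instead of a circle" point can be illustrated by grouping on a year-month key rather than on the month alone. The sketch below uses the June 2014 and June 2015 totals from the accepted answer's output (110 and 900); the variable names are illustrative.

```r
d      <- as.Date(c("2014-06-30", "2015-06-30"))
income <- c(110, 900)

# Circle: keyed by month only, both Junes collapse into one group
by_month_only <- tapply(income, format(d, "%m"), sum)

# Spiral: keyed by year and month, each June stays separate
by_year_month <- tapply(income, format(d, "%Y-%m"), sum)
```

Any solution that indexes months by full month-end dates, as the ones in this thread do, behaves like the second grouping.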
