1

I have the following data table:

dt <- data.table(date=c(seq.Date(as.Date("2000-01-01"),as.Date("2000-01-03"),"1 day")),
                 a=c(1,2,3),
                 b=c(1,2,3),
                 c=c(1,2,3))
> dt
         date a b c
1: 2000-01-01 1 1 1
2: 2000-01-02 2 2 2
3: 2000-01-03 3 3 3

and I need to replicate it n times (code taken from Repeat data.frame N times):

n <- 3
dt.rep <- dt[rep(seq_len(nrow(dt)), n)]

> dt.rep
         date a b c
1: 2000-01-01 1 1 1
2: 2000-01-02 2 2 2
3: 2000-01-03 3 3 3
4: 2000-01-01 1 1 1
5: 2000-01-02 2 2 2
6: 2000-01-03 3 3 3
7: 2000-01-01 1 1 1
8: 2000-01-02 2 2 2
9: 2000-01-03 3 3 3

However, I need the date column to be sequential. The last row would be the actual last date, and I need it to go backwards to the first row, so the expected output is:

         date a b c
1: 1999-12-26 1 1 1
2: 1999-12-27 2 2 2
3: 1999-12-28 3 3 3
4: 1999-12-29 1 1 1
5: 1999-12-30 2 2 2
6: 1999-12-31 3 3 3
7: 2000-01-01 1 1 1
8: 2000-01-02 2 2 2
9: 2000-01-03 3 3 3

How to achieve this?

EDIT:

The proposed solution seems to fail for large, hourly datasets. Consider this new example:

dt <- data.table(date=seq(as.POSIXct("1994-01-01 00:00:00"), as.POSIXct("2008-12-31 23:00:00"), by="1 hour"), temp=runif(n=131496, min=10, max=35)) 
> dt
                       date     temp
     1: 1994-01-01 00:00:00 26.40286
     2: 1994-01-01 01:00:00 21.37171
     3: 1994-01-01 02:00:00 16.11227
     4: 1994-01-01 03:00:00 30.28062
     5: 1994-01-01 04:00:00 25.22336
    ---                             
131492: 2008-12-31 19:00:00 18.43148
131493: 2008-12-31 20:00:00 24.10905
131494: 2008-12-31 21:00:00 10.33235
131495: 2008-12-31 22:00:00 27.73049
131496: 2008-12-31 23:00:00 21.74835

When replicating it 5 times, this is what we have:

n <- 5
dt[rep(seq_len(.N), n)][, newdate:=rev(seq(last(date),
                                           length.out=.N, by='-1 hour'))][]
                       date     temp             newdate
     1: 1994-01-01 00:00:00 26.40286 1933-12-31 00:00:00
     2: 1994-01-01 01:00:00 21.37171 1933-12-31 01:00:00
     3: 1994-01-01 02:00:00 16.11227 1933-12-31 02:00:00
     4: 1994-01-01 03:00:00 30.28062 1933-12-31 03:00:00
     5: 1994-01-01 04:00:00 25.22336 1933-12-31 04:00:00
    ---                                                 
657476: 2008-12-31 19:00:00 18.43148 2008-12-31 19:00:00
657477: 2008-12-31 20:00:00 24.10905 2008-12-31 20:00:00
657478: 2008-12-31 21:00:00 10.33235 2008-12-31 21:00:00
657479: 2008-12-31 22:00:00 27.73049 2008-12-31 22:00:00
657480: 2008-12-31 23:00:00 21.74835 2008-12-31 23:00:00

Notice how the date and newdate columns are out of sync. I would expect newdate to start on 1934-01-01 00:00:00, but rather it starts on 1933-12-31 00:00:00. This results in the data table having 76 (length(unique(year(dt$newdate)))) years of data, rather than 5 replications of the 15 years in the original one = 75 years. I am not sure what is going on here...

thiagoveloso
  • 2,537
  • 3
  • 28
  • 57

1 Answers1

1

After the replication step, using the last 'date', get the reverse sequence of 'date' by specifying the length.out as .N (number of rows and by as negative 1 day

dt[rep(seq_len(.N), n)][, date := rev(seq(last(date),
       length.out = .N, by = '-1 day'))][]
#         date a b c
#1: 1999-12-26 1 1 1
#2: 1999-12-27 2 2 2
#3: 1999-12-28 3 3 3
#4: 1999-12-29 1 1 1
#5: 1999-12-30 2 2 2
#6: 1999-12-31 3 3 3
#7: 2000-01-01 1 1 1
#8: 2000-01-02 2 2 2
#9: 2000-01-03 3 3 3

Update

Based on the OP's comments, it seems that the 'date' sequence should be reversed for each replication. In that case, we can use the replication as a grouping variable

n <- 5
dt[rep(seq_len(.N), n)][, newdate := rev(seq(last(date),
   length.out = .N, by='-1 hour')), by = .(rep(seq_len(n), each = nrow(dt)))][]
#                  date     temp             newdate
#1: 1994-01-01 00:00:00 34.19615 1994-01-01 00:00:00
#2: 1994-01-01 01:00:00 34.29310 1994-01-01 01:00:00
# ...

NOTE: Using the updated data in OP's post

akrun
  • 874,273
  • 37
  • 540
  • 662
  • I noticed that the code doesn't work properly with larger datasets. Please try your answer with this data table: `dt <- data.table(date=seq(as.POSIXct("1994-01-01 00:00:00"), as.POSIXct("2008-12-31 23:00:00"), by="1 hour"), temp=runif(n=131496, min=10, max=35)) `, replacing `-1 day` with `-1 hour` in your answer. – thiagoveloso Feb 03 '19 at 01:04
  • @thiagoveloso Can you tell me the issue you noticed with the hourly sequence – akrun Feb 03 '19 at 08:12
  • Please take a look at the edit I made in the original post. – thiagoveloso Feb 03 '19 at 14:44
  • @thiagoveloso I tested it with first with the same data and it works fine `dt[, newdate := rev(seq(last(date), length.out=.N, by='-1 hour'))]# head(dt)# date temp newdate 1: 1994-01-01 00:00:00 17.05952 1994-01-01 00:00:00`` Now, we look at how things are working with each replication. For e.g. `n1 <- 131496` i.e. the number of rows of original data. `> tail(seq(last(dt$date), length.out = n1 * 3, by = '-1 hour'), 1) [1] "1964-01-01 EST" > tail(seq(last(dt$date), length.out = n1 * 4, by = '-1 hour'), 1) [1] "1948-12-31 EST"` – akrun Feb 03 '19 at 21:21
  • For replication '5', it is `> tail(seq(last(dt$date), length.out = n1 * 5, by = '-1 hour'), 1) [1] "1933-12-31 EST"`` It means, we are taking the sequence from the last element to the first element i.e 131496 * 5 – akrun Feb 03 '19 at 21:22
  • @thiagoveloso If you really wanted to reflect the changes for each replication, then do a group by sequence i.e. `dt[rep(seq_len(.N), n)][, newdate := rev(seq(last(date),length.out=.N, by='-1 hour')), .(rep(seq_len(n), each = nrow(dt)))][]# date temp newdate 1: 1994-01-01 00:00:00 34.19615 1994-01-01 00:00:00 2: 1994-01-01 01:00:00 34.29310 1994-01-01 01:00:00` – akrun Feb 03 '19 at 21:25
  • The issue is that, like you demonstrated, the last element in each replication falls on different days (e.g. Jan 1 vs Dec 31). Shouldn't they fall on the same day, but different years? – thiagoveloso Feb 03 '19 at 22:16
  • @thiagoveloso Not sure about the logic you want to accomplish – akrun Feb 04 '19 at 06:20