I have the following data table:
dt <- data.table(date=c(seq.Date(as.Date("2000-01-01"),as.Date("2000-01-03"),"1 day")),
a=c(1,2,3),
b=c(1,2,3),
c=c(1,2,3))
> dt
date a b c
1: 2000-01-01 1 1 1
2: 2000-01-02 2 2 2
3: 2000-01-03 3 3 3
and I need to replicate it n
times (code taken from Repeat data.frame N times):
n <- 3
dt.rep <- dt[rep(seq_len(nrow(dt)), n)]
> dt.rep
date a b c
1: 2000-01-01 1 1 1
2: 2000-01-02 2 2 2
3: 2000-01-03 3 3 3
4: 2000-01-01 1 1 1
5: 2000-01-02 2 2 2
6: 2000-01-03 3 3 3
7: 2000-01-01 1 1 1
8: 2000-01-02 2 2 2
9: 2000-01-03 3 3 3
However, I need the date
column to be sequential. The last row would be the actual last date, and I need it to go backwards to the first row, so the expected output is:
date a b c
1: 1999-12-26 1 1 1
2: 1999-12-27 2 2 2
3: 1999-12-28 3 3 3
4: 1999-12-29 1 1 1
5: 1999-12-30 2 2 2
6: 1999-12-31 3 3 3
7: 2000-01-01 1 1 1
8: 2000-01-02 2 2 2
9: 2000-01-03 3 3 3
How to achieve this?
EDIT:
The proposed solution seems to fail for large, hourly datasets. Consider this new example:
dt <- data.table(date=seq(as.POSIXct("1994-01-01 00:00:00"), as.POSIXct("2008-12-31 23:00:00"), by="1 hour"), temp=runif(n=131496, min=10, max=35))
> dt
date temp
1: 1994-01-01 00:00:00 26.40286
2: 1994-01-01 01:00:00 21.37171
3: 1994-01-01 02:00:00 16.11227
4: 1994-01-01 03:00:00 30.28062
5: 1994-01-01 04:00:00 25.22336
---
131492: 2008-12-31 19:00:00 18.43148
131493: 2008-12-31 20:00:00 24.10905
131494: 2008-12-31 21:00:00 10.33235
131495: 2008-12-31 22:00:00 27.73049
131496: 2008-12-31 23:00:00 21.74835
When replicating it 5
times, this is what we have:
n <- 5
dt[rep(seq_len(.N), n)][, newdate:=rev(seq(last(date),
length.out=.N, by='-1 hour'))][]
date temp newdate
1: 1994-01-01 00:00:00 26.40286 1933-12-31 00:00:00
2: 1994-01-01 01:00:00 21.37171 1933-12-31 01:00:00
3: 1994-01-01 02:00:00 16.11227 1933-12-31 02:00:00
4: 1994-01-01 03:00:00 30.28062 1933-12-31 03:00:00
5: 1994-01-01 04:00:00 25.22336 1933-12-31 04:00:00
---
657476: 2008-12-31 19:00:00 18.43148 2008-12-31 19:00:00
657477: 2008-12-31 20:00:00 24.10905 2008-12-31 20:00:00
657478: 2008-12-31 21:00:00 10.33235 2008-12-31 21:00:00
657479: 2008-12-31 22:00:00 27.73049 2008-12-31 22:00:00
657480: 2008-12-31 23:00:00 21.74835 2008-12-31 23:00:00
Notice how the date
and newdate
columns are out of sync.
I would expect newdate
to start on 1934-01-01 00:00:00
, but rather it starts on 1933-12-31 00:00:00
. This results in the data table having 76 (length(unique(year(dt$newdate)))
) years of data, rather than 5
replications of the 15 years
in the original one = 75
years. I am not sure what is going on here...