I have a data frame with alternating columns that I want to reshape. The problem is that stats::reshape
and reshape2::reshape
are both very slow and memory intensive on my actual use case. I suspect that the no-copy approach of data.table
will save me time and use less resources, but I barely know where to start with the syntax (previous related efforts 1, 2).
Here's an example of how my data frame is structured:
set.seed(4)
dt <- data.frame(names = letters[1:10],
one = rep(23,10),
two = sample(1000,10),
three = sample(10,10),
onea = rep(24,10),
twoa = sample(1000,10),
threea = sample(10,10),
oneb = rep(25,10),
twob = sample(1000,10),
threeb = sample(10,10),
onec = rep(26,10),
twoc = sample(1000,10),
threec = sample(10,10),
oned = rep(26,10),
twod = sample(1000,10),
threed = sample(10,10))
Which looks like this:
names one two three onea twoa threea oneb twob threeb onec twoc threec oned
1 a 23 586 8 24 715 6 25 939 4 26 561 4 26
2 b 23 9 3 24 996 3 25 242 7 26 72 6 26
3 c 23 294 1 24 506 8 25 565 8 26 852 10 26
4 d 23 277 7 24 489 5 25 181 6 26 911 3 26
5 e 23 811 9 24 647 9 25 901 5 26 225 5 26
6 f 23 260 6 24 827 7 25 84 3 26 626 8 26
7 g 23 721 4 24 480 2 25 896 1 26 69 2 26
8 h 23 900 2 24 836 4 25 886 10 26 512 9 26
9 i 23 942 5 24 510 1 25 718 2 26 799 1 26
10 j 23 73 10 24 526 10 25 560 9 26 964 7 26
twod threed
1 911 2
2 709 10
3 571 5
4 915 9
5 899 3
6 59 1
7 46 4
8 982 7
9 205 8
10 921 6
Here's what I'm currently doing with stats::reshape
which takes a long time and uses a lot of memory on my actual use case:
df_l <- stats::reshape(dt, idvar='names',
varying=list(ones = colnames(dt[seq(from=2,
to=ncol(dt), by=3)]),
twos = colnames(dt[seq(from=4,
to=ncol(dt), by=3)])),
direction="long")
Here's the desired output (I don't care about any of the three
columns):
df_l
names two twoa twob twoc twod time one three
a.1 a 586 715 939 561 911 1 23 8
b.1 b 9 996 242 72 709 1 23 3
c.1 c 294 506 565 852 571 1 23 1
d.1 d 277 489 181 911 915 1 23 7
e.1 e 811 647 901 225 899 1 23 9
f.1 f 260 827 84 626 59 1 23 6
g.1 g 721 480 896 69 46 1 23 4
h.1 h 900 836 886 512 982 1 23 2
i.1 i 942 510 718 799 205 1 23 5
j.1 j 73 526 560 964 921 1 23 10
a.2 a 586 715 939 561 911 2 24 6
b.2 b 9 996 242 72 709 2 24 3
c.2 c 294 506 565 852 571 2 24 8
d.2 d 277 489 181 911 915 2 24 5
e.2 e 811 647 901 225 899 2 24 9
f.2 f 260 827 84 626 59 2 24 7
g.2 g 721 480 896 69 46 2 24 2
h.2 h 900 836 886 512 982 2 24 4
i.2 i 942 510 718 799 205 2 24 1
j.2 j 73 526 560 964 921 2 24 10
a.3 a 586 715 939 561 911 3 25 4
b.3 b 9 996 242 72 709 3 25 7
c.3 c 294 506 565 852 571 3 25 8
d.3 d 277 489 181 911 915 3 25 6
e.3 e 811 647 901 225 899 3 25 5
f.3 f 260 827 84 626 59 3 25 3
g.3 g 721 480 896 69 46 3 25 1
h.3 h 900 836 886 512 982 3 25 10
i.3 i 942 510 718 799 205 3 25 2
j.3 j 73 526 560 964 921 3 25 9
a.4 a 586 715 939 561 911 4 26 4
b.4 b 9 996 242 72 709 4 26 6
c.4 c 294 506 565 852 571 4 26 10
d.4 d 277 489 181 911 915 4 26 3
e.4 e 811 647 901 225 899 4 26 5
f.4 f 260 827 84 626 59 4 26 8
g.4 g 721 480 896 69 46 4 26 2
h.4 h 900 836 886 512 982 4 26 9
i.4 i 942 510 718 799 205 4 26 1
j.4 j 73 526 560 964 921 4 26 7
a.5 a 586 715 939 561 911 5 26 2
b.5 b 9 996 242 72 709 5 26 10
c.5 c 294 506 565 852 571 5 26 5
d.5 d 277 489 181 911 915 5 26 9
e.5 e 811 647 901 225 899 5 26 3
f.5 f 260 827 84 626 59 5 26 1
g.5 g 721 480 896 69 46 5 26 4
h.5 h 900 836 886 512 982 5 26 7
i.5 i 942 510 718 799 205 5 26 8
j.5 j 73 526 560 964 921 5 26 6
How can I do this with data.table
?