FYI: I have a solution to this problem but I wanted to post the solution for anyone with a similar problem.
I have a data set in long format
dat = data.frame(id = rep(1:2, each=10),
x1 = runif(20),
x2 = rnorm(20),
x3 = letters[1:20]
)
with a column identifying the order of rows within each id
dat$id_row = ave(as.character(dat$id), as.character(dat$id), FUN = seq_along)
and some values only occur once per id
dat$x3[dat$id_row>1] = NA
So dat
looks like this:
id x1 x2 x3 id_row
1 1 0.54788708 -0.3870533 a 1
2 1 0.41625779 0.1528117 <NA> 2
3 1 0.26278100 -0.9239091 <NA> 3
4 1 0.69968279 1.2025596 <NA> 4
5 1 0.26647728 0.3301025 <NA> 5
6 1 0.72934348 0.3218639 <NA> 6
7 1 0.43240532 -0.3136323 <NA> 7
8 1 0.98646191 -2.0612792 <NA> 8
9 1 0.69606418 1.7286719 <NA> 9
10 1 0.54025279 1.4872084 <NA> 10
11 2 0.33639210 -0.1692205 k 1
12 2 0.05372062 0.9645393 <NA> 2
13 2 0.46421037 -1.1011200 <NA> 3
14 2 0.66753436 -1.5613799 <NA> 4
15 2 0.72302407 -0.9173485 <NA> 5
16 2 0.26640179 0.7012415 <NA> 6
17 2 0.37752229 -0.1136814 <NA> 7
18 2 0.45110511 -0.2051627 <NA> 8
19 2 0.47969921 1.4154800 <NA> 9
20 2 0.23142331 -0.6178061 <NA> 10
I want to convert it to wide form using data.table::dcast
like this
library(data.table)
setDT(dat, key="id")
cols = colnames(dat)
cols = cols[!cols %in% c("id", "id_row")] #all the columns other than these
dat_wide = dcast(dat, id ~ id_row,
value.var = cols)
So dat_wide
looks like this:
id x1_1 x1_10 x1_2 x1_3 x1_4 x1_5 x1_6 x1_7 x1_8 x1_9
1: 1 0.5478871 0.5402528 0.41625779 0.2627810 0.6996828 0.2664773 0.7293435 0.4324053 0.9864619 0.6960642
2: 2 0.3363921 0.2314233 0.05372062 0.4642104 0.6675344 0.7230241 0.2664018 0.3775223 0.4511051 0.4796992
x2_1 x2_10 x2_2 x2_3 x2_4 x2_5 x2_6 x2_7 x2_8 x2_9
1: -0.3870533 1.4872084 0.1528117 -0.9239091 1.20256 0.3301025 0.3218639 -0.3136323 -2.0612792 1.728672
2: -0.1692205 -0.6178061 0.9645393 -1.1011200 -1.56138 -0.9173485 0.7012415 -0.1136814 -0.2051627 1.415480
x3_1 x3_10 x3_2 x3_3 x3_4 x3_5 x3_6 x3_7 x3_8 x3_9
1: a NA NA NA NA NA NA NA NA NA
2: k NA NA NA NA NA NA NA NA NA
I would like the following:
- Leading zeros for the column names (eg
x1_01
instead ofx1_1
) - Drop all of the empty columns
- For the columns that have only one observation, remove the
_1
thatdcast
added.