I'm generating a data.table by splitting the character strings in one column and turning the pieces into rows. Although I already get the desired outcome, it's a bit slow, and I have a large dataset. Is there a more efficient way?
Here is my example:
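edlist <- list()
# Process one row of the source table at a time
for (i in seq_along(cvu_wos_cl3$UT)) {
  t <- cvu_wos_cl3[, .(UT, AU2, NAU, PY, C1, RP, CC1, NC1)][i]
  # Split the AU2 string into its individual entries
  a <- unlist(strsplit(t[, AU2], ";"))
  # Position of each entry within the original string
  o <- seq_along(a)
  # One output row per entry; the remaining columns are repeated
  edlist[[i]] <- data.table(AU = a, OR = o, t[, .(UT, PY, C1, RP, CC1, NAU, NC1)])
}
# Stack the per-row pieces into a single data.table
edlist1 <- rbindlist(edlist)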
The original data.table is:
> cvu_wos_cl3[,.(UT, AU2, NAU, PY, C1, RP, CC1, NC1)][1,1:3]
                    UT                     AU2 NAU
1: WOS:000070949000010 120472; 998; Soberon, X   3
From here I store each row in t and take the column AU2, a character string, which I split by ";".
t[,AU2]
[1] "120472; 998; Soberon, X"
unlist(strsplit(t[,AU2], ";"))
[1] "120472" " 998" " Soberon, X"
Later, I create a new data.table with 3 rows from what was originally one row:
data.table(AU=a, OR=o, t[, .(UT, PY, C1, RP, CC1, NAU, NC1) ])[,1:3]
            AU OR                  UT
1:      120472  1 WOS:000070949000010
2:         998  2 WOS:000070949000010
3:  Soberon, X  3 WOS:000070949000010
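In case it helps to reproduce the problem, here is a minimal sketch of the same loop on a toy table. The name toy is hypothetical, only three of the columns are kept for brevity, and the second row is invented purely for illustration; the first row copies the UT, AU2, and NAU values shown above.

```r
library(data.table)

# Toy stand-in for cvu_wos_cl3 (hypothetical name).
# Row 1 uses the values shown above; row 2 is invented for illustration.
toy <- data.table(
  UT  = c("WOS:000070949000010", "WOS:HYPOTHETICAL"),
  AU2 = c("120472; 998; Soberon, X", "1; 2"),
  NAU = c(3L, 2L)
)

# Same approach as the loop above: one row at a time
edlist <- vector("list", nrow(toy))
for (i in seq_len(nrow(toy))) {
  row_i <- toy[i]
  # Split the AU2 string into its individual entries
  a <- unlist(strsplit(row_i[, AU2], ";"))
  # One output row per entry, with UT and NAU repeated
  edlist[[i]] <- data.table(AU = a, OR = seq_along(a), row_i[, .(UT, NAU)])
}
edlist1 <- rbindlist(edlist)
print(edlist1)
```

This is the behaviour I would like to keep, just without looping over every row.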
Any suggestions will be welcome.