
I'm building a data.table by splitting the character string in one column and turning each piece into its own row. My code already produces the desired result, but it is slow and my dataset is large. Is there a more efficient way? Here is my example:

edlist <- list()
for(i in seq_along(cvu_wos_cl3$UT)){
  t <- cvu_wos_cl3[,.(UT, AU2, NAU, PY, C1, RP, CC1, NC1)][i]
  a <- unlist(strsplit(t[,AU2], ";"))
  o <- seq_along(a)
  edlist[[i]] <- data.table(AU=a, OR=o, t[, .(UT, PY, C1, RP, CC1, NAU, NC1) ])       
}
edlist1 <- rbindlist(edlist)

The original data.table is:

> cvu_wos_cl3[,.(UT, AU2, NAU, PY, C1, RP, CC1, NC1)][1,1:3]
                    UT                      AU2 NAU
1: WOS:000070949000010 120472; 998;  Soberon, X   3

Inside the loop I store each row in t, take its AU2 column (a single character string), and split it on ";":

t[,AU2]
[1] "120472; 998;  Soberon, X"
unlist(strsplit(t[,AU2], ";"))
[1] "120472"       " 998"         "  Soberon, X"

Then I create a new data.table in which the original single row becomes 3 rows, one per piece:

data.table(AU=a, OR=o, t[, .(UT, PY, C1, RP, CC1, NAU, NC1) ])[,1:3]
             AU OR                  UT
1:       120472  1 WOS:000070949000010
2:          998  2 WOS:000070949000010
3:   Soberon, X  3 WOS:000070949000010
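One loop-free direction that may be worth timing is to split the whole AU2 column once and then repeat each source row by its number of pieces. This is only a minimal sketch under assumptions: `dt` is a hypothetical toy stand-in for cvu_wos_cl3 with fewer columns, the second row and the PY values are made up for illustration, and no AU2 value is empty or NA.

```r
library(data.table)

# Hypothetical toy stand-in for cvu_wos_cl3; only the first row's UT, AU2
# and NAU come from the example above, everything else is invented.
dt <- data.table(
  UT  = c("WOS:000070949000010", "WOS:000000000000001"),
  AU2 = c("120472; 998;  Soberon, X", "A; B"),
  NAU = c(3L, 2L),
  PY  = c(1997L, 2001L)
)

# Split every AU2 string in a single vectorised call
parts <- strsplit(dt$AU2, ";", fixed = TRUE)
n <- lengths(parts)  # number of pieces per original row

# Repeat each original row once per piece, keeping the other columns
res <- dt[rep(seq_len(nrow(dt)), n), .(UT, NAU, PY)]

# Attach the pieces and their position within the original string
res[, AU := unlist(parts)]
res[, OR := sequence(n)]
setcolorder(res, c("AU", "OR"))
```

Like the loop, this keeps the leading spaces left behind by the split; wrapping `unlist(parts)` in `trimws()` would remove them if that is wanted.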

Any suggestions will be welcome.

Mario GS
