I have a data.table with 3 columns, and I want to split the 3rd column by a delimiter into multiple rows.
My current implementation is:
library(data.table)

protein.ids <- c("PA0001","PA0001", "PA0002", "PA0002", "PA0002")
protein.names <- c("protein A", "protein A", "protein B", "protein B", "protein B")
peptides.ids <- c("1;3;2", "81;23;72", "7;6;8", "10;35;21", "5;2;7")
data <- data.frame(matrix(c(protein.ids, protein.names, peptides.ids),
                          nrow = 5),
                   stringsAsFactors = FALSE)
colnames(data) <- c("Protein IDs", "Protein Names", "Peptide IDs")
data <- data.table(data)
data[, list(`Peptide IDs` = unlist(strsplit(`Peptide IDs`, ";"))),
     by = list(`Protein IDs`, `Protein Names`)]
However, my data.table is quite big (~1.2G) and this currently takes ~3 seconds to run. Is there a faster approach that achieves the same result, or is there no juice left worth squeezing?
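
One direction that might be worth benchmarking, shown here only as a minimal sketch and not as a confirmed speedup: call strsplit() once on the whole column instead of once per group, then repeat each row according to how many pieces it produced. The names dt, parts and long are made up for this illustration, and fixed = TRUE assumes the delimiter is a literal ";" rather than a regular expression. Note that this keeps the original row order, whereas the grouped version orders rows by the first appearance of each group, so the two results only line up row for row when the groups are already contiguous, as they are in the example data.

```r
library(data.table)

# Same example data as above, built directly as a data.table
dt <- data.table(
  `Protein IDs`   = c("PA0001", "PA0001", "PA0002", "PA0002", "PA0002"),
  `Protein Names` = c("protein A", "protein A", "protein B", "protein B", "protein B"),
  `Peptide IDs`   = c("1;3;2", "81;23;72", "7;6;8", "10;35;21", "5;2;7")
)

# Split the whole column in one vectorized call; fixed = TRUE skips the regex engine
parts <- strsplit(dt[["Peptide IDs"]], ";", fixed = TRUE)

# Repeat each row index once per piece, keeping only the two identifier columns
long <- dt[rep(seq_len(nrow(dt)), lengths(parts)),
           c("Protein IDs", "Protein Names"), with = FALSE]

# Attach the flattened pieces as the third column
long[, `Peptide IDs` := unlist(parts)]
```

Wrapping both versions in system.time() on the real data would show whether the difference is worth it.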