I have a data frame of malware observations. One of its variables is type, which can hold a combination of types (such as adware++trojan). I need to duplicate each such observation once per type, so that every copy carries exactly one of the split-out types. For example, for one observation:
apksha time type market
8AB46C4A8AC 2013-09-23 16:04:24 adware++virus 1mobile
I want it to be like:
apksha time type market
8AB46C4A8AC 2013-09-23 16:04:24 adware 1mobile
8AB46C4A8AC 2013-09-23 16:04:24 virus 1mobile
I'm currently using a nested for loop for this task:
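newData <- data.frame()
# `types` is not defined in this snippet; it is assumed to hold the type values, e.g. unique(rawData$type)
combinedTypes <- grep("\\+", types, value=TRUE, perl=TRUE)
# rows whose type is a combination such as adware++virus
ctData <- rawData[rawData$type %in% combinedTypes, ]
# seq_len() avoids the 1:0 trap when there are no combined rows
for(i in seq_len(nrow(ctData))){
    type <- ctData[i, ]$type
    newTypes <- unlist(strsplit(type, "\\+\\+"))
    for(t in newTypes){
        # copy the row once per component type and append it
        nr <- ctData[i, ]
        nr$type <- t
        newData <- rbind(newData, nr)
    }
}
# drop the combined rows and add back the split ones
rawData <- rawData[!(rawData$type %in% combinedTypes), ]
rawData <- rbind(rawData, newData)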
The problem is that this nested loop is very slow in R, not least because rbind copies the growing data frame on every iteration. Is there a better solution for this task?
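I found a quick and dirty way: split the type column once, repeat each row once per resulting type, then overwrite type with the flattened result: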
splitedtype <- strsplit(rawData$type, "\\+\\+")
dataNew <- rawData[rep(seq_len(nrow(rawData)), lengths(splitedtype)), ]
dataNew$type <- unlist(splitedtype)
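To check the vectorized approach above on something small, here is a minimal, self-contained sketch. It assumes type is a character column (as.character() is added only as a guard in case it is a factor), and it splits on the literal "++" with fixed = TRUE instead of the regex. The first row is the example from the question; the second row (EXAMPLEHASH2) and the names splitTypes and idx are made up purely for illustration.

```r
# Toy data: row 1 is the example from the question, row 2 is hypothetical.
rawData <- data.frame(
  apksha = c("8AB46C4A8AC", "EXAMPLEHASH2"),
  time   = c("2013-09-23 16:04:24", "2013-09-24 10:00:00"),
  type   = c("adware++virus", "trojan"),
  market = c("1mobile", "1mobile"),
  stringsAsFactors = FALSE
)

# Split every type on the literal "++"; single types come back unchanged.
splitTypes <- strsplit(as.character(rawData$type), "++", fixed = TRUE)

# Repeat each row index once per component type (lengths() needs R >= 3.2.0).
idx <- rep(seq_len(nrow(rawData)), lengths(splitTypes))

# Duplicate the rows in one step, then fill in the individual types.
dataNew <- rawData[idx, ]
dataNew$type <- unlist(splitTypes)
rownames(dataNew) <- NULL  # tidy up the 1, 1.1, 2 style row names

print(dataNew)
```

Because rows are indexed once rather than appended one at a time, no data frame is grown inside a loop. Note that rows with a single type are kept as they are, so there is no need to separate the combined rows first and rbind them back afterwards.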