I have a relatively short script that takes a large data frame (2,373,142 rows x 21 columns) of numeric and string fields and splits it into a list of data frames based on the values of one column. With this dataset the list has 92 elements, and each one is then run through a function from the PhysicalActivity package using lapply. The script works perfectly on smaller datasets, but on one this large it maxes out the memory. I even tried running smaller and smaller subsets of the list, but it maxes out even with just a two-item subset of the original list. I should add that my computer has 16 GB of RAM, all of which R has access to.
I'm at a loss as to how to make it more efficient, since I'm not using any explicit loops, but I was hoping that someone more R-savvy than I am would have some suggestions. I'm worried that it's the package's wearingMarking function that's causing the trouble, but I'm not sure. My data is sensitive, so unfortunately I can't provide a sample. My apologies, as I know that is far from ideal, but any help would be greatly appreciated.
library(PhysicalActivity) # Provides wearingMarking()
library(plyr) # Provides ldply()
allData <- read.csv("myData.csv", header = TRUE) # Loading data
chngActivity <- allData[, "activity"] # Creating a duplicate of the activity column
chngActivity[chngActivity == -2] <- 0 # Recoding -2 as 0 in the duplicate
allData <- cbind(allData, chngActivity) # Binding the new column to the old df
corTime <- transform(allData, dateTime = strptime(allData$dateTime, "%m/%d/%y %H:%M")) # Making sure dateTime is parsed as a date-time
corTimeLst <- split(corTime, corTime$identifier) # Splitting into a list of dfs by identifier
rm(allData, corTime) # Removing the intermediate objects
allChoi <- function(f) {
  # Running the Choi algorithm. The current parameters are set to a
  # one-minute epoch, with the new non-wear column called "wearing".
  choi_test <- wearingMarking(dataset = f,
                              frame = 90,
                              perMinuteCts = 1,
                              TS = "dateTime",
                              cts = "chngActivity",
                              streamFrame = NULL,
                              allowanceFrame = 3,
                              newcolname = "wearing")
  return(choi_test)
}
choiRun <- lapply(corTimeLst, allChoi) # Applying the function to each participant in the list
choiFlat <- ldply(choiRun, data.frame) # Flattening the list into a df
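Since I can't share the real file, here is a minimal sketch of the same preprocessing on made-up data, in case it helps anyone reproduce the structure. Only the column names (identifier, dateTime, activity) come from my script. The values, the two participants "A" and "B", and the object names toy, toyTime and toyLst are invented for illustration. It stops before the wearingMarking call, so it only needs base R, and the last line is just one way I assume the size of each piece could be checked.

```r
# Made-up values; only the column names match my real data.
toy <- data.frame(
  identifier = rep(c("A", "B"), each = 3),
  dateTime   = rep(c("01/02/15 10:00", "01/02/15 10:01", "01/02/15 10:02"), times = 2),
  activity   = c(5, -2, 0, 12, -2, 7),
  stringsAsFactors = FALSE
)

# Same preprocessing steps as in the script: duplicate the activity
# column, recode -2 as 0, and bind the new column back on.
chngActivity <- toy[, "activity"]
chngActivity[chngActivity == -2] <- 0
toy <- cbind(toy, chngActivity)

# Parse the timestamps and split into one data frame per identifier.
toyTime <- transform(toy, dateTime = strptime(toy$dateTime, "%m/%d/%y %H:%M"))
toyLst <- split(toyTime, toyTime$identifier)

# Size in bytes of each piece that would be handed to wearingMarking().
pieceSizes <- sapply(toyLst, object.size)
```

On the real data, toyLst corresponds to corTimeLst, which has 92 elements.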