Here is an approach that generates a list of data frames containing uniform random numbers and processes it with lapply(), as proposed in the OP comments. Instead of using is.na() to set TRUE vs. FALSE, we use > 0.5 to create the result data frames, because data frames built from matrices of runif() values won't have missing values.
Note that is.na() can be applied to an entire data frame at once, setting every value in the output to TRUE or FALSE. No second pass over the data is required for !is.na().
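As a small illustration of that point, here is a sketch on a toy data frame (df and its values are made up for this example and are not part of the question's data). A single is.na() call returns a logical matrix for the whole data frame, and ! negates that result directly.

```r
# Hypothetical toy data frame; the values are illustrative only
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

# One call produces a logical matrix with the same shape as df
missingFlags <- is.na(df)

# Negating the stored result gives the "not missing" flags
# without calling is.na() on df again
presentFlags <- !missingFlags

# Convert back to a data frame if that is the desired output type
as.data.frame(missingFlags)
```

The result of is.na() on a data frame is a matrix, which is why the solution wraps it in as.data.frame() before binding it to the original columns.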
Also note that the number of columns in each data frame is assigned randomly, which shows that the solution does not require advance knowledge of the column count.
Finally, to illustrate how to process a subset of the columns rather than the entire input data frame, we include logic to bind the first 4 columns of the original data with the columns of logicals.
set.seed(95014123)
# build 5 data frames of 10 rows each, with a random number of columns (6 to 10)
dataList <- lapply(1:5, function(x) {
    columnCount <- sample(6:10, 1)
    data.frame(matrix(runif(10 * columnCount), nrow = 10, ncol = columnCount))
})
# recode to binary based on whether values are > 0.5
resultList <- lapply(dataList, function(x) {
    recodedCols <- as.data.frame(x[, 5:ncol(x)] > .5)
    colNames <- names(x[, 5:ncol(x)])
    names(recodedCols) <- colNames
    # keep the first 4 original columns and append the logical columns
    cbind(x[, 1:4], recodedCols)
})
# count TRUEs in the recoded columns of each data frame
unlist(lapply(resultList, function(x) {
    sum(colSums(x[, 5:ncol(x)]))
}))
...and the output:
> unlist(lapply(resultList,function(x){
+ sum(colSums(x[,5:ncol(x)]))
+ }))
[1] 27 20 22 27 17
>
UPDATE: Here is a solution that generates a random percentage of NA values and uses is.na() to create the result data frames.
set.seed(95014123)
dataList <- lapply(1:5, function(x) {
    columnCount <- sample(6:10, 1)
    # pick the share of cells to set to NA for this data frame
    pctMissing <- sample(c(0.1, 0.2, 0.3, 0.4, 0.5), 1)
    dataValues <- runif(10 * columnCount)
    # choose which cells become NA
    missingIds <- sample(1:(10 * columnCount),
                         size = (pctMissing * 10 * columnCount))
    dataValues[missingIds] <- NA
    data.frame(matrix(dataValues, nrow = 10, ncol = columnCount))
})
# recode columns 5 onward to TRUE where the value is missing
resultList <- lapply(dataList, function(x) {
    recodedCols <- as.data.frame(is.na(x[, 5:ncol(x)]))
    colNames <- names(x[, 5:ncol(x)])
    names(recodedCols) <- colNames
    cbind(x[, 1:4], recodedCols)
})
# count TRUEs in the recoded columns of each data frame
unlist(lapply(resultList, function(x) {
    sum(colSums(x[, 5:ncol(x)]))
}))
...and the output:
> unlist(lapply(resultList,function(x){
+ sum(colSums(x[,5:ncol(x)]))
+ }))
[1] 23 16 9 1 17
>
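To make the counting step easier to check by hand, here is a minimal sketch that applies the same recoding to one small, fixed data frame instead of the random data above. The data frame x and the positions of its NA values are hypothetical, chosen only so the expected count is obvious.

```r
# Hypothetical fixed input (not the random data above): 10 rows, 6 columns
x <- data.frame(matrix(1:60, nrow = 10, ncol = 6))
x[c(2, 5), 5] <- NA   # two missing values in column 5
x[7, 6] <- NA         # one missing value in column 6
x[1, 1] <- NA         # missing value in a column that is not recoded

# Same recoding step as in the answer
recodedCols <- as.data.frame(is.na(x[, 5:ncol(x)]))
names(recodedCols) <- names(x[, 5:ncol(x)])
result <- cbind(x[, 1:4], recodedCols)

# The TRUE count equals the number of NAs in columns 5 onward
sum(colSums(result[, 5:ncol(result)]))   # 3
```

The NA placed in column 1 is not counted, because only columns 5 onward are recoded and summed.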