I split a dataframe into a list of 401 dataframes. Each dataframe in the list has the same structure (same columns), but potentially a different number of rows.
Splitting the dataframe introduced zero-variance columns (colSums = 0). Dataframes in the list may share the same zero-variance columns, or they may have entirely different ones.
I have used the following function (from "Quickly remove zero variance variables from a data.frame") to remove zero-variance columns from each dataframe:
zeroVar <- function(data, useNA = 'ifany') {
  out <- apply(data, 2, function(x) {length(table(x, useNA = useNA))})
  which(out == 1)
}
When I pass my data frame list to the function (ignoring the first two character columns of dataframe_list):
dataframe_list_zero_var_rm <- lapply(dataframe_list, function(d) d[, -zeroVar(d[, 3:ncol(d)], useNA = 'no')])
No errors or warnings are thrown.
However, although the dataframes in dataframe_list_zero_var_rm have fewer columns than those in dataframe_list, they still contain zero-variance columns, as revealed by:
zeroVar(dataframe_list_zero_var_rm[[1]][, 3:ncol(dataframe_list_zero_var_rm[[1]])], useNA = 'no')
Passing the new dataframe to the original function shows me three columns with 0 variance which should have been removed in the first place.
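One possible explanation, offered as a sketch rather than a confirmed diagnosis: zeroVar() returns positions relative to the data it is given, so calling it on d[, 3:ncol(d)] yields indices that are off by two relative to d itself. The toy dataframe `toy` and its column names below are made up for illustration.

```r
# Hypothetical toy data: two character columns, then numeric columns.
toy <- data.frame(
  id    = c("a", "b", "c"),
  group = c("x", "x", "y"),
  v1    = c(1, 2, 3),
  v2    = c(0, 0, 0),   # the only zero-variance column
  v3    = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# Same function as in the post.
zeroVar <- function(data, useNA = "ifany") {
  out <- apply(data, 2, function(x) length(table(x, useNA = useNA)))
  which(out == 1)
}

# Positions are relative to the subset toy[, 3:5], not to toy.
idx <- zeroVar(toy[, 3:ncol(toy)], useNA = "no")

# Using idx directly drops column 2 of toy ("group"), not "v2".
wrong <- toy[, -idx]

# Shifting by the two skipped columns targets the intended column.
right <- toy[, -(idx + 2)]
```

A related pitfall under the same assumptions: if a dataframe has no zero-variance columns, the index vector is empty, and negative indexing with an empty vector selects no columns at all instead of keeping all of them.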
This is a problem for me because I am trying to run a principal component analysis on every dataframe in the list, and the zero-variance columns cause problems for prcomp().
My ideal solution would be a way to
- loop through each element of the dataframe list and remove columns from each dataframe that have zero variance
- then, loop through each element of the dataframe list and perform prcomp() on the dataframe
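The two steps above could be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the helpers `is_constant` and `drop_zero_var` are hypothetical names that replace zeroVar() with a logical keep-mask (which avoids index offsets and empty-index problems), the small `dataframe_list` is stand-in data, and `center = TRUE, scale. = TRUE` is an assumed choice for prcomp().

```r
# TRUE for columns with at most one distinct non-NA value.
# (Hypothetical helper, not from the original post.)
is_constant <- function(x) length(unique(x[!is.na(x)])) <= 1

# Drop zero-variance columns, leaving the first n_skip columns untouched.
drop_zero_var <- function(d, n_skip = 2) {
  keep <- rep(TRUE, ncol(d))
  cols <- seq_len(ncol(d)) > n_skip          # columns to be checked
  keep[cols] <- !vapply(d[cols], is_constant, logical(1))
  d[, keep, drop = FALSE]
}

# Hypothetical stand-in for the real list of 401 dataframes.
dataframe_list <- list(
  data.frame(id = c("a", "b", "c"), grp = c("x", "x", "y"),
             v1 = c(1, 2, 3), v2 = c(0, 0, 0), v3 = c(2, 5, 1)),
  data.frame(id = c("d", "e", "f", "g"), grp = c("y", "y", "y", "x"),
             v1 = c(7, 7, 7, 7), v2 = c(1, 3, 2, 5), v3 = c(4, 4, 9, 1))
)

# Step 1: remove zero-variance columns from each dataframe.
dataframe_list_zero_var_rm <- lapply(dataframe_list, drop_zero_var)

# Step 2: run prcomp() on the numeric columns of each dataframe.
pca_list <- lapply(dataframe_list_zero_var_rm, function(d) {
  prcomp(d[, -(1:2), drop = FALSE], center = TRUE, scale. = TRUE)
})
```

Each element of pca_list is then a prcomp object for the corresponding dataframe, so summary(pca_list[[1]]) would inspect the first one.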