I am having some trouble cleaning data that I imported from Excel with readxl. I am able to read each excel file and store individual sheets contained in that file into separate data frames.
The problem is that each of these created data frames have many columns entirely filled with NAs which is because my code is storing these sheets into a list from which I am creating separate data frames and hence these NA columns are coming as a result of union of all columns contained in that Excel file.
I wish to automate this process of removing all NA columns from all the data frames having different number of rows and columns using a for loop. However when I try this:
for(i in 1:length(AllFileSheetnames)){
assign(AllFileSheetnames[i], function(x) x[, colSums(is.na(x)) < nrow(x)])
print(AllFileSheetnames[i])
}
puts all the values in a list again with a union of all the columns.
I have imported 5 sheets from an excel file in R. All the sheets are stored into data frames D1, D2, D3, D4, and D5. The details of the original sheets are as follows:
D1: 99 Rows * 150 Columns;
D2: 99 Rows * 166 Columns;
D3: 99 Rows * 77 Columns;
D4: 99 Rows * 8 Columns;
D5: 99 Rows * 7 Columns
When I import this file using readxl, it creates a list which contains 495 rows and 247 odd columns. I am successfully able to split the list into separate data frames for D1, D2 and so on, but each data frame has 247 columns and now I wish to automate the process of removing NA columns from each data frame to get the above mentioned dimensions for respective data frames.