0

I am having some trouble cleaning data that I imported from Excel with readxl. I am able to read each excel file and store individual sheets contained in that file into separate data frames.

The problem is that each of these created data frames have many columns entirely filled with NAs which is because my code is storing these sheets into a list from which I am creating separate data frames and hence these NA columns are coming as a result of union of all columns contained in that Excel file.

I wish to automate this process of removing all NA columns from all the data frames having different number of rows and columns using a for loop. However when I try this:

    for(i in 1:length(AllFileSheetnames)){
      assign(AllFileSheetnames[i], function(x) x[, colSums(is.na(x)) < nrow(x)])
      print(AllFileSheetnames[i])
    }

puts all the values in a list again with a union of all the columns.

I have imported 5 sheets from an excel file in R. All the sheets are stored into data frames D1, D2, D3, D4, and D5. The details of the original sheets are as follows:

D1: 99 Rows * 150 Columns;
D2: 99 Rows * 166 Columns;
D3: 99 Rows * 77 Columns;
D4: 99 Rows * 8 Columns;
D5: 99 Rows * 7 Columns

When I import this file using readxl, it creates a list which contains 495 rows and 247 odd columns. I am successfully able to split the list into separate data frames for D1, D2 and so on, but each data frame has 247 columns and now I wish to automate the process of removing NA columns from each data frame to get the above mentioned dimensions for respective data frames.

  • Welcome to SO. Please post minimal (!) reproducible examples and focus your question. Do you want to omit columns in a data frame that only contain missing values? That could be done with e.g. `df <- data.frame(x=1:10, y=NA, z=c(1:9, NA)); df[!sapply(df, function(x) all(is.na(x)))]`. – lukeA Nov 17 '15 at 10:58
  • Hi LukeA. I wish to omit all columns from multiple data frames and not a single dataframe. – smita.r.agrawal Nov 17 '15 at 11:00
  • Good to know, then please edit your question, cut all the fat and provide multiple example data frames in the veins of http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – lukeA Nov 17 '15 at 11:01
  • Sorry for the inconvenience caused. Hope the edited version gives more clarity on what I am exactly looking for. – smita.r.agrawal Nov 17 '15 at 11:43
  • No inconvenience caused. Providing a minimal & reproducible example is just the way to go when asking R questions. Look at my answer: it's easy to copy&paste and reproduce for you. In contrast, I still cannot reproduce anything from your question. – lukeA Nov 17 '15 at 12:06

1 Answers1

0

You might want to try this

## create two example data.frames (= reproducible example): 
 DF2 <- data.frame(x=1:3, y=c(1:2, NA), z=NA) 
(DF1 <- data.frame(x=1:3, y=NA, z=c(1:2, NA)) )
#   x  y  z
# 1 1 NA  1
# 2 2 NA  2
# 3 3 NA NA

## get objects named DFx and keep only cols, 
## where not *all* values are missing values (NA):
res <- lapply(mget(paste0("DF", 1:2)), 
       function(DF)
         DF[!sapply(DF, function(x) all(is.na(x)))]
)

## exploded the res-list into separate variables again:
invisible(list2env(res, globalenv())) # overwrites original DFs

## Inspect result: col y vanished in DF1
DF1
#   x  z
# 1 1  1
# 2 2  2
# 3 3 NA
lukeA
  • 53,097
  • 5
  • 97
  • 100