Need an R function for choosing specific named columns from a data frame

Question

I am relatively new to R. I imported a dataset into R with the xlsx package and then split it into separate data frames (ABCD, CDEF, etc.) by filtering on the "randomAssignment" column. However, each of the newly created data frames contains some columns whose rows are all empty, and I want to remove those columns. What is the best / quickest approach for this?

library(xlsx)
library(tidyverse)  # also attaches dplyr and tidyr

#IMPORT XLSX DATA INTO R USING XLSX PACKAGE
originalData <- read.xlsx("C:/Users/help/Desktop/GetTestedMessageTesting_FinalRawData_12292018.xlsx", 1, header = TRUE, colIndex = NULL, as.data.frame = TRUE)

ABCD <- filter(originalData, randomAssignment == "ABCD")
EFGH <- filter(originalData, randomAssignment == "EFGH")
IJKL <- filter(originalData, randomAssignment == "IJKL")
MNOP <- filter(originalData, randomAssignment == "MNOP")
QRST <- filter(originalData, randomAssignment == "QRST")
UVWX <- filter(originalData, randomAssignment == "UVWX")
CDEF <- filter(originalData, randomAssignment == "CDEF")
YZAB <- filter(originalData, randomAssignment == "YZAB")
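Instead of writing one filter() call per group, a possible alternative is to split the data by group and drop the all-empty columns within each piece in one pass. This is a minimal sketch using base R's split() and lapply(); the small originalData below, with its columns q1 and q2, is a made-up stand-in for the spreadsheet, not the asker's real data, and it assumes empty cells were read in as NA.

```r
# Hypothetical stand-in for originalData (made-up values, for illustration only)
originalData <- data.frame(
  randomAssignment = c("ABCD", "ABCD", "EFGH", "EFGH"),
  q1 = c(1, 2, NA, NA),   # only answered by group ABCD
  q2 = c(NA, NA, 3, 4),   # only answered by group EFGH
  stringsAsFactors = FALSE
)

# A named list with one data frame per value of randomAssignment
groups <- split(originalData, originalData$randomAssignment)

# Within each group, keep only columns with at least one non-NA value;
# drop = FALSE keeps the result a data frame even if one column survives
groups <- lapply(groups, function(df) df[, colSums(!is.na(df)) > 0, drop = FALSE])

groups$ABCD
groups$EFGH
```

Keeping the groups in one named list makes it easy to apply the same cleanup to every group; if separate objects are still wanted, list2env(groups, envir = globalenv()) would create ABCD, EFGH, and so on.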

So do you want to remove every column that contains missing values, or the rows that contain missing values? — Naveen, Jan 12 '19 at 20:58
Welcome to Stack Overflow. I'm also not sure what you're trying to do. Here is a really good guide on how to write a great question: https://stackoverflow.com/a/5963610/5028841 — JBGruber, Jan 12 '19 at 21:01
Possible duplicate of [R: Remove multiple empty columns of character variables](https://stackoverflow.com/questions/17672649/r-remove-multiple-empty-columns-of-character-variables) — divibisan, Jan 15 '19 at 20:50

score 0 · Accepted Answer · answered Jan 12 '19 at 21:00

I interpreted your question as asking how to remove the columns in which every value is missing (NA). Here's one solution; you may need to modify the anonymous function if the empty cells in your data aren't actually NA.
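For example, if blank spreadsheet cells came through as empty or whitespace-only strings rather than NA (an assumption about how the file was read, not something the question confirms), the anonymous function could treat both cases as blank. This is a sketch on a made-up data frame; is_blank() is a hypothetical helper, and as.character() is there so factor columns are handled too.

```r
# Made-up data: column b mixes NA and blank strings, column c is all NA
df <- data.frame(
  a = c("x", "y", "z"),
  b = c("", NA, "  "),
  c = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a value is NA or only whitespace
is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""

# Keep the columns that have at least one non-blank value
kept <- df[, sapply(df, function(x) !all(is_blank(x))), drop = FALSE]
kept
```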

The gist of the function is that we create a logical (TRUE/FALSE) value for each column of my_mtcars indicating whether ALL of its entries are NA, and then negate it so that only the columns with at least one non-NA value are returned.
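To make that intermediate step visible, the sketch below computes the per-column logical vector on its own before negating it and using it to pick column names. It uses the same my_mtcars setup as the answer; the names all_na and keep are just illustrative.

```r
my_mtcars <- mtcars
my_mtcars$hp <- NA

# One TRUE/FALSE per column: TRUE if every entry in that column is NA
all_na <- sapply(my_mtcars, function(x) all(is.na(x)))
all_na

# Negate it to mark the columns to keep, then use it as a column index
keep <- !all_na
names(my_mtcars)[keep]
```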

#create copy of mtcars
my_mtcars <- mtcars
#set hp to NA
my_mtcars$hp <- NA
#filter out columns that are all NA
head(my_mtcars[, sapply(my_mtcars, function(x) !all(is.na(x)))])
#>                    mpg cyl disp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 2.76 3.460 20.22  1  0    3    1

Created on 2019-01-12 by the reprex package (v0.2.1)
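A possibly tidier base R variant of the same idea uses Filter(), which keeps the columns for which the predicate returns TRUE. One reason to prefer it, or to add drop = FALSE to the bracket version, is that my_mtcars[, ...] returns a plain vector rather than a data frame if only a single column survives. This is a sketch, not the answerer's code.

```r
my_mtcars <- mtcars
my_mtcars$hp <- NA

# Filter() applies the predicate to each column and keeps those returning TRUE;
# the result is still a data frame, whatever the number of kept columns
no_empty <- Filter(function(x) !all(is.na(x)), my_mtcars)
names(no_empty)
```

With dplyr 1.0.0 or later, select(my_mtcars, where(~ !all(is.na(.x)))) should express the same selection in tidyverse style.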

Thank you very much! That resolved my issue. I understand the sapply function but, so that I can learn, what does "my_mtcars$hp <- NA" do? — rkverma1974, Jan 13 '19 at 00:26
@rkverma1974 - I just tried to make a dataset that mimics what you described in yours, i.e., a column full of NA values. — Chase, Jan 13 '19 at 02:51
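What makes my_mtcars$hp <- NA produce a whole column of NA is recycling: assigning a single value to a data frame column repeats it down every row. A tiny made-up data frame shows the effect.

```r
df <- data.frame(a = 1:3, b = c(4, 5, 6))

# A length-1 value is recycled to the number of rows,
# so every entry of column b becomes NA
df$b <- NA
df
all(is.na(df$b))
```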
