Create subset of data.frame based on complete observations in 7 variables?

Question

I have a data.frame with 571 observations of 156 variables. I am interested in keeping all 156 variables; however, I only need complete observations for 7 of these variables.

By using:

> nrow(na.omit(subset(finaldata, select = c("h_egfr_cystc96", "child_age96", "smoke_inside2T", "SES_3cat2T", "X_ZBFA96", "log2Tblood", "sexo_h00"))))

I learn that there are 453 observations that have complete information for these 7 variables.

How can I create a new data.frame that will have 453 observations of 156 variables, with complete information for my 7 variables of interest?

I suspect that complete.cases will be useful, but I am not sure how to apply it here.

Any ideas? Thank you in advance for the help!

Use `complete.cases` on just the affected columns, but use its return value on the whole frame, as in `dat[complete.cases(dat[,c("col1","col2")]),]`. — r2evans, Jul 22 '20 at 17:33
Amazing, thank you @r2evans! This works perfectly. What also works is using drop_na as follows: `experiment = drop_na(finaldata, c(re_PbM3T, re_PbM2T))` — goose144, Jul 22 '20 at 17:41

r2evans · Accepted Answer · 2020-07-22T17:36:29.990

Use complete.cases on just the columns of interest, but use its return value (a logical vector with one element per row) to index the rows of the original frame.

mt <- mtcars[1:5,]
mt
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
mt$cyl[3] <- mt$disp[2] <- NA
mt[complete.cases(mt[,c("mpg","cyl")]),]
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6   NA 110 3.90 2.875 17.02  0  1    4    4
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

Because I looked for complete cases in just "mpg" and "cyl", the NA in "disp" did not remove the "Mazda RX4 Wag" row; only "Datsun 710", which has an NA in "cyl", was dropped.
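
Applied to the question, the same pattern might look like the sketch below. The `finaldata` here is a small made-up stand-in (6 rows, 4 columns) rather than the asker's real data, and only three of the seven column names from the question are used to keep it short; `other_var`, `vars`, `keep` and `complete_subset` are names chosen for illustration.

```r
# Hypothetical stand-in for the asker's `finaldata` (values are made up)
finaldata <- data.frame(
  child_age96 = c(8, 9, NA, 10, 11, 9),
  sexo_h00    = c(1, 0, 1, NA, 0, 1),
  log2Tblood  = c(1.2, NA, 0.8, 1.5, 1.1, 0.9),
  other_var   = c(NA, 2, 3, 4, 5, 6)  # not in `vars`, so its NA is ignored
)

# Columns that must be complete (in the real data, list all 7 here)
vars <- c("child_age96", "sexo_h00", "log2Tblood")

# Logical vector: TRUE where every column in `vars` is non-missing
keep <- complete.cases(finaldata[, vars])

# Index the rows of the full frame, so all columns are kept
complete_subset <- finaldata[keep, ]

nrow(complete_subset)  # 3
ncol(complete_subset)  # 4
```

The row count should agree with the `nrow(na.omit(subset(...)))` check from the question, while the column count stays that of the original frame.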
