Dropping missing values from vectors
The errors indicate that your data are likely a vector, not a data.frame. A vector has no rows or columns (it has no dim attribute), so indexing it with [,] throws errors. To support this, below I create a vector, reproduce the errors, and demonstrate how to drop missing values from it.
# Create vector, show it's a vector
vec <- c(NA,1:4)
vec
#> [1] NA 1 2 3 4
is.vector(vec)
#> [1] TRUE
# Reproduces your errors for both methods
is.na(vec[ ,2:3])
#> Error in vec[, 2:3]: incorrect number of dimensions
vec[complete.cases(vec[ , 2:3]), ]
#> Error in vec[, 2:3]: incorrect number of dimensions
# Remove missing values from the vector
vec[!is.na(vec)]
#> [1] 1 2 3 4
vec[complete.cases(vec)]
#> [1] 1 2 3 4
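To see directly why the two-index form fails, it can help to inspect the dim attribute. The following is a minimal sketch using the same vec as above; the name vec_df is only illustrative and is not part of your code.

```r
# Same example vector as above
vec <- c(NA, 1:4)

# A plain vector has no dim attribute, so vec[, 2:3] has nothing to index
dim(vec)
#> NULL
is.null(dim(vec))
#> [1] TRUE

# For comparison, a data.frame does have dimensions (rows, columns).
# vec_df is a hypothetical name used only for this illustration.
vec_df <- data.frame(x = vec)
dim(vec_df)
#> [1] 5 1
```

If dim() returns NULL for your object, single-index subsetting such as vec[!is.na(vec)] is the form to use.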
I'll additionally show you below how to check if your data object is a data.frame and how to omit rows with missing values in case it is.
Create data and check it's a data.frame
# Create an example data.frame
set.seed(123)
N <- 10
df <- data.frame(
  x1 = sample(c(NA_real_, 1, 2, 3), N, replace = TRUE),
  x2 = sample(c(NA_real_, 1, 2, 3), N, replace = TRUE),
  x3 = sample(c(NA_real_, 1, 2, 3), N, replace = TRUE)
)
print(df)
#> x1 x2 x3
#> 1 2 3 NA
#> 2 2 1 3
#> 3 2 1 NA
#> 4 1 NA NA
#> 5 2 1 NA
#> 6 1 2 2
#> 7 1 3 3
#> 8 1 NA 1
#> 9 2 2 2
#> 10 NA 2 1
# My hunch is that you are not using a data.frame. You can check as follows:
class(df)
#> [1] "data.frame"
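class() can return more than one value for some objects, so is.data.frame() or inherits() may be a more direct check. Below is a small sketch; obj and obj_df are hypothetical stand-ins for your own object, and wrapping a vector in a data.frame is only one possible way to proceed.

```r
# obj is a hypothetical stand-in for your data object
obj <- c(NA, 1:4)

is.data.frame(obj)
#> [1] FALSE
inherits(obj, "data.frame")
#> [1] FALSE

# If the object is really a single column of values, wrapping it in a
# data.frame makes two-index subsetting such as obj_df[rows, ] valid
obj_df <- data.frame(value = obj)
is.data.frame(obj_df)
#> [1] TRUE

# drop = FALSE keeps the result a data.frame even with one column
obj_df[complete.cases(obj_df), , drop = FALSE]
```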
Approaches to removing rows with missing values from data.frames
Your first approach returns logical values for whether a value is missing for the specified columns. You could then use rowSums to count the missing values in each row and drop the rows where that count is not zero, per below.
# Example: shows whether values are missing for second and third columns
miss <- is.na(df[ ,2:3])
print(miss)
#> x2 x3
#> [1,] FALSE TRUE
#> [2,] FALSE FALSE
#> [3,] FALSE TRUE
#> [4,] TRUE TRUE
#> [5,] FALSE TRUE
#> [6,] FALSE FALSE
#> [7,] FALSE FALSE
#> [8,] TRUE FALSE
#> [9,] FALSE FALSE
#> [10,] FALSE FALSE
# We can sum all of these values by row (`TRUE` = 1, `FALSE` = 0 in R) and keep only
# those rows that sum to 0 to remove missing values. Notice that the row names
# retain the original numbering.
df[rowSums(miss) == 0, ]
#> x1 x2 x3
#> 2 2 1 3
#> 6 1 2 2
#> 7 1 3 3
#> 9 2 2 2
#> 10 NA 2 1
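If the retained row numbering is not wanted, the row names can be reset afterwards. Here is a minimal sketch on a small hand-written data.frame (d and kept are hypothetical names, chosen so the result does not depend on the random seed):

```r
# Small hand-written data.frame for illustration only
d <- data.frame(
  x1 = c(1, NA, 3, 4),
  x2 = c(NA, 2, 3, 4),
  x3 = c(1, 2, NA, 4)
)

# Keep rows with no missing values in columns 2 and 3
kept <- d[rowSums(is.na(d[, 2:3])) == 0, ]
rownames(kept)
#> [1] "2" "4"

# Reset the row names to 1, 2, ... if the original numbering is not needed
rownames(kept) <- NULL
rownames(kept)
#> [1] "1" "2"
```

Note that row 2 is kept even though x1 is missing there, because only columns 2 and 3 are checked.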
Your second approach is to use complete.cases. This also works and produces the same result as the first approach.
miss_cases <- df[complete.cases(df[ ,2:3]), ]
miss_cases
#> x1 x2 x3
#> 2 2 1 3
#> 6 1 2 2
#> 7 1 3 3
#> 9 2 2 2
#> 10 NA 2 1
A third approach is to use na.omit(). However, it does not let you specify columns, so it drops every row with a missing value in any column (row 10 is removed below because x1 is missing). Use complete.cases instead if you need to filter on specific columns.
na.omit(df)
#> x1 x2 x3
#> 2 2 1 3
#> 6 1 2 2
#> 7 1 3 3
#> 9 2 2 2
A fourth approach is to use drop_na() from the tidyr package. Its appeal is that you can specify columns by index as well as by unquoted column name. It also renumbers the row names.
library(tidyr)
drop_na(df, 2:3)
#> x1 x2 x3
#> 1 2 1 3
#> 2 1 2 2
#> 3 1 3 3
#> 4 2 2 2
#> 5 NA 2 1
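For the unquoted-name form mentioned above, a short sketch follows. It assumes tidyr is installed; d, by_name, and by_range are hypothetical names, and the data are hand-written so the result does not depend on the random seed.

```r
library(tidyr)

# Small hand-written data.frame for illustration only
d <- data.frame(
  x1 = c(1, NA, 3, 4),
  x2 = c(NA, 2, 3, 4),
  x3 = c(1, 2, NA, 4)
)

# Unquoted column names instead of indices
by_name <- drop_na(d, x2, x3)

# A name range should select the same two columns
by_range <- drop_na(d, x2:x3)

nrow(by_name)
#> [1] 2
```

Naming columns rather than indexing them tends to be more robust if the column order of the data changes later.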