I am trying to filter NA, NaN and Inf values out of a tbl using dplyr's filter function.

The trick is that I only want to apply the filter to columns whose names contain a specific pattern. The pattern is: r1, r2, r3, etc.

I have tried to combine grep and filter to achieve this, but can't get it to work. My current code looks like this:

filter_(!is.na(grep("r[1-9]", colnames(DF), value = TRUE)) 
& !is.infinite(grep("r[1-9]", colnames(DF), value = TRUE)) 
& !is.nan(grep("r[1-9]", colnames(DF), value = TRUE)))

However, this code returns a warning message: "Truncating vector to length 1." And the data returned is unfiltered.

I suspect that the is.na functions are causing the problem, because I've seen an example online where grep is used to filter with a normal condition (i.e. condition == value) rather than a condition based on is.na.

oguz ismail
Anita McGill
  • Can you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your dataset? – Z.Lin Sep 04 '17 at 16:11
  • I believe that the problem is that you are testing `colnames(DF)` for `NA`, `NaN` and `Inf`, when it should be the values in those columns. (The ones that match your pattern `r[1-9]`.) – Rui Barradas Sep 04 '17 at 16:15
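
The point made in the comment above can be checked directly. The following is a minimal sketch using a made-up data frame (`DF` and its columns are assumptions, not the asker's data): `grep(..., value = TRUE)` returns the matching column names as a character vector, so `is.na()` ends up testing the names rather than the values stored in those columns.

```r
# Hypothetical stand-in for the asker's DF
DF <- data.frame(r1 = c(1, NA, Inf), r2 = c(NaN, 2, 3), other = c(4, 5, 6))

# grep with value = TRUE returns the matching column NAMES
matched <- grep("r[1-9]", colnames(DF), value = TRUE)
matched

# is.na() is applied to those names, not to the data in the columns,
# so it cannot say anything about missing values in r1 or r2
is.na(matched)
```

Because the condition has one element per matching column name instead of one per row, it cannot work as a row filter.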

3 Answers


dplyr provides matches() that is useful for this

Example 1: How does matches() work?

library(dplyr)

# remove columns whose names contain "mp" (matches() takes a regular expression)
mtcars %>% select(-matches("mp"))

# keep only columns whose names contain "mp"
mtcars %>% select(matches("mp"))

Example 2: Using matches() in the context of your request but using a MWE

# Create a dummy dataset
data = tibble(id = c("John","Paul","George","Ringo"),
              r1 = c(1,2,NA,NA), 
              r2 = c(1,2,NA,4),
              s1 = c(1,NA,3,4))

# Drop rows with NA in any column whose name contains r followed by a digit
data %>% filter_at(vars(matches("r[0-9]")), all_vars(!is.na(.)))
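
As a side note, the scoped verbs such as filter_at() are marked as superseded in recent dplyr releases. A sketch of an equivalent filter, assuming dplyr 1.0.4 or later (where if_all() is available) and using an anchored pattern "^r[0-9]" as one possible reading of "starts with r followed by a number":

```r
library(dplyr)

# Same dummy dataset as above
data <- tibble(id = c("John", "Paul", "George", "Ringo"),
               r1 = c(1, 2, NA, NA),
               r2 = c(1, 2, NA, 4),
               s1 = c(1, NA, 3, 4))

# Keep rows where every column matching the pattern is non-missing;
# s1 is not selected, so its NA does not drop Paul's row
kept <- data %>% filter(if_all(matches("^r[0-9]"), ~ !is.na(.x)))
kept
```

Only the John and Paul rows have non-missing values in both r1 and r2, so those are the rows that remain.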
pachadotdev

Here is a base R method to filter rows, comparing specific columns.

# sample data
set.seed(1234)
dat <- data.frame(r1=c(NA, 1,NaN, 5, Inf), r2=c(NA, 1,NaN, NA, Inf), d=rnorm(5))

This data set looks like:

dat
   r1  r2          d
1  NA  NA -1.2070657
2   1   1  0.2774292
3 NaN NaN  1.0844412
4   5  NA -2.3456977
5 Inf Inf  0.4291247

We will check the first two columns and ignore the third column. Notice that the only row that should remain is row 2.

dat[Reduce("&", lapply(dat[grep("^r", names(dat))], is.finite)),]
  r1 r2         d
2  1  1 0.2774292

Here, grep selects the appropriate columns (1 and 2), and the resulting subset of the data.frame is fed to lapply. The regex "^r" says to include only variables whose names start with "r". In the lapply loop, each column is checked using is.finite, which returns FALSE for NA, NaN, Inf and -Inf. The resulting list of logical vectors is fed to Reduce, which returns a logical vector with one element per row of the data.frame; an element is TRUE if and only if every selected column holds a finite value in that row.
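
The same one-liner can be unpacked into named steps, which may make the explanation easier to follow. This is only a sketch: the intermediate names (`r_cols`, `checks`, `keep`) are introduced here for illustration, and the `d` column uses fixed values instead of rnorm() so the example does not depend on the random seed.

```r
# Sample data shaped like the example above (d is hard-coded here)
dat <- data.frame(r1 = c(NA, 1, NaN, 5, Inf),
                  r2 = c(NA, 1, NaN, NA, Inf),
                  d  = c(-1.2, 0.3, 1.1, -2.3, 0.4))

# Step 1: positions of the columns whose names start with "r"
r_cols <- grep("^r", names(dat))

# Step 2: one logical vector per selected column, TRUE where the value is finite
checks <- lapply(dat[r_cols], is.finite)

# Step 3: combine the vectors element-wise with &, giving one TRUE/FALSE per row
keep <- Reduce("&", checks)

# Step 4: subset the rows
dat[keep, ]
```

Row 4 is dropped even though r1 is finite there, because r2 is NA in that row.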

lmo

With dplyr, you can use the filter_at function:

dat %>% filter_at(vars(matches("^r[1-9]")), all_vars(is.finite(.)))

Using @lmo's sample data, the result is:

  r1 r2         d
1  1  1 0.2774292
eipi10
  • Hi, this works. But I've now discovered a further issue which can be resolved by only filtering the `Inf` and `NaN` values (and leaving in `NA`). To do this, can I replace the second argument with `all_vars(!is.infinite(.)) & all_vars(!is.nan(.))`? – Anita McGill Sep 05 '17 at 06:21
  • @AnitaMcGill above I posted a solution that points to your current problem – pachadotdev Sep 11 '17 at 16:13
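
For the follow-up question in the comments (dropping only `Inf` and `NaN` while keeping `NA`), one possible approach is to combine both tests inside a single all_vars() call rather than joining two all_vars() calls with `&`. The sketch below assumes the sample data from the base R answer, with the `d` column hard-coded for reproducibility; it relies on the fact that is.nan(NA) and is.infinite(NA) are both FALSE.

```r
library(dplyr)

dat <- data.frame(r1 = c(NA, 1, NaN, 5, Inf),
                  r2 = c(NA, 1, NaN, NA, Inf),
                  d  = c(-1.2, 0.3, 1.1, -2.3, 0.4))

# A row is kept when no selected column is infinite or NaN; plain NA passes
# both tests, so rows containing only NA in r1/r2 are retained
kept <- dat %>%
  filter_at(vars(matches("^r[1-9]")), all_vars(!is.infinite(.) & !is.nan(.)))
kept
```

With this data, the row of NaN values and the row of Inf values are dropped, and the three remaining rows include the ones containing NA.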