-1

This is similar to this question and this one, but I can't work out how to adapt those answers to my situation. I have a 1437 x 60 data frame in which every value is numeric. The first column is Depth and, based on other investigations of the data, I need to remove the depths (rows) that I consider outliers.

For example:

Test <- data.frame(Depth = seq(from = 0, to = 100, by = 0.5), X1 = runif(n = 201, min = 1, max = 10), X2 = runif(n = 201, min = 1, max = 10))

I would like to remove the rows where Depth is between 46.5 and 48.5 AND rows where Depth is between 65.5 and 68.5. I have tried creating a vector and filtering based on that, e.g.

OutDepths <- c(seq(from = 46.5, to = 48.5, by = 0.5), seq(from = 65.5, to = 68.5, by = 0.5))

Test1 <- Test %>% filter(Depth == !OutDepths)

which gives an error of

longer object length is not a multiple of shorter object length

I get the same error if I try

Test1 <- Test[Test$Depth == !OutDepths, ]

Thanks in advance for any advice

SOLUTION: It turns out I had the not (!) operator in the wrong place, and I should have been using %in% instead of ==.

E.g.

Test1 <- Test %>%
  filter(!Depth %in% OutDepths)

or in base R:

Test1 <- Test[!Test$Depth %in% OutDepths, ]
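
For anyone wondering why == fails here while %in% works, the following is a minimal base R sketch on a bare Depth vector shaped like the example above (the names drop and kept are only illustrative). It relies on every depth being a multiple of 0.5, which a double stores exactly; with a step such as 0.1, exact matching with %in% may miss values, so treat this as a sketch rather than a general recipe.

```r
# Toy data: depths 0 to 100 in 0.5 steps (exactly representable as doubles)
Depth <- seq(from = 0, to = 100, by = 0.5)
OutDepths <- c(seq(46.5, 48.5, by = 0.5), seq(65.5, 68.5, by = 0.5))

# `==` compares element by element and recycles the shorter vector, so with
# 201 depths against 12 outlier values it warns and tests the wrong pairs.
# `%in%` instead asks, for each Depth, whether it occurs anywhere in OutDepths.
drop <- Depth %in% OutDepths   # illustrative name: TRUE for depths to remove
kept <- Depth[!drop]           # illustrative name: the depths that remain

sum(drop)      # 12 depths flagged for removal
length(kept)   # 189 depths kept
```

The ! has to negate the whole %in% result, which is why !Depth %in% OutDepths works and Depth == !OutDepths does not.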
JJGabe

2 Answers

4

Here is another alternative, using dplyr's between() function. Note that between() includes both endpoints.

library(dplyr)
df <- data.frame(depth = c(20,40,47,50,60,67,80,90,100,120))

df %>% 
    filter(!between(depth, 46.5, 48.5)) %>% 
    filter(!between(depth, 65.5, 68.5))



#   depth
#1    20
#2    40
#3    50
#4    60
#5    80
#6    90
#7   100
#8   120
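
If the list of intervals grows, chaining one filter() per interval gets tedious. One possible way to scale the same idea is sketched below in base R, assuming the intervals are kept in a small lookup table. The names ranges, lo, hi and in_any are hypothetical, and the bounds are treated as inclusive to match between().

```r
# Hypothetical table of depth intervals to exclude (inclusive bounds)
ranges <- data.frame(lo = c(46.5, 65.5), hi = c(48.5, 68.5))

Test <- data.frame(Depth = seq(from = 0, to = 100, by = 0.5))

# For each interval, flag the rows that fall inside it, then combine the
# flags with OR so a row is dropped if it lies in any of the intervals.
in_any <- Reduce(
  `|`,
  Map(function(lo, hi) Test$Depth >= lo & Test$Depth <= hi,
      ranges$lo, ranges$hi)
)

Test1 <- Test[!in_any, , drop = FALSE]
nrow(Test1)   # 189 rows remain out of 201
```

Adding another interval then only means adding a row to ranges; the filtering code stays the same.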
Sri Sreshtan
  • Thanks. This answer works and is slightly less code than @Gorka's solution. However, I am moving into dataframes with >100k rows and so I was looking for a more automated solution if possible. But this is definitely a reasonable start! – JJGabe Jun 28 '20 at 16:02
3

Try this:

Test %>% 
  filter(Depth < 46.5 | Depth > 48.5) %>%
  filter(Depth < 65.5 | Depth > 68.5)
Gorka