I want to filter a data frame (c1p.bim.keep) on the numeric values in one column (V3) so that the remaining values are in ascending order. Any row whose value is out of order should be removed.

I've tried these commands:

c1p.bim.keep <- c1p.bim.keep %>%
  filter(V3>lag(V3))

c1p.bim.keep <- c1p.bim.keep %>%
  filter(V3>lag(V3) | V3<lead(V3))

c1p.bim.keep <- c1p.bim.keep %>%
  mutate(prev=lag(V3)) %>%
  filter(V3>prev | V3<lead(prev))

The problem I'm facing is that several adjacent rows often need removing, but these commands don't re-evaluate after a row has been removed. I've tried running them in a loop, but that removes many of the first values too.

So how can I make sure that all out-of-order rows are removed without losing so many of the first rows?

Sarah
  • Hi Sarah! Welcome to Stack Overflow. Please provide [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Mark Jul 05 '23 at 09:27

1 Answer

What about:

## create sample data
set.seed(123)
d <- data.frame(V3 = c(3, 1, 1, 4), V4 = rnorm(4))
> d
  V3          V4
1  3 -0.56047565
2  1 -0.23017749
3  1  1.55870831
4  4  0.07050839
  • calculate run length encoding (length of uniform-valued sequences):
the_rle  <-  rle(d$V3)
> the_rle
Run Length Encoding
  lengths: int [1:3] 1 2 1
  values : num [1:3] 3 1 4
  • locate sequences whose uniform value is lower than the preceding sequence's one and filter out such values:
library(dplyr)

## values of runs that are lower than the run immediately before them
low_values <- the_rle$values[-1][diff(the_rle$values) < 0]
d |> filter(!(V3 %in% low_values))
  V3          V4
1  3 -0.56047565
2  4  0.07050839

(Note that with large data, it may become necessary, or at least faster, to filter out unwanted positions rather than unwanted values.)
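
A position-based filter could look like the minimal sketch below. It is not the run-length approach above: it swaps in base R's `cummax()` and keeps a row only if its `V3` is not below the largest `V3` seen so far. It assumes that "ascending" means non-decreasing (ties are kept) and that the first row is always trusted. The `V4` values are placeholders for illustration only.

```r
library(dplyr)

# Sample data: same V3 as above; V4 holds arbitrary placeholder values
d <- data.frame(V3 = c(3, 1, 1, 4), V4 = c(10, 20, 30, 40))

# Keep a row only if its V3 is at least the running maximum of V3.
# cummax() is computed on the original column, so a run of several
# out-of-order rows is dropped in a single pass, without a loop.
kept <- d |> filter(V3 >= cummax(V3))
kept
```

Because the test looks at each row's position relative to the running maximum, an out-of-order value is dropped only where it occurs; the same value appearing elsewhere in its proper place is kept. If ties should be dropped as well, the condition would have to compare against the running maximum of the preceding rows instead.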

I_O
  • This seems to be working, but after a while it gives a warning, any ideas on this? There was 1 warning in `filter()`. ℹ In argument: `!(V3 %in% c1p_rle$values[c1p_rle$values < c1p_rle$values[-1]])`. Caused by warning in `c1p_rle$values < c1p_rle$values[-1]`: ! longer object length is not a multiple of shorter object length – Sarah Jul 06 '23 at 08:01
  • A vector `xs` is one longer than `xs[-1]`, hence the warning. It's a hack to compare each element of `xs` with its predecessor, so you'll get an `NA` for the first element of xs. Which should be fine for the purpose at hand though. – I_O Jul 06 '23 at 14:51