
I would like to collapse consecutive duplicate values while keeping the row order.

Input

x 1 2 3 3 2 2 3 1 1

to

x 1 2 3 2 3 1

What would be a suitable function for such an operation?

Thank you

metinu
  • `rle(x)$values`. – user2974951 Mar 11 '22 at 13:26
  • Does this answer your question? [Remove/collapse consecutive duplicate values in sequence](https://stackoverflow.com/questions/27482712/remove-collapse-consecutive-duplicate-values-in-sequence) – Maël Mar 11 '22 at 13:38

1 Answer


One option is to compute the vector of differences with diff(), prepend a nonzero number to it (because the first value has no predecessor and so produces no difference), and use the result to keep only the values of x where the difference is != 0:

x <- c( 1, 2, 3, 3, 2, 2, 3, 1, 1)

x[c(1,diff(x)) != 0]
[1] 1 2 3 2 3 1
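
The same idea can be sketched without subtraction, so that it also works for character vectors, by comparing each element with its predecessor directly. This is only a minimal sketch: `collapse_runs` is a hypothetical helper name, not a base R function, and it assumes the vector contains no NA values. The `rle(x)$values` call suggested in the comments is shown next to it for comparison.

```r
# Hypothetical helper (not part of base R): keep the first element of each
# run of equal consecutive values. Assumes `v` contains no NA values.
collapse_runs <- function(v) {
  if (length(v) == 0) return(v)
  # TRUE for the first element, then TRUE wherever a value differs
  # from the one directly before it
  keep <- c(TRUE, v[-1] != v[-length(v)])
  v[keep]
}

x <- c(1, 2, 3, 3, 2, 2, 3, 1, 1)

collapse_runs(x)
# [1] 1 2 3 2 3 1

# Run-length encoding stores one value per run, so this gives the same result
rle(x)$values
# [1] 1 2 3 2 3 1
```

Unlike `diff()`, the comparison does not need numeric input, so `collapse_runs(c("a", "a", "b"))` returns `"a" "b"`.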

Since you mentioned data.frame, here is one way to solve this within the tidyverse, assuming x is a column of the data frame. We can use the lag() function to get the value of the preceding row:

library(dplyr)

data.frame(x) %>%
    # keep rows whose value differs from the preceding row, plus the first row (where lag(x) is NA)
    dplyr::filter(lag(x) - x != 0 | is.na(lag(x)))

  x
1 1
2 2
3 3
4 2
5 3
6 1
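
If the data frame has more columns, the same filter can be written in base R so that whole rows are kept. The following is a rough sketch under assumptions: the `id` column is invented here purely to show which rows survive, and `x` is assumed to contain no NA values.

```r
# Illustrative data frame; the `id` column is an assumption added only
# to make the surviving rows visible.
df <- data.frame(id = 1:9, x = c(1, 2, 3, 3, 2, 2, 3, 1, 1))

# TRUE for the first row, then TRUE wherever x differs from the row above
keep <- c(TRUE, df$x[-1] != df$x[-nrow(df)])

# drop = FALSE keeps the result a data frame even with a single column
res <- df[keep, , drop = FALSE]

res$id
# [1] 1 2 3 5 7 8
res$x
# [1] 1 2 3 2 3 1
```

Note that base R subsetting keeps the original row names (1, 2, 3, 5, 7, 8), whereas the dplyr output above is renumbered from 1.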
DPH