
Is there a way in R to find the indices where the values of a factor column change? For example:

x <- c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ddd")

should return 3, 5, 6, i.e. the index of the last element before each change.
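
To make the expected output concrete: position `i` is reported when `x[i]` differs from `x[i + 1]`, so each result is the last index of a run and the final run contributes nothing. A plain loop, meant only as a slow reference sketch (the name `change_idx` is illustrative, and the loop assumes `x` contains no `NA`):

```r
x <- c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ddd")

# Reference definition: record i whenever the next element differs.
# Assumes x has no NA values; `if` would fail on an NA comparison.
change_idx <- integer(0)
for (i in seq_len(max(0, length(x) - 1))) {
  if (x[i] != x[i + 1]) {
    change_idx <- c(change_idx, i)
  }
}
change_idx
## [1] 3 5 6
```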

David Arenburg
Jonathan C Lee

4 Answers


You could try to compare shifted vectors, e.g.

which(x[-1] != x[-length(x)])
## [1] 3 5 6

This works on both character vectors and factors.
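
A quick check, assuming the `x` from the question, that the shifted comparison returns the same positions for the character vector and for a factor built from it (`idx_chr`, `idx_fac` are illustrative names). Adding 1 instead gives the first index of each new run, if that is what you need.

```r
x <- c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ddd")
f <- factor(x)

# Compare every element (except the first) with its predecessor.
idx_chr <- which(x[-1] != x[-length(x)])
idx_fac <- which(f[-1] != f[-length(f)])  # both subsets keep f's levels, so != is allowed

idx_chr
## [1] 3 5 6
identical(idx_chr, idx_fac)
## [1] TRUE

# Start positions of the new runs instead of the end positions of the old ones
idx_chr + 1
## [1] 4 6 7
```

One caveat: `!=` returns `NA` when either side is `NA`, and `which` drops `NA`, so a change into or out of a missing value is not reported.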

David Arenburg
  • @Corey `x` in the question is a vector, not a table, so I don't know why you would expect it to have the same syntax. And I don't have your table, so I have no idea what you are talking about and can't reproduce it. – David Arenburg Jul 03 '17 at 21:15
  • The following solution is shorter https://stackoverflow.com/a/73911429/8806649 – Julien Sep 30 '22 at 16:14
  • @Julien your solution doesn't work in base R. If your `lag` function comes from another package, then you should specify it, and it's not shorter than my solution because it adds a dependency where none is needed. Finally, please don't advertise your solutions under other people's solutions. – David Arenburg Oct 01 '22 at 16:48
  • You're right, `lag` comes from `dplyr` – Julien Oct 01 '22 at 18:16
x <- factor(x)  # needs a factor: as.numeric() on the character vector itself would give NAs
which(!!diff(as.numeric(x)))
[1] 3 5 6

This assumes you really have a factor. Factors are stored internally as integer codes, so taking the difference of consecutive codes gives a non-zero value at every change and zero elsewhere. The double negation `!!` then coerces the result to logical (zero becomes FALSE, any other number TRUE), and `which` returns the positions of the TRUE values, i.e. the non-zero differences.
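
The intermediate steps can be inspected one at a time. This sketch assumes `x` is converted to a factor first; the second factor, `y`, is a made-up example whose levels are not in order of appearance, which shows that the difference at a change need not be 1.

```r
x <- factor(c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ddd"))

codes <- as.numeric(x)   # internal integer codes: 1 1 1 2 2 3 4
d <- diff(codes)         # 0 0 1 0 1 1
flags <- !!d             # zero -> FALSE, non-zero -> TRUE
which(flags)
## [1] 3 5 6

# Hypothetical example: levels are sorted alphabetically, so codes can jump or go down.
y <- factor(c("b", "a", "a", "c"))
diff(as.numeric(y))
## [1] -1  0  2
which(!!diff(as.numeric(y)))
## [1] 1 3
```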

Pierre L

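`rle` (run-length encoding) can be used for this: the cumulative sum of the run lengths gives the last index of each run, and dropping the final one leaves the change points.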

head(cumsum(rle(x)$lengths), -1)
[1] 3 5 6
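
Breaking the one-liner into steps, assuming the character `x` from the question. Because `rle` stops with an error on a factor, the factor case below converts with `as.character()` first (the names `r`, `ends`, `f` are illustrative).

```r
x <- c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ddd")

r <- rle(x)
r$lengths                  # run lengths: 3 2 1 1
ends <- cumsum(r$lengths)  # last index of each run: 3 5 6 7
head(ends, -1)             # drop the final run's end, which is not a change
## [1] 3 5 6

# For a factor, convert to an atomic vector before calling rle().
f <- factor(x)
head(cumsum(rle(as.character(f))$lengths), -1)
## [1] 3 5 6
```
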
Matthew Lundberg
  • I do like David's solution better... this does not actually work for a `factor` without converting to `numeric` or `character` first. – Matthew Lundberg Sep 09 '15 at 21:25

With the `dplyr::lag` function, which shifts the vector right by one position and pads the start with `NA`:

library(dplyr) 
which(x != lag(x)) - 1
# [1] 3 5 6
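
For comparison, a base-R sketch of the same idea without the `dplyr` dependency. The shifted vector `prev` is an illustrative stand-in for `dplyr::lag(x)`; `which` skips the `NA` produced at position 1, and subtracting 1 moves each hit from the start of a new run to the end of the previous one.

```r
x <- c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ddd")

prev <- c(NA, x[-length(x)])  # same shape as dplyr::lag(x): NA, then x shifted right
changed <- x != prev          # NA FALSE FALSE TRUE FALSE TRUE TRUE
which(changed) - 1L
## [1] 3 5 6
```

Without `library(dplyr)`, a bare `lag()` refers to `stats::lag`, which only adjusts time-series attributes and does not shift the values of a plain vector, so writing `dplyr::lag(x)` explicitly is the safer spelling.
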
Julien