I have a dataframe, df:

df <- data.frame(a = c("b2","d2","a1","c1"), b = c(12, 3, 54, 4))
> df
   a  b
1 b2 12
2 d2  3
3 a1 54
4 c1  4

And an external vector whose order I would like column a to match:

vec <- c("a1","b2","c1","d2")

Normally I can do this as follows using match:

df <- df[match(vec, df$a),]

> df
   a  b
3 a1 54
1 b2 12
4 c1  4
2 d2  3

However, I would like to know if there is a way to do this in dplyr. I have tried the following, but it did not work:

df <- df %>%
    mutate(
        a = match(vec, a)
    )
> df
  a  b
1 3 12
2 1  3
3 4 54
4 2  4

Can anyone suggest where I'm going wrong in my code?

icedcoffee

3 Answers

5

Your base R code only reorders the rows, so the equivalent dplyr verb is arrange. You also have to reverse the arguments of match, because here we want the position of each value of column a within the vector vec, and arrange then sorts the rows by that position:

df %>% arrange(match(a, vec))
   a  b
1 a1 54
2 b2 12
3 c1  4
4 d2  3
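
To see why the arguments are reversed, here is a small base R sketch comparing the two directions of match. The names row_index, sort_key, by_index and by_key are only illustrative and do not appear in the question.

```r
df <- data.frame(a = c("b2", "d2", "a1", "c1"), b = c(12, 3, 54, 4))
vec <- c("a1", "b2", "c1", "d2")

# For each element of vec, the row of df that holds it: usable as a row index.
row_index <- match(vec, df$a)   # 3 1 4 2

# For each row of df, its position within vec: a sort key with one value per row.
sort_key <- match(df$a, vec)    # 2 4 1 3

# Indexing by row_index and ordering by sort_key pick the same rows in the same order.
by_index <- df[row_index, ]
by_key <- df[order(sort_key), ]
```

arrange expects a sort key with one value per row, which is what match(a, vec) provides; match(vec, a) is the row index that base R subsetting expects.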
AnilGoyal
3

You can’t use mutate here because you want to modify the entire data.frame, not just a single column.

Instead, as of ‘dplyr’ v1.0.0, you can use summarize here:

df <- df %>% summarize(.[match(vec, a), ])

In this expression, . stands in for the entire data.frame.
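
As a related sketch, and not part of the approach above: slice selects rows by position, so it can take the same row index as the base R code. Also, as far as I know, dplyr 1.1.0 and later warn when summarize returns more than one row per group and point to reframe instead, so a version-independent alternative may be useful. The name reordered is only illustrative.

```r
library(dplyr)

df <- data.frame(a = c("b2", "d2", "a1", "c1"), b = c(12, 3, 54, 4))
vec <- c("a1", "b2", "c1", "d2")

# slice() keeps the rows at the given positions, in the order given,
# so this mirrors the base R call df[match(vec, df$a), ].
reordered <- df %>% slice(match(vec, a))
```

Like the summarize version, this operates on whole rows rather than on a single column, which is why it works where mutate does not.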

Konrad Rudolph
  • @Onyambu Why can’t `summarize` be used with duplicate values in `a` (besides discarding the duplicate value in `b`, but OP’s base R code does the same)? – Konrad Rudolph May 25 '21 at 08:26
  • Try using `df1 <- data.frame(a = df$a, b=1:8)` and see what happens – Onyambu May 25 '21 at 08:34
  • @Onyambu My code gives the same result as OP’s for your `df1`, since `match` doesn’t recycle the values of the first vector. – Konrad Rudolph May 25 '21 at 08:47
3

You could do:

df %>%
  arrange(ordered(a, vec))

   a  b
1 a1 54
2 b2 12
3 c1  4
4 d2  3
Onyambu
  • This is a great solution (better than mine!) if `vec` is a permutation of `a` but it fails if `vec` is a subset of `a`. – Konrad Rudolph May 25 '21 at 08:22
  • @KonradRudolph Why would `vec` be a subset? Then how exactly should the ones not in `vec` be ordered? That creates other difficulties – Onyambu May 25 '21 at 08:24
  • This is the way to go I think! You can actually simplify it even further to `df %>% arrange(ordered(a, vec))` – wurli May 25 '21 at 08:26
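
The subset case raised in these comments can be checked with a short sketch. vec_subset is a hypothetical vector, not from the question, that covers only some of the values in a. Both the ordered and the match sort keys are NA for the values it does not contain.

```r
library(dplyr)

df <- data.frame(a = c("b2", "d2", "a1", "c1"), b = c(12, 3, 54, 4))
vec_subset <- c("c1", "a1")  # hypothetical: covers only two of the four values in a

# Sort keys for the rows of df; values of a that are absent from vec_subset give NA.
key_ordered <- ordered(df$a, levels = vec_subset)
key_match <- match(df$a, vec_subset)

# arrange() keeps every row of df.
by_ordered <- df %>% arrange(ordered(a, vec_subset))
by_match <- df %>% arrange(match(a, vec_subset))

# Base R indexing returns only the rows named in vec_subset.
by_index <- df[match(vec_subset, df$a), ]
```

Under these assumptions, both arrange variants return all four rows with c1 and a1 first, while the indexing approach returns only those two rows. Which behaviour is wanted depends on how the rows missing from the vector should be treated.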