How to use match in dplyr to order a column based on an external vector?

Question

I have a dataframe, df:

df <- data.frame(a = c("b2","d2","a1","c1"), b = c(12, 3, 54, 4))
> df
   a  b
1 b2 12
2 d2  3
3 a1 54
4 c1  4

And an external vector whose order I would like column a to follow:

vec <- c("a1","b2","c1","d2")

Normally I can do this as follows using match:

df <- df[match(vec, df$a),]

> df
   a  b
3 a1 54
1 b2 12
4 c1  4
2 d2  3

However, I would like to know if there is a way to do this in dplyr. I have tried the following, but it did not work:

df <- df %>%
    mutate(
        a = match(vec, a)
    )
> df
  a  b
1 3 12
2 1  3
3 4 54
4 2  4

Can anyone suggest where I'm going wrong in my code?

score 5 · Answer 1 · answered May 25 '21 at 08:24

In base R you are only reordering rows, so the equivalent dplyr verb is arrange. You also have to reverse the arguments of match, because here we want the position of each value of column a within the vector vec:

df %>% arrange(match(a, vec))
   a  b
1 a1 54
2 b2 12
3 c1  4
4 d2  3
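
To see why the arguments are swapped, here is a minimal base R sketch using the df and vec from the question. The names row_index and rank_in_vec are mine, introduced only for illustration. match(vec, df$a) returns row numbers for direct indexing, while match(df$a, vec) returns a sort key per row, which is what arrange expects. The two are inverse permutations of each other.

```r
# Data from the question
df <- data.frame(a = c("b2", "d2", "a1", "c1"), b = c(12, 3, 54, 4))
vec <- c("a1", "b2", "c1", "d2")

# For each element of vec: which row of df holds it? (used for df[idx, ])
row_index <- match(vec, df$a)

# For each row of df: where does its value sit in vec? (a sort key)
rank_in_vec <- match(df$a, vec)

# Sorting by the key recovers the same row numbers as direct indexing
by_index <- df[row_index, ]
by_key   <- df[order(rank_in_vec), ]
print(by_key)
```

Both by_index and by_key hold the rows in the order given by vec; arrange(match(a, vec)) does the equivalent of the order(rank_in_vec) step.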

score 3 · Accepted Answer · answered May 25 '21 at 08:18

You can’t use mutate here because you want to modify the entire data.frame, not just a single column.

Instead, as of ‘dplyr’ v1.0.0, you can use summarize here:

df <- df %>% summarize(.[match(vec, a), ])

In this expression, . stands in for the entire data.frame.
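
As a hedged alternative: as far as I know, dplyr 1.1.0 and later warn when summarize returns more than one row per group and point to reframe() as the replacement. A sketch that avoids the question entirely is slice, which takes row positions and evaluates them with the columns in scope, so the original match(vec, a) call can be passed unchanged. The name sorted is mine; this assumes every element of vec occurs in a.

```r
library(dplyr)

# Data from the question
df <- data.frame(a = c("b2", "d2", "a1", "c1"), b = c(12, 3, 54, 4))
vec <- c("a1", "b2", "c1", "d2")

# slice() picks rows by position; `a` refers to the column of df,
# so this mirrors df[match(vec, df$a), ] from the question
sorted <- df %>% slice(match(vec, a))
print(sorted)
```

Like the base R version, this keeps only the rows whose value of a appears in vec.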

answered May 25 '21 at 08:18 · Konrad Rudolph

@Onyambu Why can’t `summarize` be used with duplicate values in `a` (besides discarding the duplicate value in `b`, but OP’s base R code does the same)? – Konrad Rudolph May 25 '21 at 08:26
Try using `df1 <- data.frame(a = df$a, b=1:8)` and see what happens – Onyambu May 25 '21 at 08:34
@Onyambu My code gives the same result as OP’s for your `df1`, since `match` doesn’t recycle the values of the first vector. – Konrad Rudolph May 25 '21 at 08:47

score 3 · Onyambu · Answer 3 · 2021-05-25T08:33:58.307

You could do:

df %>%
  arrange(ordered(a, vec))

   a  b
1 a1 54
2 b2 12
3 c1  4
4 d2  3

edited May 25 '21 at 08:33 · answered May 25 '21 at 08:20 · Onyambu

This is a great solution (better than mine!) if `vec` is a permutation of `a` but it fails if `vec` is a subset of `a`. – Konrad Rudolph May 25 '21 at 08:22
@KonradRudolph why would `vec` be a subset? Then how exactly should the ones not in vec be ordered? That creates other difficulties – Onyambu May 25 '21 at 08:24
This is the way to go I think! You can actually simplify it even further to `df %>% arrange(ordered(a, vec))` – wurli May 25 '21 at 08:26
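
The subset case raised in the comments can be sketched as follows. The vector sub_vec is a hypothetical example, not part of the question. With base indexing, rows missing from the vector are dropped; with arrange, they are kept and sorted to the end, because their sort key is NA. Filtering first is one way (an assumption about what is wanted, not the only option) to make the dplyr version match the base R result.

```r
library(dplyr)

# Data from the question, plus a hypothetical subset vector
df <- data.frame(a = c("b2", "d2", "a1", "c1"), b = c(12, 3, 54, 4))
sub_vec <- c("c1", "b2")

# Base R indexing keeps only the rows named in sub_vec
base_res <- df[match(sub_vec, df$a), ]

# arrange() keeps every row; rows not in sub_vec get an NA key and go last
arranged_all <- df %>% arrange(match(a, sub_vec))

# Dropping unmatched rows first reproduces the base R result
arranged_subset <- df %>%
  filter(a %in% sub_vec) %>%
  arrange(match(a, sub_vec))
print(arranged_subset)
```

Whether the unmatched rows should be dropped or pushed to the end depends on the use case, which is the difficulty Onyambu points out.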
