4

Comming from SQL i would expect i was able to do something like the following in dplyr, is this possible?

# R
tbl %>% mutate(n = dense_rank(Name, Email))

-- SQL
SELECT Name, Email, DENSE_RANK() OVER (ORDER BY Name, Email) AS n FROM tbl

Also is there an equivilant for PARTITION BY?

CodeMonkey
  • 3,418
  • 4
  • 30
  • 53
  • 1
    like this `mtcars %>% mutate(n = dense_rank(interaction(cyl, hp)))`? – talat Jan 19 '18 at 09:05
  • @docendodiscimus thats awesome, had completly forgotten about `interaction()` i had hashed the values but that messes up the order. Is there any easy solution for using `PARTITION BY`? – CodeMonkey Jan 19 '18 at 10:31
  • I don't know what that does – talat Jan 19 '18 at 11:12
  • @CodeMonkey - If you're thinking `PARTITION BY` to get a ranking by group, you would use a `group by` in dplyr. https://stackoverflow.com/questions/34967837/rank-variable-by-group-dplyr – Jason Jan 19 '18 at 22:06
  • @Jason Awesome! its working. Using interaction with `lex.order` one can almost simulate the OVER(ORDER BY) and group_by works like a charm. Thanks! – CodeMonkey Jan 22 '18 at 08:13

1 Answers1

0

I did struggle with this problem and here is my solution:

In case you can't find any function which supports ordering by multiple variables, I suggest that you concatenate them by their priority level from left to right using paste().

Below is the code sample:

tbl %>%
  mutate(n = dense_rank(paste(Name, Email))) %>%
  arrange(Name, Email) %>%
  view()

Moreover, I guess group_by is the equivalent for PARTITION BY in SQL.

The shortfall for this solution is that you can only order by 2 (or more) variables which have the same direction. In the case that you need to order by multiple columns which have different direction, saying that 1 asc and 1 desc, I suggest you to try this: Calculate rank with ties based on more than one variable

Bảo Trần
  • 130
  • 1
  • 8