Order a dataframe primarily by the count of times a value appears in a column and secondarily by a second column

Question

I have the dataframe below which I would like to order primarily numerically by the count of times a value appears in the first column (bigger values first) and secondarily alphabetically(A-Z) based on the second column.

Name<-c("jack","jack","bob","david","mary")
Surname<-c("arf","dfg","hjk","dfgg","bn")
n1<-data.frame(Name, Surname)

It should be something like:

n1<-n1[
  order( n1[,1], n1[,2] ),
  ]

but I do not know how to order numerically based on count of values.

So is `n1[order( n1[,1], n1[,2] ),]` your expected output? If so, since you want to *"order primarily numerically by the count of times a value appears in the first column"* why do the `"jack"` entries not appear at the top? — Maurits Evers, Jul 08 '19 at 23:39
Pretty sure I've seen a similar question in the past, particularly the order by counts bit, but something like `n1[order(tabulate(n1$Name)[n1$Name], n1$Surname),]` should do it. — thelatemail, Jul 08 '19 at 23:41
Possible duplicate of https://stackoverflow.com/questions/35036182/sort-a-dataframe-column-by-the-frequency-of-occurrence/35036797 and similar stuff here too - https://stackoverflow.com/questions/48334709/sort-by-occurance-or-freq-in-data-frame-in-r — thelatemail, Jul 08 '19 at 23:44

score 2 · Answer 1 · answered Jul 08 '19 at 23:39

2

Using sqldf likes the following:

library(sqldf)

n2 <- sqldf('SELECT * FROM 
             n1 JOIN (SELECT Name, COUNT(*) as C FROM n1 GROUP BY Name) as T 
             on n1.Name = T.Name 
             ORDER BY C DESC, Surname')

First grouped the names and then sort based on the count in decent order and Surname alphabetically.

answered Jul 08 '19 at 23:39

OmG

18,337
10
57
90

Very useful. In the code chunk above what would I change if I want to sort primarily alphabetically the the second column? – firmo23 Jul 08 '19 at 23:50
Indeed You should change the second part of `ORDER BY`. I mean you change the `Surname` in `ORDER BY` clause. – OmG Jul 08 '19 at 23:52

score 2 · Accepted Answer · answered Jul 08 '19 at 23:53

2

As suggested by @thelatemail, you can do this in base R using:

n1[order(-table(n1$Name)[n1$Name], n1$Surname), ]

To sort by surname first, swap the arguments to order() around.

answered Jul 08 '19 at 23:53

Marius

58,213
16
107
105

Be careful that since you're indexing by names of a table that if `n1$Name` is numeric, you could get the wrong result. Compare `x <- c(1,2,4,4)` and `table(x)[x]` vs `tabulate(x)[x]`, or even `table(x)[as.character(x)]` – thelatemail Jul 08 '19 at 23:55

lowndrul · Answer 3 · 2019-07-09T02:00:35.727

1

Using dplyr like the following:

library(dplyr)
n1 %>%
  as_tibble() %>%
  count(Name) %>%
  right_join(n1, by = "Name") %>% # join the count back to your original frame
  arrange(desc(n), Surname) %>% # order by highest count first, then Surname
  select(-n) # just return the original frame, with the sorting you asked for

edited Jul 09 '19 at 02:00

answered Jul 09 '19 at 00:13

lowndrul

3,715
7
36
54

2

FYI `count` is a shorthand for `group_by` plus `tally` plus `ungroup` – camille Jul 09 '19 at 00:57
@camille Thx! I edited my answer to reflect your comment. – lowndrul Jul 09 '19 at 02:01

Order a dataframe primarily by the count of times a value appears in a column and secondarily by a second column

3 Answers3