0

I have the dataframe below which I would like to order primarily numerically by the count of times a value appears in the first column (bigger values first) and secondarily alphabetically(A-Z) based on the second column.

Name<-c("jack","jack","bob","david","mary")
Surname<-c("arf","dfg","hjk","dfgg","bn")
n1<-data.frame(Name, Surname)

It should be something like:

n1<-n1[
  order( n1[,1], n1[,2] ),
  ]

but I do not know how to order numerically based on count of values.

firmo23
  • 7,490
  • 2
  • 38
  • 114
  • So is `n1[order( n1[,1], n1[,2] ),]` your expected output? If so, since you want to *"order primarily numerically by the count of times a value appears in the first column"* why do the `"jack"` entries not appear at the top? – Maurits Evers Jul 08 '19 at 23:39
  • 1
    Pretty sure I've seen a similar question in the past, particularly the order by counts bit, but something like `n1[order(tabulate(n1$Name)[n1$Name], n1$Surname),]` should do it. – thelatemail Jul 08 '19 at 23:41
  • Possible duplicate of https://stackoverflow.com/questions/35036182/sort-a-dataframe-column-by-the-frequency-of-occurrence/35036797 and similar stuff here too - https://stackoverflow.com/questions/48334709/sort-by-occurance-or-freq-in-data-frame-in-r – thelatemail Jul 08 '19 at 23:44

3 Answers3

2

Using sqldf likes the following:

library(sqldf)

n2 <- sqldf('SELECT * FROM 
             n1 JOIN (SELECT Name, COUNT(*) as C FROM n1 GROUP BY Name) as T 
             on n1.Name = T.Name 
             ORDER BY C DESC, Surname')

First grouped the names and then sort based on the count in decent order and Surname alphabetically.

OmG
  • 18,337
  • 10
  • 57
  • 90
  • Very useful. In the code chunk above what would I change if I want to sort primarily alphabetically the the second column? – firmo23 Jul 08 '19 at 23:50
  • Indeed You should change the second part of `ORDER BY`. I mean you change the `Surname` in `ORDER BY` clause. – OmG Jul 08 '19 at 23:52
2

As suggested by @thelatemail, you can do this in base R using:

n1[order(-table(n1$Name)[n1$Name], n1$Surname), ]

To sort by surname first, swap the arguments to order() around.

Marius
  • 58,213
  • 16
  • 107
  • 105
  • Be careful that since you're indexing by names of a table that if `n1$Name` is numeric, you could get the wrong result. Compare `x <- c(1,2,4,4)` and `table(x)[x]` vs `tabulate(x)[x]`, or even `table(x)[as.character(x)]` – thelatemail Jul 08 '19 at 23:55
1

Using dplyr like the following:

library(dplyr)
n1 %>%
  as_tibble() %>%
  count(Name) %>%
  right_join(n1, by = "Name") %>% # join the count back to your original frame
  arrange(desc(n), Surname) %>% # order by highest count first, then Surname
  select(-n) # just return the original frame, with the sorting you asked for
lowndrul
  • 3,715
  • 7
  • 36
  • 54