How to order a column by group in R

Question

I have a data.frame (say "df") that looks like the following:

Hospital.Name | State | Mortality.Rate
'hospital_1'  | 'AA'  | 0.2
'hospital_2'  | 'AA'  | 0.3
'hospital_3'  | 'BB'  | 0.3
'hospital_4'  | 'CC'  | 0.5

(The Hospital.Name is unique)

Now I want to order "Mortality.Rate" grouped by "State", i.e. order the rates within each state. If there is a tie in the rate, "Hospital.Name" is used to resolve the tie.

The "order()" and "tapply()" functions came to mind, so I wrote this:

tapply(df$Mortality.Rate, df$State, order, df$Hospital.Name, na.last=NA)

However, an "argument lengths differ" error popped up. When "order" is applied to a per-state slice of the rate, its second argument (i.e. df$Hospital.Name) is not sliced and keeps its full length.

How can I pass the second argument (for resolving ties in the ordering) to tapply(), or is there another approach?
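For reference, one way to avoid the length mismatch described above is to split the row indices by state rather than the rate vector, so that every key passed to order() is subset with the same indices. This is only a sketch in base R: the sample data (including the extra hospital_5 row, added to create a tie) and the names idx_by_state and order_within are made up for illustration.

```r
# Hypothetical sample data; hospital_5 is an invented row that ties with
# hospital_1 on Mortality.Rate inside state "AA".
df <- data.frame(
  Hospital.Name  = c("hospital_2", "hospital_1", "hospital_5", "hospital_4", "hospital_3"),
  State          = c("AA", "AA", "AA", "CC", "BB"),
  Mortality.Rate = c(0.3, 0.2, 0.2, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# Split the row numbers (not the rates) by state.
idx_by_state <- split(seq_len(nrow(df)), df$State)

# Within each state, order the rows by rate, breaking ties by hospital name.
# Both keys are subset with the same indices, so their lengths always match.
order_within <- lapply(idx_by_state, function(i) {
  i[order(df$Mortality.Rate[i], df$Hospital.Name[i], na.last = NA)]
})

print(order_within)
```

Each list element holds the row numbers of df for one state, already in the desired within-state order, so df[order_within$AA, ] gives the sorted rows for state "AA".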

Jthorpe · Accepted Answer · 2015-02-21T19:12:47.043

15

In base R, you can supply multiple arguments to order() and subsequent arguments are used to break ties in the earlier variables, as in:

df[order(df$State,df$Mortality.Rate,df$Hospital.Name),]
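As a small worked example of the call above, here is a sketch on made-up data shaped like the table in the question. The hospital_5 row and the variable name sorted are assumptions added for illustration; hospital_5 exists only to show the tie-break on Hospital.Name.

```r
# Hypothetical sample data, deliberately unsorted; hospital_5 is an invented
# row that ties with hospital_1 on Mortality.Rate within state "AA".
df <- data.frame(
  Hospital.Name  = c("hospital_2", "hospital_1", "hospital_5", "hospital_4", "hospital_3"),
  State          = c("AA", "AA", "AA", "CC", "BB"),
  Mortality.Rate = c(0.3, 0.2, 0.2, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# Sort by State first, then by rate within each state, then by name for ties.
sorted <- df[order(df$State, df$Mortality.Rate, df$Hospital.Name), ]
rownames(sorted) <- NULL  # discard the original row names

print(sorted)
```

Because State is the first key, rows from the same state end up adjacent, which gives the "order within each state" effect without any explicit grouping step.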

edited Feb 21 '15 at 19:12

answered Feb 21 '15 at 18:40

Jthorpe


No need for the quotes around the `Mortality.Rate`. – Konrad Rudolph Feb 21 '15 at 18:42
Where's the `Hospital.Name` part? – David Arenburg Feb 21 '15 at 18:53
@Jthorpe True, but it’s a form of [cargo cult programming](http://en.wikipedia.org/wiki/Cargo_cult_programming). – Konrad Rudolph Feb 21 '15 at 18:59
@KonradRudolph I was just being lazy (not removing the quotes I copy/pasted from the OP), not trying to solve a problem with quotes... – Jthorpe Feb 21 '15 at 19:18
It took me a while to understand. It is a nice and clean resolution. Thank you. – Zelong Feb 21 '15 at 19:23

score 10 · Answer 2 · answered Feb 21 '15 at 18:40

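You can do it in dplyr:

library(dplyr)
# In current dplyr versions arrange() ignores grouping unless .by_group = TRUE
df %>% group_by(State) %>% arrange(Mortality.Rate, Hospital.Name, .by_group = TRUE)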

answered Feb 21 '15 at 18:40

jalapic

Thanks a lot. But I need to stick to base R when finding the resolution (sorry about not mentioning it in my question). I will have a look at this package. Thanks. – Zelong Feb 21 '15 at 19:24

Lincoln Mullen · Answer 3 · 2015-02-21T18:54:00.357

4

You can do this in dplyr. First, some sample data:

library("dplyr")
hospital_name <- sample(c("hospital_1", "hospital_2", "hospital_3"), 10,
                        replace = TRUE)
state <- sample(letters[1:3], 10, replace = TRUE)
mortality_rate <- runif(10)

# tibble() replaces the deprecated data_frame()
df <- tibble(hospital_name, state, mortality_rate)

Group by state, then arrange by columns.

# In current dplyr versions arrange() ignores grouping unless .by_group = TRUE
df %>% 
  group_by(state) %>% 
  arrange(mortality_rate, hospital_name, .by_group = TRUE)

Producing results like these, where the states are grouped and the mortality rate is sorted within each state.

## Source: local data frame [10 x 3]
## Groups: state
## 
##    hospital_name state mortality_rate
## 1     hospital_1     b     0.15293591
## 2     hospital_1     b     0.37417167
## 3     hospital_1     b     0.54561856
## 4     hospital_3     c     0.02487033
## 5     hospital_1     c     0.09937557
## 6     hospital_1     c     0.35666087
## 7     hospital_3     c     0.39663460
## 8     hospital_2     c     0.53064144
## 9     hospital_3     c     0.76015632
## 10    hospital_3     c     0.76801890

Without group_by() you just get the mortality rates from least to greatest:

df %>%
  arrange(mortality_rate)

## Source: local data frame [10 x 3]
## 
##    hospital_name state mortality_rate
## 1     hospital_3     c     0.02487033
## 2     hospital_1     c     0.09937557
## 3     hospital_1     b     0.15293591
## 4     hospital_1     c     0.35666087
## 5     hospital_1     b     0.37417167
## 6     hospital_3     c     0.39663460
## 7     hospital_2     c     0.53064144
## 8     hospital_1     b     0.54561856
## 9     hospital_3     c     0.76015632
## 10    hospital_3     c     0.76801890

edited Feb 21 '15 at 18:54

answered Feb 21 '15 at 18:43

Lincoln Mullen


Here also the answer is similar to @jalapic. I don't know whether the group_by is needed here `arrange(df, State, Hospital.Name, Mortality.Rate)` – akrun Feb 21 '15 at 18:48
Yes, the `group_by` is needed to sort within states, rather than within the data frame as a whole. See `?dplyr::group_by`. – Lincoln Mullen Feb 21 '15 at 18:50
Can you show some examples where this will differ? I tried your example with a `set.seed(24)` and got the same output with or without `group_by`. – akrun Feb 21 '15 at 18:50
Edited the answer as you suggest. – Lincoln Mullen Feb 21 '15 at 18:54
My code was `arrange(df, state, hospital_name, mortality_rate)` – akrun Feb 21 '15 at 18:54
Yes, that will also work for sorting. But using `group_by()` is a better match conceptually for what the question is asking, and permits further analysis, such as taking the top n within a grouping. – Lincoln Mullen Feb 21 '15 at 18:56
I thought using only `arrange` would be faster if the OP needs just to order – akrun Feb 21 '15 at 18:57

score 3 · Answer 4 · answered Feb 21 '15 at 19:10

If we are already loading needless (for this specific operation) packages, here's a package (data.table) that could be useful for sorting the data by reference (without copying it or needing <-), using the setorder or setkey functions:

library(data.table)
setorder(setDT(df), State, Mortality.Rate, Hospital.Name)

Though, you could potentially mimic base R syntax and order the data while creating a copy (though with improved speed because data.table calls its forder under the hood)

setDT(df)[order(State, Mortality.Rate, Hospital.Name)]

How to order but imposing a group column as reference? – Paulo E. Cardoso Dec 20 '17 at 23:51

score 1 · Answer 5 · edited May 23 '17 at 12:17

This came to my mind

 df <- df[with(df, order(State, as.numeric(Mortality.Rate), Hospital.Name)), ]

Check out this post: "How to sort a dataframe by column(s)?"

edited May 23 '17 at 12:17

answered Feb 21 '15 at 18:45

Michael Kaiser


Isn't this similar to @Jthorpe's answer – akrun Feb 21 '15 at 18:47
Like akrun said, also, where's the `Hospital.Name` part? – David Arenburg Feb 21 '15 at 19:05

score 0 · Answer 6 · edited Jul 21 '22 at 05:14

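Assign the output to a variable "result". This also assumes you want to find the average mortality for each state:

library(dplyr)
# group_by() replaces the original order_by(); mean_mortality is just a column label
result <- df %>%
  arrange(Mortality.Rate) %>%
  group_by(State) %>%
  summarize(mean_mortality = mean(Mortality.Rate))
View(result)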

edited Jul 21 '22 at 05:14

Waldi

answered Jul 19 '22 at 10:08

Emeka
