Hopefully you guys can help me out. I've been looking all over the web, and I can't find an answer. Here's my data frame:

name  city         state  stars  main_category
A     Pittsburgh   PA     5.0    Soul Food
B     Houston      TX     3.0    Professional Services
C     Lafayette    IN     3.0    NA
D     Los Angeles  CA     4.0    Local Services
E     Los Angeles  CA     3.0    Local Services
F     Lafayette    IN     3.5    Mongolian
G     Pittsburgh   PA     5.0    Doctors
H     Pittsburgh   PA     4.0    Soul Food
I     Houston      TX     4.0    Professional Services
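For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above. The name `d` matches the ddply call further down; `stringsAsFactors = FALSE` is an assumption, since the original column types are not shown.

```r
# Rebuild the example data frame (column types are assumed, not known).
d <- data.frame(
  name  = LETTERS[1:9],
  city  = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
            "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 3, 4, 3, 3.5, 5, 4, 4),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "Mongolian", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

# Count missing values per column: the only NA is in main_category (row C),
# not in stars.
colSums(is.na(d))
```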

What I'd like is to group the rows by city and state (with cities sorted alphabetically) and then rank the rows within each group by their number of stars. Here's what I was hoping for:

name  city         state  stars  main_category          rank
I     Houston      TX     4.0    Professional Services  1
B     Houston      TX     3.0    Professional Services  2
F     Lafayette    IN     3.5    Mongolian              1
D     Los Angeles  CA     4.0    Local Services         1
E     Los Angeles  CA     3.0    Local Services         2
G     Pittsburgh   PA     5.0    Doctors                1
A     Pittsburgh   PA     5.0    Soul Food              1
H     Pittsburgh   PA     4.0    Soul Food              2

Here's my line of code.

l <- ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))

This does not remove the row with the NA that Lafayette has, and I don't know what to use instead. I also tried na.omit, but when I did that, the rank column did not show up.

3442
jason adams

3 Answers

Here's a base R solution. Not sure if you're set on using plyr, but this seems to work. I think the last row should be ranked 3, since two rows are tied for rank 1.

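# Drop rows containing any NA, then sort by city, state and descending stars
no <- na.omit(dat)
new <- no[do.call(order, with(no, list(city, state, -stars))), ]
# Rank within each city; this relies on the rows already being sorted by city
within(new, {
    rank <- Reduce(c, Map(rank, split(-stars, city), ties.method = "min"))
})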
#   name        city state stars         main_category rank
# 9    I     Houston    TX   4.0 Professional Services    1
# 2    B     Houston    TX   3.0 Professional Services    2
# 6    F   Lafayette    IN   3.5             Mongolian    1
# 4    D Los Angeles    CA   4.0        Local Services    1
# 5    E Los Angeles    CA   3.0        Local Services    2
# 1    A  Pittsburgh    PA   5.0             Soul Food    1
# 7    G  Pittsburgh    PA   5.0               Doctors    1
# 8    H  Pittsburgh    PA   4.0             Soul Food    3
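As a possible variation on the same idea, ave() can compute the ranks per group directly, which avoids depending on the row order matching the order of split(). This is only a sketch: the data frame is rebuilt here so the snippet runs on its own, and grouping by both city and state (rather than city alone) is my reading of the question.

```r
# Rebuild the example data (column types are assumed).
dat <- data.frame(
  name  = LETTERS[1:9],
  city  = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
            "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 3, 4, 3, 3.5, 5, 4, 4),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "Mongolian", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

# Drop incomplete rows, then sort by city, state and descending stars.
no <- na.omit(dat)
no <- no[order(no$city, no$state, -no$stars), ]

# ave() applies FUN within each city/state group and returns the results
# in the original row positions, so no manual re-assembly is needed.
no$rank <- ave(-no$stars, no$city, no$state,
               FUN = function(x) rank(x, ties.method = "min"))
no
```

With ties.method = "min", the two 5-star rows in Pittsburgh both get rank 1 and the 4-star row gets rank 3, the same as in the output above.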
Rich Scriven

Using dplyr

library(dplyr)
filter(dat, complete.cases(dat)) %>%
    group_by(city) %>%
    arrange(city, state, desc(stars)) %>%
    mutate(rank = min_rank(desc(stars)))
 #   name        city state stars         main_category rank
 #1    I     Houston    TX   4.0 Professional Services    1
 #2    B     Houston    TX   3.0 Professional Services    2
 #3    F   Lafayette    IN   3.5             Mongolian    1
 #4    D Los Angeles    CA   4.0        Local Services    1
 #5    E Los Angeles    CA   3.0        Local Services    2
 #6    A  Pittsburgh    PA   5.0             Soul Food    1
 #7    G  Pittsburgh    PA   5.0               Doctors    1
 #8    H  Pittsburgh    PA   4.0             Soul Food    3
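If the ranks shown in the question (1, 1, 2 for Pittsburgh) are really what is wanted, dense_rank() may be closer than min_rank(), since it leaves no gap after ties. A hedged sketch, assuming dplyr is installed; the data frame is rebuilt so the snippet is self-contained, and grouping by city and state together is my assumption:

```r
library(dplyr)

# Rebuild the example data (column types are assumed).
dat <- data.frame(
  name  = LETTERS[1:9],
  city  = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
            "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 3, 4, 3, 3.5, 5, 4, 4),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "Mongolian", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

ranked <- dat %>%
  filter(!is.na(main_category)) %>%           # drop the row with the missing category
  group_by(city, state) %>%                   # rank within each city/state pair
  mutate(rank = dense_rank(desc(stars))) %>%  # tied rows share a rank, no gaps
  ungroup() %>%
  arrange(city, state, rank)
ranked
```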
akrun

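With ddply, NA handling goes inside .fun, in your case inside rank. ddply itself has no na.rm argument, and rank controls NA handling through na.last rather than na.rm.

Your approach to the NAs was as follows: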

ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))

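Passing the argument inside the call to rank is how NAs in stars are handled. At least it works for me. Note that na.last only concerns NAs in stars; the Lafayette row has its NA in main_category, so that row would still have to be dropped beforehand, for example with na.omit: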

ddply(d, c("city", "state", "main_category"), transform,
      rank = rank(-stars, na.last = TRUE, ties.method = "max"))
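To make the effect of na.last concrete, here is a small base R sketch on a made-up vector (the values are purely illustrative and not taken from the question's data):

```r
# Hypothetical star ratings with one missing value.
stars <- c(3, NA, 1)

r_last <- rank(stars, na.last = TRUE)    # NA is ranked last:        2 3 1
r_keep <- rank(stars, na.last = "keep")  # NA stays NA:              2 NA 1
r_drop <- rank(stars, na.last = NA)      # NA is dropped (length 2): 2 1
```

Only na.last = NA removes the missing value, and the result is then shorter than the input, so it would not line up with the rows of a group inside transform.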
LRD