How to remove NA values from a data frame using ddply?

Question

Hopefully you guys can help me out. I've been looking all over the web, and I can't find an answer. Here's my data frame:

name  city         state  stars  main_category
A     Pittsburgh   PA     5.0    Soul Food
B     Houston      TX     3.0    Professional Services
C     Lafayette    IN     3.0    NA
D     Los Angeles  CA     4.0    Local Services
E     Los Angeles  CA     3.0    Local Services
F     Lafayette    IN     3.5    Mongolian
G     Pittsburgh   PA     5.0    Doctors
H     Pittsburgh   PA     4.0    Soul Food
I     Houston      TX     4.0    Professional Services
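Since the first comment asks for a reproducible example, here is a sketch that rebuilds the table above with data.frame(). It assumes the NA in C's main_category is a real missing value rather than the text "NA". The name d matches the question's code (the answers call it dat). Building the frame directly also sidesteps the space in "Los Angeles", which would split into two fields with whitespace-separated read.table() input.

```r
# Reproducible version of the question's data frame.
# Assumption: main_category for C is a genuine missing value (NA), not the string "NA".
d <- data.frame(
  name          = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  city          = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
                    "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state         = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars         = c(5.0, 3.0, 3.0, 4.0, 3.0, 3.5, 5.0, 4.0, 4.0),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "Mongolian", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)
str(d)
```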

What I would like is to group the rows by city and state (with cities in alphabetical order) and, within each main_category, rank them by stars, highest first. Here's what I was hoping for:

name  city         state  stars  main_category          rank
I     Houston      TX     4.0    Professional Services  1
B     Houston      TX     3.0    Professional Services  2
F     Lafayette    IN     3.5    Mongolian              1
D     Los Angeles  CA     4.0    Local Services         1
E     Los Angeles  CA     3.0    Local Services         2
G     Pittsburgh   PA     5.0    Doctors                1
A     Pittsburgh   PA     5.0    Soul Food              1
H     Pittsburgh   PA     4.0    Soul Food              2

Here's my line of code.

l <- ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))

This does not remove the Lafayette row whose main_category is NA (business C). I don't know what else to put there; I also tried na.omit, but when I did, the rank column no longer showed up.

1) make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). 2) Barring that, try this - `ddply(na.omit(d), ...)` — Chase, Nov 03 '14 at 02:48
But Houston didn't get 5 stars. I'm confused about your output — Rich Scriven, Nov 03 '14 at 02:55
@Chase I tried na.omit(d) and this is what I get: Error: attempt to apply non-function — jason adams, Nov 03 '14 at 02:56
@RicharScriven I was hoping to have a ranking within the categories. so each category would have its own ranking provided that it is in the same city, state. — jason adams, Nov 03 '14 at 03:41

Rich Scriven · Answer 1 · answered Nov 03 '14 at 04:12

Here's a base R solution. Not sure if you're set on using plyr's ddply, but this seems to work. Note that it ranks within each city (not within each main_category), so I think the last row should be ranked 3, since two rows tie for rank 1.

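no <- na.omit(dat)                     # drop rows containing any NA
# sort by city, then state, then stars (highest first)
new <- no[do.call(order, with(no, list(city, state, -stars))),]
within(new, {
    # rank stars (highest first) separately within each city; tied values share the lowest rank.
    # split() returns the cities in alphabetical order, matching the sort above.
    rank  <- Reduce(c, Map(rank, split(-stars, city), ties.method = "min"))
})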
#   name        city state stars         main_category rank
# 9    I     Houston    TX   4.0 Professional Services    1
# 2    B     Houston    TX   3.0 Professional Services    2
# 6    F   Lafayette    IN   3.5             Mongolian    1
# 4    D Los Angeles    CA   4.0        Local Services    1
# 5    E Los Angeles    CA   3.0        Local Services    2
# 1    A  Pittsburgh    PA   5.0             Soul Food    1
# 7    G  Pittsburgh    PA   5.0               Doctors    1
# 8    H  Pittsburgh    PA   4.0             Soul Food    3
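This answer ranks within each city, which is why H comes out 3. For the per-category ranking the OP describes in the comments (rank within city, state and main_category), a base R sketch using ave() should reproduce the ranks in the question's desired output. It rebuilds the question's data as d (assuming C's main_category is a real NA); the names no and new simply mirror the answer above.

```r
# Rebuild the question's data (assumption: C's main_category is a real NA)
d <- data.frame(
  name          = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  city          = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
                    "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state         = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars         = c(5.0, 3.0, 3.0, 4.0, 3.0, 3.5, 5.0, 4.0, 4.0),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "Mongolian", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

no  <- na.omit(d)                                  # drop row C (NA main_category)
new <- no[order(no$city, no$state, -no$stars), ]   # city, state, then most stars first

# Rank within each city/state/main_category group; most stars = 1, ties share the lowest rank
new$rank <- ave(-new$stars, new$city, new$state, new$main_category,
                FUN = function(x) rank(x, ties.method = "min"))
new
```

Here H gets rank 2 because it is compared only with the other Pittsburgh Soul Food row. A and G appear in their original row order because order() leaves ties in their original ordering.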

score 0 · Answer 2 · answered Nov 03 '14 at 07:05

Using dplyr

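library(dplyr)
filter(dat, complete.cases(dat)) %>%       # keep only rows without any NA
  group_by(city) %>%                       # rank separately within each city
  arrange(city, state, desc(stars)) %>%    # sort by city, state, then stars (highest first)
  mutate(rank = min_rank(desc(stars)))     # most stars = 1; ties share the lowest rank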
 #   name        city state stars         main_category rank
 #1    I     Houston    TX   4.0 Professional Services    1
 #2    B     Houston    TX   3.0 Professional Services    2
 #3    F   Lafayette    IN   3.5             Mongolian    1
 #4    D Los Angeles    CA   4.0        Local Services    1
 #5    E Los Angeles    CA   3.0        Local Services    2
 #6    A  Pittsburgh    PA   5.0             Soul Food    1
 #7    G  Pittsburgh    PA   5.0               Doctors    1
 #8    H  Pittsburgh    PA   4.0             Soul Food    3
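Answer 1 drops incomplete rows with na.omit() and this one with filter(dat, complete.cases(dat)). Both drop a row if any column is NA, which is more than the question needs if other columns can be missing. A base R sketch on a hypothetical toy data frame, where row 3 is missing stars rather than main_category:

```r
# Hypothetical toy data: row 2 lacks main_category, row 3 lacks stars
df <- data.frame(name = c("A", "B", "C"),
                 stars = c(5, 3, NA),
                 main_category = c("Soul Food", NA, "Doctors"),
                 stringsAsFactors = FALSE)

keep <- complete.cases(df)   # TRUE only for rows with no NA in any column
a <- na.omit(df)             # drops the same rows (and records them in attr "na.action")
b <- df[keep, ]              # the same rows, selected explicitly

# To drop rows only when one particular column is NA, test that column directly
cat_only <- df[!is.na(df$main_category), ]
print(cat_only)
```

na.omit() and complete.cases() both keep only A here, while the column-specific filter also keeps C.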

score 0 · Answer 3 · answered Dec 14 '17 at 15:00

Arguments you give to ddply itself, like na.rm=T, are passed on to .fun (transform, in your case), not to rank, so the NA handling has to go inside the rank call.
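To see why na.rm=T changes nothing, recall that transform() treats every named argument as a column to add. A base R sketch with a hypothetical two-row data frame standing in for one ddply group (no plyr needed) shows the stray argument simply becoming a column:

```r
# Hypothetical two-row data frame standing in for one ddply group
grp <- data.frame(stars = c(5, 3))

# Roughly what ddply(..., na.rm = T, transform, rank = ...) calls for each group:
# na.rm is just another named argument to transform(), so it becomes a column.
out <- transform(grp, na.rm = TRUE, rank = rank(-stars))
print(names(out))   # "stars" "na.rm" "rank"
```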

Your approach to NAs was this:

ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))

Passing the argument inside rank instead (rank has no na.rm; its NA argument is na.last) works for me:

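# na.last handles NAs in stars only; to drop the NA main_category row, use ddply(na.omit(d), ...)
ddply(d, c("city", "state", "main_category"), transform,
      rank = rank(-stars, na.last = TRUE, ties.method = "max"))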
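A quick base R sketch of rank()'s three na.last settings, on a hypothetical stars vector (negated so the most stars get rank 1, as in the question). These settings apply only to NAs in the vector being ranked:

```r
stars <- c(3, NA, 5)

r_last <- rank(-stars)                     # default na.last = TRUE: NA gets the last rank
r_keep <- rank(-stars, na.last = "keep")   # NA stays NA in the result
r_drop <- rank(-stars, na.last = NA)       # NA is removed, so the result is shorter
print(r_last)   # 2 3 1
print(r_keep)   # 2 NA 1
print(r_drop)   # 2 1
```

With na.last = NA the result is shorter than the group, so it won't line up with the rows if used as a new column inside transform(); "keep" preserves the length.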
