I'm attempting to sort a dataframe based on continuous variables between two categories of data. Specifically, I want to sort (descending) on the continuous variable, but preserving variables of a similar type together. Here's an example:
pets <- data.frame(animal = c("dog", "dog", "dog", "cat", "cat", "fish", "fish", "fish"),
breed = c("retriever", "husky", "husky", "grey", "white", "guppy", "betta", "betta"),
count = c(4, 3, 7, 8, 9, 2, 12, 1))
Right now, the data frame is unsorted. I want to sort it so that the breeds
with the highest mean count
appear first, but all breed
s (and animal
s) stay grouped together. If I order the frame based on breed
, I lose the correct order of the count
, and vice versa. Even when I try both like so:
pets[with(pets, order(breed, -count)), ]
the output is not sorted properly. I went through some split-apply-combine tutorials, but I could only find ones that attempted to keep one group of data together, not two as in my data.
Right now, this is the best I've got:
split_pets <- split(pets, pets$animal)
unlist(lapply(split_pets, function(x) sort(with(x, tapply(count, breed, mean)), decreasing = TRUE)))
Which returns
cat.white cat.grey dog.husky dog.retriever fish.betta fish.guppy
9.0 8.0 5.0 4.0 6.5 2.0
Sure, I've got the correct order they go in. But I don't actually even care about the means, I just need to sort the original data frame based on this. Next up is going down a rabbit hole of splitting this again based on breed, but then I'd be sorting based on a column of a data frame of a list of lists. That sounds way too complicated. I've also tried order
ing on count and then piping that to group_by()
from dplyr
, but that didn't get me any farther than I'm at now.
Thanks for any help!