How can I summarize, count and then rank the results using R?

Question

I have created a data set of single family home sales in my county from 2022. For each ZIP code, I am trying to find the number of sales and total sales, so that I can calculate the average sale. I would then like to rank the ZIP codes by average sale, in decreasing order.

summary <- sales %>% group_by(distinct(zip_code, sold_price))
## ! no applicable method for 'distinct' applied to an object of class "c('double', 'numeric')"
## Run `rlang::last_trace()` to see where the error occurred.

summary <- sales %>% group_by(zip_code, sold_price)

The second statement works but seems to just return the original data set.

Perhaps you are looking for `sales %>% distinct(zip_code, sold_price, .keep_all = TRUE)` or `sales %>% reframe(count = n(), total_sales = sum(sold_price, na.rm = TRUE), .by = zip_code)` — akrun, Mar 27 '23 at 16:26

score 0 · Answer 1 · answered Mar 27 '23 at 16:32

library(dplyr) # requires dplyr >= 1.1.0 for the .by argument
sales %>%
  reframe(
    count = n(),
    total_sales = sum(sold_price, na.rm = TRUE),
    .by = zip_code
  )
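As a sketch of how this could be carried through to the average sale and the ranking the question asks for: the `sales` tibble below is made-up stand-in data, and the `avg_sale` and `rank` column names are assumptions, not taken from the original data set. `summarise()` is used here instead of `reframe()` because each ZIP code yields exactly one row.

```r
library(dplyr)  # the .by argument requires dplyr >= 1.1.0

# Hypothetical stand-in for the asker's data; the values are invented
sales <- tibble(
  zip_code   = c("30301", "30301", "30302", "30302", "30302", "30303"),
  sold_price = c(250000, 350000, 400000, 500000, 600000, 150000)
)

ranked <- sales %>%
  summarise(
    count       = n(),                            # number of sales per ZIP code
    total_sales = sum(sold_price, na.rm = TRUE),  # total sale value per ZIP code
    avg_sale    = mean(sold_price, na.rm = TRUE), # average sale per ZIP code
    .by = zip_code
  ) %>%
  arrange(desc(avg_sale)) %>%   # highest average first
  mutate(rank = row_number())   # 1 = highest average sale

ranked
```

Note that `n()` counts every row, while `mean(..., na.rm = TRUE)` divides by the number of non-missing prices only, so `total_sales / count` and `avg_sale` can differ if `sold_price` contains missing values.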

akrun

score 0 · Accepted Answer · answered Mar 28 '23 at 07:51

I believe you have two issues: the first is not understanding how group_by works (I had that same issue before!), and the second is that you're trying to use distinct in a context it was not designed for.

distinct is used to keep only the unique rows in a dataframe: it drops any row that repeats the values of an earlier row, either across every column or across the column(s) you select. You can read how it works in this page from the official documentation.
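A minimal sketch of that behaviour, using a small made-up `sales` tibble (the values and the `unique_pairs` and `first_per_zip` names are assumptions for illustration):

```r
library(dplyr)

# Hypothetical data: the first two rows are exact duplicates
sales <- tibble(
  zip_code   = c("30301", "30301", "30301", "30302"),
  sold_price = c(250000, 250000, 350000, 400000)
)

# Unique combinations of the selected columns: the duplicated row is dropped
unique_pairs <- sales %>% distinct(zip_code, sold_price)

# Unique values of one column; .keep_all = TRUE keeps the remaining
# columns, taken from the first row found for each zip_code
first_per_zip <- sales %>% distinct(zip_code, .keep_all = TRUE)

nrow(unique_pairs)
nrow(first_per_zip)
```

This also suggests why the first attempt in the question failed: distinct expects a data frame as its first argument, but inside group_by() the name zip_code refers to the column itself, a numeric vector, for which distinct has no method.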

The problem with group_by is that it doesn't do any computation: it "just" groups the data so you can do something else with it (e.g. summarise it, sum it, count it...), so it is meant to be used in conjunction with other verbs such as summarise, count, tally... You can see a great explanation of how grouping works in this vignette from dplyr's documentation.
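To illustrate, here is a small sketch with made-up data (the `sales` values and the `n_sales` and `avg_sale` names are assumptions). Grouping alone leaves every row in place, which is why the second statement in the question appeared to return the original data set; the rows only collapse once a summarising verb is applied.

```r
library(dplyr)

# Hypothetical data standing in for the asker's sales table
sales <- tibble(
  zip_code   = c("30301", "30301", "30302"),
  sold_price = c(250000, 350000, 400000)
)

# group_by() only records the grouping; no rows are added or removed
grouped <- sales %>% group_by(zip_code)
nrow(grouped)      # same number of rows as sales
n_groups(grouped)  # one group per distinct zip_code

# summarise() is what reduces each group to a single row
per_zip <- grouped %>%
  summarise(n_sales = n(), avg_sale = mean(sold_price))

per_zip
```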

So, answering your question now, what I believe you may want to do would be this:

Option 1: Group first, then count the rows in each group, and then arrange the result in descending order

library(dplyr)

starwars %>% 
  group_by(species) %>% 
  summarise(total = n()) %>% 
  arrange(desc(total))

Or, more succinctly, use count (more info in this page from the official documentation)

starwars %>% 
  count(species, sort = TRUE)

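Both yield the same counts; the only difference is that the count column is named total in the first version and n in the second.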

PS: I used the starwars dataframe from dplyr because you didn't provide any data that I could reproduce. For further questions, you may want to add a reproducible example by adding data to your question. You can read this excellent guide here: How to make a great R reproducible example

Thank you so much for the explanation and examples! This helps so much :) — Sheena Puckett, Mar 30 '23 at 16:02
