I am an R newbie. This is my first question. I have a dataset containing 1) all US zip codes, 2) unique count of sales transactions, and 3) the sum of sales transactions. Is there a way to obtain the coefficient of determination (R^2) for every zip code using Count of Sales and Sum of Sales Transactions as my x and y variables, respectively? Specifically, I am looking to create a table with R^2s for every US zip code using the two variables mentioned.
1 Answer
You can do this with the purrr package.
Here is an example with mtcars:
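library(purrr)  # also provides the %>% pipe
mtcars %>%
  split(.$cyl) %>%                      # one data frame per cylinder count
  map(~ lm(mpg ~ wt, data = .x)) %>%    # fit a linear model within each group
  map(summary) %>%                      # summarise each fitted model
  map_dbl("r.squared") %>%              # pull out R^2 as a named numeric vector
  data.frame(cyl = names(.), r2 = ., row.names = NULL)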
  cyl        r2
1   4 0.5086326
2   6 0.4645102
3   8 0.4229655
And here is the flow for your problem. Everything in "quotes" is a placeholder for your own data frame or column names, except "r.squared", which must stay as written. Replace each placeholder, quotes included, with the actual name; a column name that contains spaces has to be wrapped in backticks.
df <- "your dataframe" %>%
  split(.$"zipcode") %>%
  map(~ lm("sum of sales" ~ "count of sales", data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared") %>%
  data.frame(zipcode = names(.), r2 = ., row.names = NULL)
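As a self-contained illustration of the same split, fit, and extract flow, here is a minimal sketch in base R on made-up data. The data frame `sales_df`, its columns `zipcode`, `count_sales` and `sum_sales`, and the helper `r2_for_group` are all hypothetical names, not taken from the question. It assumes the data has several rows per zip code; with only one or two rows per zip code there is nothing meaningful to fit, so the sketch returns NA for such groups.

```r
# Hypothetical toy data: ten observations for each of three zip codes.
set.seed(42)
sales_df <- data.frame(
  zipcode     = rep(c("10001", "60601", "94105"), each = 10),
  count_sales = rep(1:10, times = 3)
)
sales_df$sum_sales <- 50 * sales_df$count_sales + rnorm(nrow(sales_df), sd = 40)

# Fit sum_sales ~ count_sales within one zip code and return its R^2.
# Groups with fewer than 3 rows return NA (an assumption made for this sketch).
r2_for_group <- function(d) {
  if (nrow(d) < 3) return(NA_real_)
  summary(lm(sum_sales ~ count_sales, data = d))$r.squared
}

# One R^2 per zip code, as a named numeric vector.
r2_by_zip <- vapply(split(sales_df, sales_df$zipcode), r2_for_group, numeric(1))

# Turn the named vector into a two-column table.
r2_table <- data.frame(zipcode = names(r2_by_zip), r2 = unname(r2_by_zip))
print(r2_table)
```

For a regression with a single predictor and an intercept, R^2 equals the squared Pearson correlation, so `cor(count_sales, sum_sales)^2` computed within each zip code should give the same numbers and can serve as a quick check.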

phiver