
I am an R newbie. This is my first question. I have a dataset containing 1) all US zip codes, 2) unique count of sales transactions, and 3) the sum of sales transactions. Is there a way to obtain the coefficient of determination (R^2) for every zip code using Count of Sales and Sum of Sales Transactions as my x and y variables, respectively? Specifically, I am looking to create a table with R^2s for every US zip code using the two variables mentioned.

David
  • Please consider reading up on How to Ask and how to produce a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Heroka Oct 31 '15 at 07:13
  • Yes, there even are multiple ways. – Roland Oct 31 '15 at 09:29

1 Answer


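You can do this with the purrr package: split the data by group, fit one linear model per group, and pull the R^2 out of each model summary.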

Here is an example with mtcars:

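library(purrr)

mtcars %>%
  split(.$cyl) %>%                     # one data frame per cylinder count
  map(~ lm(mpg ~ wt, data = .x)) %>%   # fit a linear model within each group
  map(summary) %>%
  map_dbl("r.squared") %>%             # extract R^2 as a named numeric vector
  data.frame(cyl = names(.), r2 = ., row.names = NULL)

  cyl        r2
1   4 0.5086326
2   6 0.4645102
3   8 0.4229655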
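As a quick sanity check on the per-group approach: for a regression with a single predictor and an intercept, R^2 equals the squared Pearson correlation between x and y. The sketch below uses base R only and compares the two on the same mtcars groups; the object names `groups`, `r2_lm`, and `r2_cor` are made up for illustration.

```r
# Split mtcars into one data frame per cylinder count
groups <- split(mtcars, mtcars$cyl)

# R^2 taken from the fitted model summary, one value per group
r2_lm <- sapply(groups, function(d) summary(lm(mpg ~ wt, data = d))$r.squared)

# R^2 computed directly as the squared correlation, one value per group
r2_cor <- sapply(groups, function(d) cor(d$wt, d$mpg)^2)

# The two vectors should agree up to floating-point tolerance
all.equal(r2_lm, r2_cor)
```

If the two ever disagree, that usually points to a model with more than one predictor or without an intercept, where the shortcut no longer applies.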

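The same flow applies to your problem. In the template below, sales_df, zipcode, sum_sales, and count_sales are placeholders: replace them with the names of your own data frame and columns. Leave "r.squared" as it is, since that is the name of the element inside each model summary.

r2_by_zip <- sales_df %>%
  split(.$zipcode) %>%                                 # one data frame per zip code
  map(~ lm(sum_sales ~ count_sales, data = .x)) %>%    # y = sum of sales, x = count of sales
  map(summary) %>%
  map_dbl("r.squared") %>%
  data.frame(zipcode = names(.), r2 = ., row.names = NULL)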
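To try the zip-code flow end to end, here is a minimal sketch on simulated data. Everything about the data is an assumption: the data frame `sales_df`, its columns `zipcode`, `count_sales`, and `sum_sales`, the zip codes themselves, and the idea that each zip code has several rows (for example one per month). It also drops zip codes with fewer than three rows, because a line through one or two points gives an R^2 that is undefined or trivially perfect; the threshold of three is a judgment call, not a rule.

```r
library(purrr)

set.seed(1)  # make the simulated data reproducible

# Hypothetical data: 12 rows for each of three zip codes
sales_df <- data.frame(
  zipcode     = rep(c("10001", "60601", "94105"), each = 12),
  count_sales = rpois(36, lambda = 50)
)
sales_df$sum_sales <- 20 * sales_df$count_sales + rnorm(36, sd = 100)

# One extra zip code with a single row, to exercise the filter below
sales_df <- rbind(
  sales_df,
  data.frame(zipcode = "99999", count_sales = 10, sum_sales = 250)
)

# One data frame per zip code
groups <- split(sales_df, sales_df$zipcode)

# Keep only zip codes with at least 3 rows
groups <- keep(groups, ~ nrow(.x) >= 3)

# Fit sum_sales ~ count_sales within each zip code and extract R^2
r2 <- map_dbl(groups, ~ summary(lm(sum_sales ~ count_sales, data = .x))$r.squared)

# Assemble the result table: one row per zip code
r2_by_zip <- data.frame(zipcode = names(r2), r2 = unname(r2), row.names = NULL)
r2_by_zip
```

If your real data has only one row per zip code, a per-zip-code R^2 cannot be computed at all, and a single model across all zip codes may be what you want instead.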
phiver