How to create new rows based on data from a different table (R)

Question

So, if I had a data table like this:

stores <- read_csv("stores.csv")
stores

# A tibble: 6 x 3
  state      store   num_locations
  <chr>      <chr>           <dbl>
1 california target             20
2 california walmart            29
3 nevada     target             10
4 nevada     walmart            12
5 arizona    target             15
6 arizona    walmart            19

Then, I create a new data frame without the location information:

stores_2 <- select(stores, store, num_locations)

# A tibble: 6 x 2
  store   num_locations
  <chr>           <dbl>
1 target             20
2 walmart            29
3 target             10
4 walmart            12
5 target             15
6 walmart            19

Is there a way I can create a third data set that provides an average number of locations, like this (I'm not sure how to actually generate this tibble):

# A tibble: 6 x 2
  store   avg_num_locations
  <chr>           <dbl>
1 target             15
2 walmart            20

score 0 · Accepted Answer · answered Jun 15 '21 at 23:07

One solution is to use tidyverse functions group_by() and summarise():

library(tidyverse)

stores <- data.frame(
  stringsAsFactors = FALSE,
                       state = c("california","california","nevada","nevada","arizona",
                                 "arizona"),
                       store = c("target",
                                 "walmart","target","walmart","target",
                                 "walmart"),
     num_locations = c(20L, 29L, 10L, 12L, 15L, 19L)
          )

stores_summary <- stores %>%
  group_by(store) %>%
  summarise(avg_num_locations = mean(num_locations))

stores_summary
# A tibble: 2 x 2
#  store   avg_num_locations
#  <chr>               <dbl>
#1 target                 15
#2 walmart                20

You're welcome - for more details on tidyverse workflows check out https://r4ds.had.co.nz/ (free online, physical copy: http://amzn.to/2aHLAQ1) — jared_mamrot, Jun 15 '21 at 23:35

score 0 · Answer 2 · answered Jun 15 '21 at 23:17

0

In base R, you could use `aggregate:

aggregate(num_locations~store, stores, mean)
    store num_locations
1  target            15
2 walmart            20

answered Jun 15 '21 at 23:17

Onyambu

67,392
3
24
53

How to create new rows based on data from a different table (R)

2 Answers2