2

I'd like to summarise sales data grouping by city and adding a column where "1" means "the city has at least one store that sells more than 100" and "0" means "otherwise".

This is an approximation of what I'm trying:

library(dplyr)

my_data <- tibble(city = c("City 1","City 1","City 2","City 1"), 
                  store = c("mc donalds","starbucks","target","jp licks"), 
                  sales = c(300,200,3000,80),
                  sales_higher_than_100 = c(1,1,1,0))

my_data %>% 
  group_by(city) %>% 
  summarise(has_stores_that_sell_more_than_100 = sum(sales_higher_than_100))

# A tibble: 2 x 2
    city has_stores_that_sell_more_than_100
   <chr>                              <dbl>
1 City 1                                  2
2 City 2                                  1

However, instead of summation I'd like to report a value of 1 for "City 1" like this:

# A tibble: 2 x 2
    city has_stores_that_sell_more_than_100
   <chr>                              <dbl>
1 City 1                                  1
2 City 2                                  1

In other words, I wonder how to instruct dplyr to find if one or more of the rows of City N meet a condition instead of counting each of the rows that meet the condition for City N.

pachadotdev
  • 3,345
  • 6
  • 33
  • 60

2 Answers2

2

Could put a logical in there ">0" and then convert it to a numeric whereby TRUE = 1 and FALSE = 0

my_data %>% 
group_by(city) %>% 
summarise(has_stores_that_sell_more_than_100 = as.numeric(sum(sales_higher_than_100)>0))
BigTimeStats
  • 447
  • 3
  • 12
2

Using ifelse :

my_data %>%
  mutate(res = ifelse(sales>= 100), 1,
               ifelse(sales<100, 0, NA)))

Using if_else:

my_data %>%
      mutate(res = if_else(sales>= 100), 1,
                   if_else(sales<100, 0, NA_real_)))

Using basic r:
after you have final dataframe with the sales values dataframe,
my_data$newcolumn<- my_data$sales[my_data$sales >= 100] <- 1

Using case_when:

my_data %>%
  select(name:sales) %>%
  mutate(
    type = case_when(
      sales > 100 ~ "True",
      TRUE ~ "False"
    )
  )

See here : http://dplyr.tidyverse.org/reference/case_when.html

Using derived factor:

library(dplyr)
library(mosaic)
#if salesvolume is the column that has number of sales 
my_data <- mutate(my_data, res= as.numeric(derivedFactor(
     "1" = (salesvolume >= 100)),
     "0" = (salesvolume < 100),
     .method = "first",
     .default = NA
     )))

see : can dplyr package be used for conditional mutating?

kRazzy R
  • 1,561
  • 1
  • 16
  • 44