0

Lets assume we have the following table with different categories from a data base.

id <- c('id1','id2','id3','id4','id5')
visit_number <- c(3, 8, 10, 7,8)
page_views<- c(35, 85, 102, 7,18)
channel <-c('Natural Search', 'Paid Search', 'Social Network', 'Paid Search', 'Social Network')
country <- c('ES','ES','NL','DE','DE')
device <- c('PC','PC','Other','PC','Other')
sex <- c('F','F','M','M','M')
age <- c('18-30','18-30','18-30','>50','<40')
product <-c('SRSX11W.MX', 'SRSX11W.MX', 'DSC-HX400V //SF-8C4', 'SLT-A58Y//Q E38', 'DSC-HX400V + LCS-U21 + SF-8C4')
producy_category <- c('audio','audio','camera','tablets','camera')
 price <- c(33.45, 82, 104.44, 72,88.45)
 # Join the variables to create a data frame
df <-data.frame(id,visit_number,page_views,channel,country,device,sex,age,product,producy_category,price)
df

I would like to calculate the top-performing combinations based on countries, channels, and product categories. The table is much bigger. what kind of combinations should I do to predict this?

s_baldur
  • 29,441
  • 4
  • 36
  • 69
user3683485
  • 115
  • 1
  • 8
  • Please provide minimal and reproducible example(s) along with the desired output. – s_baldur Oct 06 '20 at 10:07
  • It's hard for anyone to help you with the information you've given us. Please have a read of [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and edit your question. Thanks. – user438383 Oct 06 '20 at 10:09
  • thank you i edited the question with a reproducible sample – user3683485 Oct 06 '20 at 10:52

1 Answers1

1

We can use tidyverse functions to create all combinations of country, channel, producy_category and calculate sum of price. We can arrange the output in descending order to get top-performing combinations.

library(tidyverse)

df %>%
  distinct(country, channel, producy_category) %>%
  do.call(crossing, .) %>%
  pmap_df(~ df %>% 
         filter(channel == ..1, country == ..2, producy_category == ..3) %>%
         group_by(channel, country, producy_category) %>%
         summarise(price = sum(price))) %>%
  arrange(desc(price)) -> result
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • It doesn't seem to work. I have an error like : Error in Ops.factor(channel, ..1) : level sets of factors are different – user3683485 Oct 07 '20 at 11:11
  • @user3683485 It seems you are on R < 4.0.0. Turn your data to character or use `df <-data.frame(id,visit_number,page_views,channel,country,device,sex,age,product,producy_category,price, stringsAsFactors = FALSE)` – Ronak Shah Oct 07 '20 at 11:34
  • I still have the same error : Error in Ops.factor(clean_test.channel, ..1) : level sets of factors are different when i am trying to do the same in the big table . `df<- data.frame(clean_test$country,clean_test$channel,clean_test$product_category,clean_test$price, stringsAsFactors = FALSE) colnames(df) <- c("country", "channel", "producy_category", "price")` – user3683485 Oct 07 '20 at 11:52
  • You did not convert the columns to characters. What does `sapply(df, class)` return? Try `df <- type.convert(df, as.is = TRUE)` – Ronak Shah Oct 07 '20 at 11:54
  • Ok the last one works i understand but the result is empty ! without data available in table – user3683485 Oct 07 '20 at 11:58
  • Did you check `result` ? It atleast give few rows as output for the data you have shared. Does it work for you on that data? – Ronak Shah Oct 07 '20 at 12:08
  • Yes i checked it in my case the resut returns just headers even in the data i shared. A tibble: 0 x 4 # Groups: channel, country [0] # ... with 4 variables: channel , country , producy_category , price – user3683485 Oct 07 '20 at 12:11
  • That's weird because when I use the data that you have shared it gives me tibble with 5 rows and 4 columns. – Ronak Shah Oct 07 '20 at 12:35
  • Very weird. i run again and again the example but always 0 obs of 4 variables in the result ! of course the same for the big table ! but after conversion there is no error in the function that you propose. – user3683485 Oct 07 '20 at 12:40
  • What is your R version? Can you restart R and try again? – Ronak Shah Oct 07 '20 at 12:48
  • R version 3.6.2 . i did it restart and run again but still nothing. libraries work, functions as well but the result is 0. – user3683485 Oct 07 '20 at 13:00
  • I am pretty sure this is package version problem . My `packageVersion('tidyr')` is `‘1.1.2’` and `packageVersion('dplyr')` is `‘1.0.1’` – Ronak Shah Oct 07 '20 at 13:33
  • In my case, the tidyr is the same but the dplyr is ‘0.8.5’ and i am not able to update it. i suppose i have to update R ? – user3683485 Oct 07 '20 at 13:46
  • Yes, I think first you need to update R and then try updating other packages. – Ronak Shah Oct 07 '20 at 13:59
  • Is there any other way to do those calculations with another function or library? – user3683485 Oct 07 '20 at 14:06
  • There might be but I cannot test it if it works or not since my packages and R are all updated. – Ronak Shah Oct 07 '20 at 14:08
  • I don't understand how the packageVersion('dplyr') is responsible for an empty table since the functions and inputs outputs are correct and work. – user3683485 Oct 07 '20 at 14:51
  • Ok after updating R and the packages i managed to run it for the example but not for the big table. If the big dataset is a df with 4 columns like country, channel,producy_category, price this should work right ? – user3683485 Oct 07 '20 at 15:23
  • Thanks for the help i did some changes in the data set and it worked! – user3683485 Oct 07 '20 at 15:29