Is there a function with which I can find a combination of values in a data set, which has on average the largest values?

Question

These are the instructions that are given:

In 2012 and prior to that time, a pizza chain in Australia, Eagle Boys (taken over by Pizza Hut in 2016), ran an advertising campaign in which several claims about the size of their pizzas as well as those of their main competitor, Domino’s, were made. The file pizza.csv contains the data on which they based their campaign. For each of the 250 pizzas taken into consideration, you are provided with the chain the pizza comes from, the type of crust, toppings and the diameter of the pizza (in cm).

The question I have to answer is the following:

What combination of crust type, toppings and chain the pizza comes from has on average the largest pizzas? What combination yields the smallest pizzas?

These are the plots I managed to create, but each one still compares only two columns.

This is the corresponding code:

par(mfrow = c(1, 2))
boxplot(dominos$Diameter ~ dominos$CrustDescription)
boxplot(dominos$Diameter ~ dominos$Topping)

Welcome to SO! If you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of the `head()` of your data it will be much easier for people to help you. I think you'll get an answer fairly quickly to this one as it sounds straightforward, especially if you are allowed to use `dplyr` or `data.table`. — SamR, May 07 '22 at 09:15
With dplyr, this question calls for `group_by` and `summarise`. Play around with those functions, get a bit stuck, figure it out, and reap the benefits in the long term. — jpenzer, May 07 '22 at 11:08

score 0 · Answer 1 · answered May 07 '22 at 12:51

Try this:

library(tidyverse)

df <- read.csv("pizza.csv")

# Mean diameter for every crust/topping/chain combination
df %>%
  group_by(CrustDescription, Topping, Chain) %>%
  summarize(avg = mean(Diameter), .groups = "drop")

hope it helps...
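The summary above lists every combination, so one more step is needed to pick out the largest and the smallest. Here is a minimal sketch of that step, assuming dplyr 1.0.0 or later (for `slice_max()`, `slice_min()` and the `.groups` argument). The `pizza` data frame is made up so the example runs without pizza.csv: its crust and topping labels and all diameters are hypothetical, and only the column names follow the code above.

```r
library(dplyr)

# Hypothetical stand-in for pizza.csv: 8 combinations, 2 pizzas each.
# Labels and diameters are invented for illustration only.
combos <- expand.grid(
  Chain            = c("Dominos", "EagleBoys"),
  CrustDescription = c("Thin", "Deep"),
  Topping          = c("Hawaiian", "Supreme"),
  stringsAsFactors = FALSE
)
pizza <- combos[rep(seq_len(nrow(combos)), each = 2), ]
pizza$Diameter <- 25 + seq_len(nrow(pizza)) * 0.25

# Mean diameter per crust/topping/chain combination
avg_by_combo <- pizza %>%
  group_by(CrustDescription, Topping, Chain) %>%
  summarise(avg = mean(Diameter), .groups = "drop")

# Keep the combination(s) with the largest and the smallest mean
largest  <- slice_max(avg_by_combo, avg, n = 1)
smallest <- slice_min(avg_by_combo, avg, n = 1)

print(largest)
print(smallest)
```

By default both functions keep ties, so more than one row can come back if several combinations share the same mean.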

lucabiel


thank you so so much, it worked! made my day :) – Maria Promegger May 07 '22 at 13:03

score 0 · Accepted Answer · answered May 07 '22 at 18:37

You probably have data similar to this:

dat
#   chain crust topping diameter
# 1     Y     B       M 27.10686
# 2     X     C       L 29.70423
# 3     Y     A       L 27.57106
# 4     Y     A       L 27.88939
# 5     X     A       M 29.61035
# 6     X     C       K 29.77217

First, boxplot has a formula interface you may want to use.

boxplot(diameter ~ crust + topping + chain, dat)

Second, the same formula can be used in the very useful aggregate function, which applies any function you pass as its FUN argument to each group of the data.

a <- aggregate(diameter ~ crust + topping + chain, dat, FUN=mean)

Finally, select the rows whose mean diameter equals the maximum and the minimum.

a[a$diameter == max(a$diameter), ]
#   crust topping chain diameter
# 3     C       K     X 28.21241

a[a$diameter == min(a$diameter), ]
#    crust topping chain diameter
# 18     C       M     Y  26.6717
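If you would rather see the whole ranking than just the two extremes, the aggregated result can also be sorted. The following is a rough sketch in base R that regenerates the simulated data from the Data section of this answer; the name `ranked` is only an illustrative choice.

```r
# Simulated data, generated as in the Data section of this answer
n <- 250
dat <- expand.grid(chain = LETTERS[24:25], crust = LETTERS[1:3],
                   topping = LETTERS[11:13])
dat <- dat[rep(seq_len(nrow(dat)), n / 2), ]
set.seed(42)
dat$diameter <- runif(nrow(dat), 25, 30)
dat <- dat[sample(seq_len(nrow(dat)), n), ]

# Mean diameter per crust/topping/chain combination
a <- aggregate(diameter ~ crust + topping + chain, dat, FUN = mean)

# Sort all combinations from largest to smallest mean diameter
ranked <- a[order(a$diameter, decreasing = TRUE), ]
head(ranked, 3)  # the three largest combinations
tail(ranked, 3)  # the three smallest combinations

# which.max()/which.min() return only the first extreme row,
# whereas the == comparison returns every row tied for it
a[which.max(a$diameter), ]
a[which.min(a$diameter), ]
```

With real data, where means are often rounded, ties are more likely, so the `==` version is the safer choice when you need every tied combination.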

Data:

n <- 250
dat <- expand.grid(chain=LETTERS[24:25], crust=LETTERS[1:3], topping=LETTERS[11:13])  
dat <- dat[rep(seq_len(nrow(dat)), n/2), ]
set.seed(42)
dat$diameter <- runif(nrow(dat), 25, 30)
dat <- dat[sample(seq_len(nrow(dat)), n), ]
