Visualising the distribution for different subgroups

Question

I'm using the "d.pizza" data. It has a variable called "delivery_min", which is the delivery time in minutes, and a variable called "area", which takes one of three values (Camden, Westminster and Brent). I want to draw a density plot that visualises the distribution of delivery time for each of these three areas.

I tried

 plot.ecdf(d.pizza$delivery_min)

This code works, but how can I do it for each area?

Output of head(d.pizza):

index       date week weekday        area count rabate  price operator  driver delivery_min
1 1     1 01.03.2014    9       6      Camden     5   TRUE 65.655   Rhonda  Taylor         20.0
2 2     2 01.03.2014    9       6 Westminster     2  FALSE 26.980   Rhonda Butcher         19.6
3 3     3 01.03.2014    9       6 Westminster     3  FALSE 40.970  Allanah Butcher         17.8
4 4     4 01.03.2014    9       6       Brent     2  FALSE 25.980  Allanah  Taylor         37.3
5 5     5 01.03.2014    9       6       Brent     5   TRUE 57.555   Rhonda  Carter         21.8
6 6     6 01.03.2014    9       6      Camden     1  FALSE 13.990  Allanah  Taylor         48.7
  temperature wine_ordered wine_delivered wrongpizza quality
1        53.0            0              0      FALSE  medium
2        56.4            0              0      FALSE    high
3        36.5            0              0      FALSE    <NA>
4          NA            0              0      FALSE    <NA>
5        50.0            0              0      FALSE  medium
6        27.0            0              0      FALSE     low

Hi, please take a look at [how to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Knowing your dataset's name and columns is helpful, but in order to provide a complete answer we'll need more than that. About the best I can do at the moment is suggest that you `filter` your dataset by your areas and plot those individually – Punintended Oct 12 '20 at 21:00
@Punintended, I don't know how to filter my data to get only the delivery times for the "Brent" area (for example) – wrzosowa Oct 12 '20 at 21:07
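
For the filtering step discussed in these comments, a minimal base R sketch might look like the following. The `pizza` data frame is a small stand-in loosely based on the `head()` output above plus one made-up row with a missing area, so it is an assumption for illustration and not the real `d.pizza` data.

```r
# Hypothetical stand-in for d.pizza; the last row is made up to show NA handling
pizza <- data.frame(
  area = c("Camden", "Westminster", "Brent", "Brent", "Camden", NA),
  delivery_min = c(20.0, 19.6, 37.3, 21.8, 48.7, 30.1)
)

# Option 1: subset() keeps only rows where the condition is TRUE,
# so rows with a missing area are dropped
brent <- subset(pizza, area == "Brent")

# Option 2: logical indexing; which() discards the NA comparisons
brent_times <- pizza$delivery_min[which(pizza$area == "Brent")]

print(brent)
print(brent_times)
```

Either result can then be passed to `plot.ecdf()` in the same way as the full column in the question.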

G5W · Answer 1 · 2020-10-12T21:35:06.303

2

library(DescTools)

data(d.pizza)
summary(d.pizza$delivery_min)

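# Empty canvas: x limits set by hand, y limits 0 to 1 for a cumulative proportion
plot(NULL, ylab='', xlab='', xlim=c(5,66), ylim=0:1)
# Overlay one ECDF per area, each in its own colour
for(A in 1:3) {
    plot.ecdf(d.pizza$delivery_min[d.pizza$area == levels(d.pizza$area)[A]], 
        pch=20, col=A+1, add=TRUE)
}
legend("bottomright", legend=levels(d.pizza$area), 
        bty='n', pch=20, col=2:4)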
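
As a variation on the loop above, the per-area step functions can also be built once with `split()` and `ecdf()` and then both queried and plotted. This is only a sketch: the `pizza` data frame is simulated stand-in data (an assumption, not the real `d.pizza`), and the 30-minute threshold is an arbitrary example value.

```r
# Hypothetical stand-in for d.pizza; simulated values, not real delivery times
set.seed(1)
pizza <- data.frame(
  area = factor(rep(c("Brent", "Camden", "Westminster"), each = 20)),
  delivery_min = c(rnorm(20, 30, 8), rnorm(20, 25, 6), rnorm(20, 22, 5))
)

# split() groups the delivery times by area; ecdf() turns each group
# into a step function
ecdfs <- lapply(split(pizza$delivery_min, pizza$area), ecdf)

# Each element is a function: proportion of deliveries taking at most 30 minutes
print(sapply(ecdfs, function(f) f(30)))

# Draw all curves on shared axes, one colour per area
cols <- seq_along(ecdfs) + 1
plot(ecdfs[[1]], col = cols[1], xlim = range(pizza$delivery_min),
     main = "ECDF of delivery time by area", xlab = "delivery_min")
for (i in seq_along(ecdfs)[-1]) {
  plot(ecdfs[[i]], col = cols[i], add = TRUE)
}
legend("bottomright", legend = names(ecdfs), col = cols, pch = 19, bty = "n")
```

Keeping the `ecdf` objects around makes it easy to read off values such as the share of deliveries under a given time, in addition to plotting them.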

edited Oct 12 '20 at 21:35

answered Oct 12 '20 at 20:29

G5W


Sorry, but I meant the distribution plot (the distribution function) for these three areas; sorry for the mistake. – wrzosowa Oct 12 '20 at 20:51

score 2 · Accepted Answer · answered Oct 12 '20 at 21:13

2

You could do:

library(DescTools)

data(d.pizza)

plot.ecdf(subset(d.pizza, area == "Camden")$delivery_min, 
          col = "red", main = "ECDF for pizza deliveries")
plot.ecdf(subset(d.pizza, area == "Westminster")$delivery_min, 
          add = TRUE, col = "blue")
plot.ecdf(subset(d.pizza, area == "Brent")$delivery_min, 
          add = TRUE, col = "green")
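
One thing to watch with `add = TRUE`: the later curves are drawn on the axes set up by the first call, so an area with a wider range of delivery times can be clipped. A possible sketch that fixes the x-axis from the full range first is shown below; the `pizza` data frame is simulated stand-in data (an assumption, not the real `d.pizza`).

```r
# Hypothetical stand-in for d.pizza; simulated values for illustration only
set.seed(42)
pizza <- data.frame(
  area = rep(c("Camden", "Westminster", "Brent"), each = 25),
  delivery_min = c(rnorm(25, 25, 4), rnorm(25, 22, 4), rnorm(25, 35, 10))
)

# Common x-axis limits computed from all areas, ignoring missing values
xr <- range(pizza$delivery_min, na.rm = TRUE)

areas <- c("Camden", "Westminster", "Brent")
cols  <- c("red", "blue", "green")

# The first call creates the plot; the later ones add to it
for (i in seq_along(areas)) {
  plot.ecdf(subset(pizza, area == areas[i])$delivery_min,
            col = cols[i], xlim = xr, add = i > 1,
            main = "ECDF for pizza deliveries")
}
legend("bottomright", legend = areas, col = cols, pch = 19, bty = "n")
```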


Allan Cameron


Thank you very much! – wrzosowa Oct 12 '20 at 21:31

lemonlin · Answer 3 · 2020-10-12T21:21:25.577

0

I'd recommend the ggplot2 library for data visualization in R. Here's some code using ggplot2 that can create a density plot with the three groups overlaid:

library(ggplot2)

# make example dataframe
d.pizza <- data.frame(delivery_min = rnorm(n=30), area = rep(c("Camden", "Westminster", "Brent"), 10))

# plot data in ggplot2
ggplot(d.pizza, aes(x = delivery_min, fill = area, color = area)) + geom_density(alpha = 0.5)

If you want a histogram, that can be done too:

ggplot(d.pizza, aes(x = delivery_min, fill = area, color = area)) + geom_histogram(alpha = 0.5, position = 'identity')

edited Oct 12 '20 at 21:21

answered Oct 12 '20 at 21:14

lemonlin


I think that you got the density function, not the _cumulative_ distribution function. – G5W Oct 12 '20 at 21:36
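
For the cumulative version in ggplot2, one option is `stat_ecdf()`, which draws an empirical cumulative distribution per group. The sketch below reuses the idea of a simulated example data frame from this answer; the name `pizza` and the simulated values are assumptions, not the real `d.pizza` data.

```r
library(ggplot2)

# Simulated stand-in data (not the real d.pizza)
set.seed(123)
pizza <- data.frame(
  delivery_min = rnorm(n = 30, mean = 25, sd = 5),
  area = rep(c("Camden", "Westminster", "Brent"), 10)
)

# stat_ecdf() computes the empirical cumulative distribution
# separately for each colour group
p <- ggplot(pizza, aes(x = delivery_min, colour = area)) +
  stat_ecdf(geom = "step") +
  labs(y = "Cumulative proportion")

print(p)
```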
