
I have a dataset with two columns, Species and Color:

Species Color
daisy   white
daisy   yellow
iris    purple
iris    purple
iris    purple
tulip   red
tulip   red
…etc

Using dplyr's count(), I summarize the number of observations of each color per species:

library(dplyr)

data %>%
  count(Species, Color)


Species Color   n
daisy   white   1
daisy   yellow  1
iris    purple  3
tulip   red     2
tulip   yellow  4
tulip   pink    2

I would like to add a column that shows the proportion of each color within its species (n for that color divided by the total n for that species):

Species Color   n   proportion
daisy   white   1   0.5
daisy   yellow  1   0.5
iris    purple  3   1
tulip   red     2   0.25
tulip   yellow  4   0.5
tulip   pink    2   0.25
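In symbols, with n_{s,c} the count for species s and color c, each row's proportion is its count divided by the species total. Using the tulip rows of the table above as the worked case:

```latex
\text{proportion}_{s,c} = \frac{n_{s,c}}{\sum_{c'} n_{s,c'}}
\qquad \text{e.g. tulip: } \frac{2}{2+4+2} = 0.25, \quad \frac{4}{2+4+2} = 0.5
```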
Kate71

1 Answer


Group by Species and Color, count the rows with summarise(), then divide each count by the total for its species:

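library(dplyr)

data %>%
  group_by(Species, Color) %>%
  # n = rows per Species/Color pair; .groups = "drop_last" leaves the
  # result grouped by Species only (and silences the regrouping message)
  summarise(n = n(), .groups = "drop_last") %>%
  # because of that grouping, sum(n) is the total for each species
  mutate(proportion = n / sum(n))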

Output:

# A tibble: 4 × 4
# Groups:   Species [3]
  Species Color      n proportion
  <chr>   <chr>  <int>      <dbl>
1 daisy   white      1        0.5
2 daisy   yellow     1        0.5
3 iris    purple     3        1  
4 tulip   red        2        1 
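Since the question already starts from count(), the same result can also be reached by grouping the counted table by Species alone before dividing. A minimal sketch, assuming dplyr is installed and reusing the sample data from the Data section:

```r
library(dplyr)

# Sample data from the "Data" section
data <- data.frame(
  Species = c("daisy", "daisy", "iris", "iris", "iris", "tulip", "tulip"),
  Color   = c("white", "yellow", "purple", "purple", "purple", "red", "red")
)

result <- data %>%
  count(Species, Color) %>%          # one row per Species/Color, count in n
  group_by(Species) %>%              # sum(n) below is taken per species
  mutate(proportion = n / sum(n)) %>%
  ungroup()                          # drop the Species grouping afterwards

print(result)
```

The final ungroup() is optional; it only keeps the Species grouping from leaking into later steps of a pipeline.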

Data

data <- data.frame(Species = c("daisy", "daisy", "iris", "iris", "iris", "tulip", "tulip"),
                   Color = c("white", "yellow", "purple", "purple", "purple", "red", "red"))
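For comparison, base R can compute the same within-species proportions without dplyr, using table() and prop.table() with margin = 1 (each count divided by its row total, i.e. its species total). A sketch using the same sample data; note that the rows come out in a different order than with dplyr:

```r
# Sample data from the "Data" section
data <- data.frame(
  Species = c("daisy", "daisy", "iris", "iris", "iris", "tulip", "tulip"),
  Color   = c("white", "yellow", "purple", "purple", "purple", "red", "red")
)

# Species x Color contingency table of counts
tab <- table(Species = data$Species, Color = data$Color)

# margin = 1 divides each count by its row (species) total
props <- prop.table(tab, margin = 1)

# Back to long format; drop Species/Color pairs that never occur
result <- as.data.frame(props, responseName = "proportion")
result <- result[result$proportion > 0, ]
result
```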
Quinten