I have a dataset with two columns, Species and Color:
Species Color
daisy   white
daisy   yellow
iris    purple
iris    purple
iris    purple
tulip   red
tulip   red
…etc
Using count() from dplyr, I summarize the number of observations of each color per species:
library(dplyr)
data %>%
  count(Species, Color)
Species Color  n
daisy   white  1
daisy   yellow 1
iris    purple 3
tulip   red    2
tulip   yellow 4
tulip   pink   2
I would like to add a column that shows the proportion of each color within its species (n for that color divided by the total n for that species):
Species Color  n proportion
daisy   white  1 0.5
daisy   yellow 1 0.5
iris    purple 3 1
tulip   red    2 0.25
tulip   yellow 4 0.5
tulip   pink   2 0.25
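
For reference, here is a minimal sketch of the calculation I am describing, not necessarily the idiomatic way to do it. It assumes the data frame is named data, and the rows built here are a hypothetical reconstruction containing only what the count table implies (the real dataset has more rows). The idea is to group the counts by Species and divide each n by the group total:

```r
library(dplyr)

# Hypothetical reconstruction of the data, limited to the rows
# implied by the count table (the real dataset is larger).
data <- data.frame(
  Species = c(rep("daisy", 2), rep("iris", 3), rep("tulip", 8)),
  Color   = c("white", "yellow",
              rep("purple", 3),
              rep("red", 2), rep("yellow", 4), rep("pink", 2))
)

result <- data %>%
  count(Species, Color) %>%          # one row per Species/Color pair, with n
  group_by(Species) %>%              # so that sum(n) is the per-species total
  mutate(proportion = n / sum(n)) %>%
  ungroup()

print(result)
```

With character columns, count() orders the rows alphabetically by Species and then Color, so the tulip rows may appear in a different order than in the table above.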