I am using ggplot(). I have a dataset with a variable called "state_crossing" that takes several string values, such as "baja california", "sonora" and "tamaulipas". I saw some tricks in other threads for drawing multiple lines without calling geom_line() once per variable (e.g., "Plotting two variables as lines using ggplot2 on the same graph").
Following those, I created a new variable, value = 1 for each observation, and I am using the following code:
library(ggplot2)

# Toy data: 4 observations in 2001, 5 in 2002
plotting <- data.frame(
  year = rep(c("2001", "2002"), times = c(4, 5)),
  state_crossing = c("baja california", "baja california", "sonora", "tamaulipas",
                     "sonora", "sonora", "tamaulipas", "tamaulipas", "baja california"),
  value = rep(1, 9)
)

# One line per state; y is the sum of value (i.e. the count) within each year
ggplot(plotting, aes(x = year, y = value)) +
  geom_line(aes(color = state_crossing, group = state_crossing), stat = "summary", fun = "sum")
This works, but it plots the number of occurrences, whereas I want the fraction of observations with each value of state_crossing within each value of x = year. Using fun = "mean" doesn't help, because value is always 1, so the mean is 1 for every group. Any idea of a "fun" that could give me the fraction?
E.g., in my reproducible example, I'd like "baja california" to show 2/4 in 2001 and then 1/5 in 2002, and "sonora" to show 1/4 and then 2/5.
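
For reference, the fractions I am after can be computed outside of ggplot and then plotted directly. This is only a minimal sketch of the target quantity, assuming base R's table() and prop.table(); the names counts, shares and fraction are illustrative choices of mine, and this is a workaround rather than the "fun" I am asking about:

```r
# Same toy data as in the question.
plotting <- data.frame(
  year = rep(c("2001", "2002"), times = c(4, 5)),
  state_crossing = c("baja california", "baja california", "sonora", "tamaulipas",
                     "sonora", "sonora", "tamaulipas", "tamaulipas", "baja california"),
  value = rep(1, 9)
)

# Count observations per (year, state), then divide each row by its year total
# (margin = 1 normalises across the rows, i.e. within each year).
counts <- table(plotting$year, plotting$state_crossing)
shares <- as.data.frame(prop.table(counts, margin = 1), stringsAsFactors = FALSE)
names(shares) <- c("year", "state_crossing", "fraction")  # "fraction" is an illustrative name

print(shares)

# Plot the precomputed fractions, one line per state (skipped if ggplot2 is not installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(shares, aes(x = year, y = fraction,
                          color = state_crossing, group = state_crossing)) +
    geom_line()
  print(p)
}
```

Within each year the fractions in shares sum to 1, which is the behaviour I would like to get from a summary function inside geom_line() itself.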