
I am using ggplot(). I have a dataset with a variable called "state_crossing" that takes several string values, such as "baja california", "sonora" and "tamaulipas". I saw some tricks in other threads for drawing multiple lines without calling geom_line() once per variable (e.g., "Plotting two variables as lines using ggplot2 on the same graph").

Then, I created a new variable value = 1 for each observation, and I am using the following command:

plotting <- data.frame(
  year = rep(c("2001", "2002"), times=c(4,5)), 
  state_crossing = c("baja california", "baja california", "sonora", "tamaulipas", "sonora", "sonora", "tamaulipas", "tamaulipas", "baja california"), 
  value = rep(1, 9)
)

library(ggplot2)

ggplot(plotting, aes(x = year, y = value)) +
  geom_line(aes(color = state_crossing, group = state_crossing), stat = "summary", fun = "sum")

This is great, but it naturally plots the number of occurrences, whereas I want the fraction of observations with each value of state_crossing within each value of x = year. Using "mean" as the function doesn't work, since value is always 1 and so its mean is 1. Any idea of a "fun" that would give me the fraction?

E.g., in my reproducible example, I'd like "baja california" to show 2/4 in 2001 and then 1/5 in 2002, and "sonora" to show 1/4 and then 2/5.

jpugliese
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 17 '21 at 21:38
  • I thought this was straightforward enough to not require one, sorry about that. Will add some data. – jpugliese Aug 17 '21 at 21:56
  • The example data provided doesn't produce a plot at all for me. I get the error "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?" – MrFlick Aug 18 '21 at 00:51
  • Sorry, try again. For some reason it needed a group argument. With the same data structure and more observations it wasn't giving this error. – jpugliese Aug 18 '21 at 00:57

1 Answer


Summarizing data inside ggplot can be a bit of a headache; it's best at plotting data that is already in the shape you need. So it's easiest to summarize your data into proportions first, using something like dplyr, and then plot the result.

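library(dplyr)
library(ggplot2)

plotting %>% 
  count(year, state_crossing) %>%   # number of rows per year/state pair
  group_by(year) %>% 
  mutate(prop = n / sum(n)) %>%     # share of each state within its year
  ggplot() +
  aes(x = year, y = prop, color = state_crossing, group = state_crossing) + 
  geom_line()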
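
If you would rather not pull in dplyr, the same count-then-divide idea can be sketched in base R with table() and prop.table(). This is only a rough alternative, assuming the plotting data frame from the question; the names counts, props and props_long are made up for illustration. One side effect is that a year/state pair with no rows shows up as 0 rather than being dropped.

```r
# Example data copied from the question
plotting <- data.frame(
  year = rep(c("2001", "2002"), times = c(4, 5)),
  state_crossing = c("baja california", "baja california", "sonora", "tamaulipas",
                     "sonora", "sonora", "tamaulipas", "tamaulipas", "baja california"),
  value = rep(1, 9)
)

# Cross-tabulate year by state, then divide each row (year) by its row total
counts <- table(year = plotting$year, state_crossing = plotting$state_crossing)
props  <- prop.table(counts, margin = 1)

# Long format (one row per year/state pair), which is what ggplot expects
props_long <- as.data.frame(props, responseName = "prop")
print(props_long)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    props_long,
    ggplot2::aes(x = year, y = prop, color = state_crossing, group = state_crossing)
  ) +
    ggplot2::geom_line()
  print(p)
}
```

With the example data, each year's proportions sum to 1, so it is easy to check the result by eye before plotting.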
MrFlick