The data frame "id" has the columns year, id, and matr (matriline), among others, and each row is an incident. I wanted to count the number of incidents per matriline per year, so I did:
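library(dplyr)

events.bymatr <- id %>%
  group_by(year, matr, .drop = FALSE) %>%  # .drop = FALSE keeps matrilines with zero incidents
  dplyr::summarise(n = n()) %>%            # n = number of rows (incidents) in each group
  ungroup()
events.bymatr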
I plotted a line graph of the number of incidents over time, by matriline.
ggplot(events.bymatr, aes(x = year, y = n, group = matr)) +
  geom_line(aes(color = matr))
My question is twofold:
Is there a way to recreate this line graph so that the thickness of each line is determined by how many IDs there were per matriline? I imagine this involves reshaping the data above, but when I tried group_by(year, matr, id, .drop = FALSE) the result came out all wonky.
I want to change the color palette so that each color is very distinct. How do I attach a new color palette? I tried using this c25 palette with the code below, but it makes all my lines disappear:
ggplot(events.bymatr, aes(x = year, y = n, group = matr)) + geom_line(aes(color = c25))
Thanks so much in advance!
Output of "id" (shortened to just the first five rows per column):
> dput(id)
structure(list(date = structure(c(8243, 8243, 8243, 8248, 8947),
class = "Date"), year = c(1992L, 1992L, 1992L, 1992L, 1994L),
event.id = c(8L, 8L, 8L, 10L, 11L), id = structure(c(51L, 55L, 59L,
46L, 51L), .Label = c("J11", "J16", "J17", "J2", "J22"),
class = "factor"), sex = structure(c(1L, 2L, 2L, 1L, 1L),
.Label = c("0", "1"), class = "factor"), age = c(28L, 12L, 6L, 42L,
30L), matr = structure(c(20L, 20L, 20L, 11L, 20L), .Label = c("J2",
"J4", "J7", "J9", "K11"), class = "factor"),
matralive = structure(c(2L, 2L, 2L, 2L, 2L),
.Label = c("0", "1"), class = "factor"), pod = structure(c(3L, 3L,
3L, 3L, 3L), .Label = c("J", "K", "L"), class = "factor")),
row.names = c(NA, -134L), class = c("tbl_df", "tbl", "data.frame"))
Output of events.bymatr:
> dput(events.bymatr)
structure(list(year = c(1992L, 1992L, 1992L, 1992L, 1992L),
matr = structure(c(1L, 2L, 3L, 4L, 5L), .Label = c("J2", "J4",
"J7", "J9", "K11"), class = "factor"), n = c(0L, 0L, 0L, 0L, 0L)),
row.names = c(NA, -380L), class = c("tbl_df", "tbl",
"data.frame"))
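
One possible approach to both parts, as a rough sketch rather than a definitive answer. It assumes the count of distinct IDs should be taken per matriline per year, and that ggplot2 is version 3.4.0 or later (older versions use the size aesthetic instead of linewidth). The toy data frame and the pal vector are made-up stand-ins for the real "id" data and the c25 palette. The likely problem with aes(color = c25) is that aes() treats c25 as a data variable; a palette is normally supplied through a scale such as scale_color_manual().

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the real `id` data (values are made up for illustration).
id <- data.frame(
  year = c(1992L, 1992L, 1992L, 1992L, 1994L),
  id   = factor(c("J11", "J16", "J11", "J17", "J11")),
  matr = factor(c("J2", "J2", "J2", "J4", "J2"))
)

# Count incidents (rows) and distinct individuals per matriline per year
# in a single summarise(), so no extra grouping by id is needed.
events.bymatr <- id %>%
  group_by(year, matr, .drop = FALSE) %>%
  summarise(n = n(), n_ids = n_distinct(id), .groups = "drop")

# Hypothetical palette: any character vector of colours with at least
# nlevels(id$matr) entries should work here, e.g. the c25 vector.
pal <- hcl.colors(nlevels(id$matr), palette = "Dark 3")

p <- ggplot(events.bymatr, aes(x = year, y = n, group = matr)) +
  geom_line(aes(color = matr, linewidth = n_ids)) +  # thickness follows n_ids
  scale_color_manual(values = pal)                   # palette goes in the scale
p
```

With the real data, the toy data frame would be dropped and pal replaced by c25, provided c25 has at least as many colours as there are matrilines.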