I currently have some data that is basically a factor (genre) and a date (a year). Here is a simplified version of it:
```r
date <- c(1901,1901,1901,1902,1902,1902,1901,1903,1902,1904,1902,1903,1903,1904,1905,
          1901,1903,1902,1904,1902,1902,1903,1904,1902,1902,1901,1903,1903,1904,1905,
          1905,1906,1907,1908,1901,1908,1907,1905,1906,1902,1903,1903,1903,1904,1905,
          1901,1901,1901,1902,1902,1902,1901,1903,1902,1904,1902,1903,1903,1904,1905,
          1901,1903,1902,1904,1902,1902,1903,1904,1902,1902,1901,1903,1903,1904,1905,
          1905,1906,1907,1908,1901,1908,1907,1905,1906,1902,1903,1903,1903,1904,1905,
          1905,1906,1907,1908,1901,1908,1907,1920,1920,1920,1921,1921,1921,1921,1921)
set.seed(1)  # make the random genre assignment reproducible
genre <- sample(c("fiction","nonfiction"),105,replace=TRUE)
data <- as.data.frame(cbind(date,genre))
# I know this is not an ideal way to coerce to a numeric
data$date <- as.numeric(as.character(data$date))
```
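On the coercion comment above: `cbind()` of a numeric vector and a character vector produces a character matrix, so `as.data.frame()` turns every column into character (or into factors before R 4.0.0, when `stringsAsFactors` defaulted to `TRUE`), which is why the `as.character()` step is needed. A minimal base-R sketch, using small hypothetical vectors, shows that building the frame with `data.frame()` directly keeps `date` numeric:

```r
# Hypothetical small vectors, for illustration only
date  <- c(1901, 1902, 1920)
genre <- c("fiction", "nonfiction", "fiction")

# cbind() of numeric + character yields a character matrix
m <- cbind(date, genre)
class(m[, "date"])   # "character"

# data.frame() keeps each column's own type, so no round-trip is needed
df <- data.frame(date = date, genre = genre)
class(df$date)       # "numeric"
```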
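So far, so good. As you'll notice if you plot it, though, there is a big gap in the data (no rows between 1908 and 1920) which the line obscures, because `geom_line()` simply connects the last point before the gap to the first point after it. This plot illustrates: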
```r
library(ggplot2)
ggplot(data,aes(x=date,color=genre)) + geom_line(stat='count')
```
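To see exactly where the gap is, one can list the years between the first and last observation that have no rows at all. A minimal base-R sketch; to stay self-contained it rebuilds the same per-year counts as the `date` vector above with `rep()` instead of retyping it (the row order differs, which does not matter for counting):

```r
# Same year counts as the question's `date` vector (order differs)
date <- rep(c(1901:1908, 1920, 1921),
            times = c(15, 22, 20, 12, 11, 5, 6, 6, 3, 5))

# Every year between the first and last observation...
all_years <- seq(min(date), max(date))
# ...minus the years that actually occur
missing_years <- setdiff(all_years, unique(date))
missing_years
# [1] 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919
```

With no rows at all for 1909 through 1919, `stat='count'` produces no points there, so nothing tells the line to break.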
I have seen this post which suggests adding a group, which I can do.
```r
data$group <- ifelse(data$date < 1910,1,2)
ggplot(data,aes(x=date,color=genre,group=group)) + geom_line(stat='count')
```
So there appears to be no way to keep the color aesthetic I want for my output and also specify a `group` while using `stat='count'`. This plot, for instance, nicely shows the gap in the data, but loses the color distinction based on the `genre` factor:
```r
ggplot(data,aes(x=date,color=genre,group=group)) + geom_line(stat='count')
```
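One plausible explanation: an explicit `group` replaces the grouping ggplot2 would otherwise derive from discrete aesthetics such as colour, so `stat_count()` counts each period as a single series in which `genre` is no longer constant, and the colour mapping is dropped (newer ggplot2 releases say so in a warning). A sketch of a common workaround, assuming ggplot2 is installed, is to group by the combination of genre and period with base R's `interaction()`, so each genre/period pair becomes its own line. The data are rebuilt here only to keep the block self-contained.

```r
library(ggplot2)

set.seed(1)  # reproducible random genre assignment
date  <- rep(c(1901:1908, 1920, 1921),
             times = c(15, 22, 20, 12, 11, 5, 6, 6, 3, 5))
genre <- sample(c("fiction", "nonfiction"), length(date), replace = TRUE)
data  <- data.frame(date = date, genre = genre)

# Period label: 1 before the gap, 2 after it (same rule as in the question)
data$group <- ifelse(data$date < 1910, 1, 2)

# One group per genre-and-period combination, so colour stays constant
# within each group and no line is drawn across the 1909-1919 gap
p <- ggplot(data, aes(x = date, colour = genre,
                      group = interaction(genre, group))) +
  geom_line(stat = "count")
p
```

The question's `group` column is kept as-is; `interaction()` only combines it with `genre`, giving up to four series (two genres in each of two periods).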
So, is this not possible? Am I missing something? Is there a better way to do this, or do I need to summarize or otherwise mutate my data so that I don't rely on `stat='count'` at the plotting stage?
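On the summarizing route: a minimal sketch, assuming ggplot2 is installed and that a break in the line (rather than a drop to zero) is the desired display. It counts rows per year and genre with `table()` over every year in the range, then sets the counts for years with no observations at all to `NA`; `geom_line()` does not draw through `NA` values, so the gap shows while `colour = genre` still defines the groups. The data are rebuilt only to keep the block self-contained.

```r
library(ggplot2)

set.seed(1)  # reproducible random genre assignment
date  <- rep(c(1901:1908, 1920, 1921),
             times = c(15, 22, 20, 12, 11, 5, 6, 6, 3, 5))
genre <- sample(c("fiction", "nonfiction"), length(date), replace = TRUE)
data  <- data.frame(date = date, genre = genre)

# Count rows for every year in the range, including years with no data
all_years <- seq(min(data$date), max(data$date))
counts <- as.data.frame(table(date  = factor(data$date, levels = all_years),
                              genre = data$genre),
                        responseName = "n")
counts$date <- as.numeric(as.character(counts$date))

# Years with no observations at all become NA, so the line breaks there
counts$n[!counts$date %in% data$date] <- NA

ggplot(counts, aes(x = date, y = n, colour = genre)) +
  geom_line()
```

ggplot2 will warn that it removed rows with missing values; those are the gap years. One side effect of this design: a genre with no rows in an otherwise observed year is plotted as 0, whereas `stat='count'` simply skips that year.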