I have some data that is essentially a factor and a date. Here is a simplified version of it.

date <- c(1901,1901,1901,1902,1902,1902,1901,1903,1902,1904,1902,1903,1903,1904,1905,
          1901,1903,1902,1904,1902,1902,1903,1904,1902,1902,1901,1903,1903,1904,1905,
          1905,1906,1907,1908,1901,1908,1907,1905,1906,1902,1903,1903,1903,1904,1905,
          1901,1901,1901,1902,1902,1902,1901,1903,1902,1904,1902,1903,1903,1904,1905,
          1901,1903,1902,1904,1902,1902,1903,1904,1902,1902,1901,1903,1903,1904,1905,
          1905,1906,1907,1908,1901,1908,1907,1905,1906,1902,1903,1903,1903,1904,1905,
          1905,1906,1907,1908,1901,1908,1907,1920,1920,1920,1921,1921,1921,1921,1921)

genre <- sample(c("fiction","nonfiction"),105,replace=TRUE)
data <- as.data.frame(cbind(date,genre))
# I know this is not an ideal way to coerce to a numeric 
data$date <- as.numeric(as.character(data$date))

So far, so good. As you'll note if you plot it, though, there is a big gap in the data, which the line obscures. This plot illustrates the problem.

library(ggplot2)
ggplot(data,aes(x=date,color=genre)) + geom_line(stat='count')

Example Plot 1.

I have seen this post which suggests adding a group, which I can do.

data$group <- ifelse(data$date < 1910,1,2)
ggplot(data,aes(x=date,color=genre,group=group)) + geom_line(stat='count')

Example Plot 2

So there appears to be no way to preserve the color aesthetics I want for my output and specify a group, while using stat='count'. This plot, for instance, nicely shows the gap in the data, but loses the color/distinction based on the genre factor:

ggplot(data,aes(x=date,color=genre,group=group)) + geom_line(stat='count')

So, is this not possible? Am I missing something? Is there a better way to do this, or do I need to summarize or otherwise mutate my date so that I don't rely on stat='count' at the plotting stage?

cforster
  • Don't you want to switch to `geom_bar`? Data like this would look great with it. – pogibas Sep 18 '17 at 18:57
  • @PoGibas Fair point, but this is just a minimal sample; in the real case there is a greater range of `dates` *and* more `genres`, so a bar plot would look crowded and hard to read, while a line plot (with, say, 6 lines over 40 years) would still (I hope) be legible. – cforster Sep 18 '17 at 19:00

1 Answer

You can combine `genre` and `group` into a single grouping variable, so that each genre gets its own line on each side of the gap. Here I do this via the `interaction()` function.

ggplot(data,aes(x = date, color = genre, group = interaction(genre, group))) + 
     geom_line(stat = 'count')
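
To see why this works, it may help to look at what `interaction()` returns on its own. The snippet below is a minimal sketch using small toy vectors (hypothetical values, not the question's data): it builds one factor with a level for every genre/group combination, which is what lets ggplot2 draw a separate line per genre within each group while `color` still maps to `genre` alone.

```r
# Toy vectors for illustration only (assumed values, not the question's data)
genre <- c("fiction", "nonfiction", "fiction", "nonfiction")
group <- c(1, 1, 2, 2)

# interaction() crosses the two variables into a single factor
combined <- interaction(genre, group)

# One level per genre/group combination, named "<genre>.<group>"
print(levels(combined))
print(as.character(combined))
```

With two genres and two groups this gives four grouping levels, so the plot has up to four line segments but still only two colors.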


aosmith
  • Awesome! Works like a charm. Many thanks! I was not looking forward to having to juggle a bunch of summarizations. – cforster Sep 18 '17 at 19:45
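
For comparison, the summarize-first route that the question asks about could look roughly like the sketch below. It is not part of the accepted answer; the toy data frame `toy`, the helper column `n`, and the use of base R's `aggregate()` are assumptions chosen for illustration. The counts are computed up front, so `geom_line()` can be used with its default stat and the same `interaction()` grouping.

```r
# Toy data for illustration only (assumed values, not the question's data)
toy <- data.frame(
  date  = c(1901, 1901, 1902, 1902, 1902, 1920, 1920, 1921, 1921, 1921),
  genre = c("fiction", "nonfiction", "fiction", "fiction", "nonfiction",
            "fiction", "nonfiction", "fiction", "nonfiction", "nonfiction")
)
toy$group <- ifelse(toy$date < 1910, 1, 2)  # same split as in the question
toy$n <- 1                                  # helper column: one per row

# Count rows per date/genre/group combination
counts <- aggregate(n ~ date + genre + group, data = toy, FUN = sum)
print(counts)

# Plot the precomputed counts; skipped if ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = date, y = n, color = genre,
                          group = interaction(genre, group))) +
    geom_line()
  print(p)
}
```

This takes more code than `stat = 'count'`, but the counts table is then available for inspection or further processing before plotting.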