
I have a dataset that has individual records with ZIP Codes and Installation Dates. So far, I was able to plot records in a ZIP Code by:

  • Subsetting a single ZIP Code
  • Sorting the records by Date
  • Creating a new column that increases by 1 with each row (a running count of records)
  • Plotting this new column against Date
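For reference, the steps above can be sketched roughly as follows. This is only an illustration: the data frame `records`, its column `InstallDate`, and the ZIP values are hypothetical stand-ins, since the real data are not shown here.

```r
library(dplyr)
library(ggplot2)

# Hypothetical raw data: one row per installation record
records <- data.frame(
  ZIP         = c("10001", "10001", "10002", "10001", "10002"),
  InstallDate = as.Date(c("2015-03-10", "2015-01-05", "2016-06-01",
                          "2015-02-20", "2016-07-15"))
)

one_zip <- records %>%
  filter(ZIP == "10001") %>%     # subset a single ZIP Code
  arrange(InstallDate) %>%       # sort the records by date
  mutate(Count = row_number())   # running count: 1, 2, 3, ...

# Plot the running count against the date
p <- ggplot(one_zip, aes(InstallDate, Count)) +
  geom_line()
print(p)
```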

The result looks like this:

Now, what I want to do is have multiple ZIP Code geom_lines in the same figure. Each ZIP Code area has a different first record date, and I would like all of them to start at the same point on the X-axis.

Here's a failed attempt. I want these lines to start at the same point on the X-axis:

I am looking for ideas on how to proceed.

Thanks!

Allan Cameron
Direnk
  • Welcome to Stack Overflow. To make your question reproducible and thus answerable, we need minimal, self-contained code and data so that we can reproduce your problem on our machine. Please follow these guidelines: [mre], https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Peter Jul 09 '20 at 15:14

1 Answer

Let's try to roughly emulate your data structure since your question does not include any data:

library(ggplot2)

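set.seed(69)  # make the simulated counts reproducible

# Two ZIP codes with weekly records whose first dates are about six years apart
df <- data.frame(
  ZIP   = c(rep("A", 1000), rep("B", 687)),
  # Cumulative sums give ever-increasing counts, like a running total
  Count = c(cumsum(round(runif(1000, 0, 0:999))),
            cumsum(round(runif(687, 0, 0:686) * 4))),
  Date  = c(seq(as.POSIXct("2007-09-01"), by = "1 week", length.out = 1000),
            seq(as.POSIXct("2013-08-31"), by = "1 week", length.out = 687)))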
   
ggplot(df, aes(Date, Count, colour = ZIP)) + 
  geom_line() +
  scale_colour_manual(values = c("blue", "red"))


Clearly, if we want these lines to start at the same position on the x axis, the x axis can no longer show the absolute date. It has to show the time elapsed since each ZIP's first record instead, so we need to calculate that value within each group. The dplyr package can help us here:

library(dplyr)

df %>% 
  group_by(ZIP) %>% 
  mutate(Day = as.numeric(difftime(Date, min(Date), units = "days"))) %>%
  ggplot(aes(Day, Count, colour = ZIP)) + 
    geom_line() +
    labs(x = "Days since first record") +
    scale_colour_manual(values = c("blue", "red"))

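If the starting point is the raw records rather than a precomputed count, the same idea might be applied in one grouped pipeline. The following is a minimal sketch under assumptions: `records`, `InstallDate`, and the ZIP values are made-up names and data, not taken from the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical raw data: one row per installation record
records <- data.frame(
  ZIP         = c("10001", "10001", "10001", "10002", "10002"),
  InstallDate = as.Date(c("2015-01-05", "2015-02-20", "2015-03-10",
                          "2016-06-01", "2016-07-15"))
)

aligned <- records %>%
  group_by(ZIP) %>%
  arrange(InstallDate, .by_group = TRUE) %>%  # sort by date within each ZIP
  mutate(
    Count = row_number(),                               # running count per ZIP
    Day   = as.numeric(InstallDate - min(InstallDate))  # days since that ZIP's first record
  ) %>%
  ungroup()

# Every ZIP now starts at Day 0, so the lines share a common origin
p <- ggplot(aligned, aes(Day, Count, colour = ZIP)) +
  geom_line() +
  labs(x = "Days since first record")
print(p)
```

Because `Day` is computed inside `group_by(ZIP)`, `min(InstallDate)` is taken per ZIP rather than over the whole data set, which is what aligns the starting points.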

Allan Cameron