
I want to plot the rows of the following matrix in ggplot as a line plot.

Specifically:

1) I want 25th Pct Cohort 1, 50th Pct Cohort 1 and 75th pct Cohort 1 colored black

2) I want 25th Pct Cohort 2, 50th Pct Cohort 2, and 75th pct Cohort 2 colored steelblue

3) I want 25th Pct Cohort 3, 50th Pct Cohort 3, and 75th pct Cohort 3 colored grey

4) I want all 50th pct lines a shade darker or slightly larger in size (so they stand out).

5) I want a legend that labels each line according to the rownames

6) I want all 25th pct linetypes to be dotted

7) I want all 50th pct linetypes to be solid

8) I want all 75th pct linetypes to be long-dash

Sorry for all the requirements. I'm new to this and learning.

start = as.Date("1993-12-01") 
end = as.Date("2018-09-01")
dates = seq(from = start, to = end, by = "quarter")

test <- matrix(nrow=9, ncol =100, rnorm(900,0,1))
colnames(test) = as.character(dates)
rownames(test) = c("25th Pct Cohort 1", "50th Pct Cohort 1", "75th Pct Cohort 1", "25th Pct Cohort 2", "50th Pct Cohort 2" , "75th Pct Cohort 2", "25th Pct Cohort 3", "50th Pct Cohort 3", "75th Pct Cohort 3")

The dataset doesn't need to be reproducible. It's just for teaching me the process.

I understand the first step is to convert from wide to long format. I do this as follows:

library(reshape)
library(ggplot2)

df <- melt(as.matrix(test))
colnames(df) <- c("Cohort", "Date", "value") 
df$Date <- as.Date(df$Date) 
ggplot(df, aes(x=Date, y= value)) + geom_line(aes(colour = Cohort)) + theme_classic() + 
scale_colour_manual("",
                    values = c("25th Pct Cohort 1" = "black", "50th Pct Cohort 1" = "black", "75th Pct Cohort 1" = "black", "25th Pct Cohort 2" = "steel blue", "50th Pct Cohort 2"= "steelblue" , "75th Pct Cohort 2" = "steelblue", "25th Pct Cohort 3" = "grey", "50th Pct Cohort 3" = "grey", "75th Pct Cohort 3" = "grey"),
                    breaks = c("50th Pct Cohort 1", "75th Pct Cohort 1", "25th Pct Cohort 2", "50th Pct Cohort 2" , "75th Pct Cohort 2", "25th Pct Cohort 3", "50th Pct Cohort 3", "75th Pct Cohort 3")) +
scale_linetype_manual("",
                        values = c("dotted", "solid", "longdash", "dotted", "solid", "longdash", "dotted", "solid", "longdash"),
                        breaks = c("50th Pct Cohort 1", "75th Pct Cohort 1", "25th Pct Cohort 2", "50th Pct Cohort 2" , "75th Pct Cohort 2", "25th Pct Cohort 3", "50th Pct Cohort 3", "75th Pct Cohort 3"))

But I am lost after this.

JC3019
  • ggplot is designed to work with data frames in a long format. You are starting with a matrix in a wide format, so your first step will be doing that conversion. I'd suggest looking at the FAQ on [converting wide to long](https://stackoverflow.com/q/2185252/903061). Give it a go, show an attempt, and let us know where you get stuck. Otherwise it just seems like you want us to write a custom tutorial for you. – Gregor Thomas Sep 28 '19 at 03:25
  • @Gregor I've done this. Sorry, I had no idea where to go after that. – JC3019 Sep 28 '19 at 03:40
  • Here's a start, now to formatting: `library(reshape); library(ggplot2); df <- melt(as.matrix(test)); colnames(df) <- c("Cohort", "Date", "value"); df$Date <- as.Date(df$Date); ggplot(df, aes(x = Date, y = value)) + geom_line(aes(colour = Cohort))` – Jon Spring Sep 28 '19 at 05:30
  • @JonSpring Thanks, I'm now closer. Could you have a look at my amended attempt? I'm still not sure how to emphasise some lines or how to give them different linetypes. – JC3019 Sep 28 '19 at 06:03
  • Everything in ggplot is about columns. You want your percentiles to be indicated by linetype. So, you need a `percentile` column of class `factor`, with values like `"25th percentile", "50th percentile", ...` or similar. Then inside your `aes()` you add `linetype = percentile` to indicate that the `percentile` column defines the line type. – Gregor Thomas Sep 28 '19 at 06:53

1 Answer

Here's some code, based on your start. I won't post the plot because it looks like garbage, since the data is just noise.

# minor adjustments from above
df <- reshape::melt(as.matrix(test))
colnames(df) <- c("Pct.Cohort", "Date", "value") 
df$Date <- as.Date(df$Date) 

# Get each graphical dimension (x, y, color, size, linetype) into its own column.
# We already have x and y. Size and linetype both map to the percentile, but the
# percentile is currently stored in the same column as the cohort (color), so
# the two need to be separated.

df$Pct.Cohort = as.character(df$Pct.Cohort)

# get the percentiles out as everything before " Cohort"
df$Pct = sub(" Cohort.*", "", df$Pct.Cohort)

# get the cohort number out as last character
df$Cohort = as.integer(substr(df$Pct.Cohort, nchar(df$Pct.Cohort), nchar(df$Pct.Cohort)))

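# plot
library(ggplot2)

# Unnamed `values` are matched to the levels in sorted order:
# Cohort 1, 2, 3 and "25th Pct", "50th Pct", "75th Pct".
# Note: in ggplot2 >= 3.4.0, `linewidth` replaces `size` for lines.
ggplot(df, aes(x = Date, y = value)) +
  geom_line(aes(colour = factor(Cohort), linetype = Pct, size = Pct)) +
  theme_classic() +
  scale_colour_manual("Cohort", values = c("black", "steelblue", "grey")) +
  scale_linetype_manual("Percentile", values = c("dotted", "solid", "longdash")) +
  scale_size_manual(values = c(0.8, 1.4, 0.8), guide = "none")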
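If you want to check the splitting step on its own before plotting, here is a minimal base-R sketch. The `labels` vector and the `linetypes` lookup are made up for illustration, and the cohort extraction assumes single-digit cohort numbers, as in your row names.

```r
# Sample labels in the same "<pct> Pct Cohort <n>" format as the row names
# (hypothetical values, chosen only for this check)
labels <- c("25th Pct Cohort 1", "50th Pct Cohort 2", "75th Pct Cohort 3")

# Everything before " Cohort" is the percentile label
pct <- sub(" Cohort.*", "", labels)

# The last character is the cohort number (assumes single-digit cohorts)
cohort <- as.integer(substr(labels, nchar(labels), nchar(labels)))

# A named vector ties each linetype to a percentile level explicitly
linetypes <- c("25th Pct" = "dotted", "50th Pct" = "solid", "75th Pct" = "longdash")

# Show the parsed pieces side by side
print(data.frame(labels, pct, cohort, linetype = unname(linetypes[pct])))
```

Passing a named vector like `linetypes` as `values` to `scale_linetype_manual()` matches values to levels by name, so the result does not depend on how the levels happen to sort.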

You might consider adding `+ facet_wrap(~ Cohort, ncol = 1)` to really tell them apart, depending on what your real data looks like.
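As a rough, self-contained sketch of the faceted version: this rebuilds the random test data from the question, reshapes it with base R (`as.table()` instead of `reshape::melt()`, purely to avoid the extra dependency), and uses named `values` vectors. It assumes ggplot2 3.4.0 or later for the `linewidth` aesthetic; on older versions, use `size` and `scale_size_manual()` as above.

```r
library(ggplot2)

# Rebuild the noise data from the question (100 quarterly dates, 9 series)
set.seed(1)
dates <- seq(as.Date("1993-12-01"), as.Date("2018-09-01"), by = "quarter")
test <- matrix(rnorm(900), nrow = 9, ncol = 100)
rownames(test) <- paste(c("25th", "50th", "75th"), "Pct Cohort", rep(1:3, each = 3))
colnames(test) <- as.character(dates)

# Wide matrix to long data frame using base R only
df <- as.data.frame(as.table(test), stringsAsFactors = FALSE)
names(df) <- c("Pct.Cohort", "Date", "value")
df$Date <- as.Date(df$Date)

# One column per aesthetic: percentile and cohort (single-digit cohorts assumed)
df$Pct <- sub(" Cohort.*", "", df$Pct.Cohort)
df$Cohort <- paste("Cohort", substr(df$Pct.Cohort, nchar(df$Pct.Cohort), nchar(df$Pct.Cohort)))

# One panel per cohort; linetype and line width both follow the percentile
p <- ggplot(df, aes(x = Date, y = value)) +
  geom_line(aes(colour = Cohort, linetype = Pct, linewidth = Pct)) +
  facet_wrap(~ Cohort, ncol = 1) +
  scale_colour_manual("Cohort",
    values = c("Cohort 1" = "black", "Cohort 2" = "steelblue", "Cohort 3" = "grey")) +
  scale_linetype_manual("Percentile",
    values = c("25th Pct" = "dotted", "50th Pct" = "solid", "75th Pct" = "longdash")) +
  scale_linewidth_manual(
    values = c("25th Pct" = 0.8, "50th Pct" = 1.4, "75th Pct" = 0.8), guide = "none") +
  theme_classic()

print(p)
```

With one panel per cohort, the colour mapping becomes redundant, so you could drop it and keep only the linetype legend if that reads better for your real data.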

Gregor Thomas