
I'm incredibly new to R and I'm having to use it for my dissertation. I have a dataset of samples ranging from 2001 to 2021, but not all years have data associated with them: only 2001-2010 and 2017-2021 do. However, geom_line() still draws the years 2011-2016 on the x axis, even though my data frame doesn't include those years; presumably it has filled them in automatically. Is there a way to remove 2011-2016 from my x axis, as they make my plot look messy?

Here is my graphing code if you need it:

plot <- ggplot(df, aes(x=Year)) + 
  geom_line(aes(y=var1), color = "black", size=1) + 
  geom_line(aes(y=var2), color="red", size=1) + 
  geom_line(aes(y=var3), color="green", size=1) + 
  geom_line(aes(y=var4), color="yellow", size=1) + 
  geom_line(aes(y=var5), color="blue", size=1) + 
  geom_line(aes(y=var6), color="pink", size=1)

and my data.frame "Year" column is as follows: 2021, 2020, 2019, 2018, 2017, 2011, 2010, 2009, 2007, 2006, 2005, 2004, 2003, 2002, 2001, in descending order. (I wasn't sure how to make a mock-up, as I can barely use this as it is, so I apologise if this isn't enough.)

Thank you!

    I'd recommend first looking through some ggplot tutorials to see how it's intended to be used—e.g. reshaped with some variable assigned to set the color and likely only one `geom_line` call. The official documentation is very thorough and links to a lot of tutorials. Beyond that, [see here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example, since you're referring to how your chart looks, but we can't see it or run your code – camille Jul 27 '21 at 19:47

1 Answer


Here's an example of how you might do this. Fundamentally, you'll need to adjust your x axis to de-couple those values from their positional mapping.

library(ggplot2)

# Sample data: years with a gap (2018 and 2019 are missing)
df <- data.frame(
  Year = rep(c(2015:2017, 2020), 2),
  var1 = 2:5,
  var2 = 5:2
)

# Equivalent of your plot
ggplot(df, aes(x=Year)) + 
  geom_line(aes(y=var1), color = "black", size=1) + 
  geom_line(aes(y=var2), color="red", size=1)


One approach would be to change Year from a numeric value to a factor or character value. Those are discrete data types, which lets you place the categories at evenly spaced positions (in alphabetical order for character, or in any arbitrary order for factors). With a discrete x axis, geom_line also needs group = 1 so that the points are connected into a single line.

df$Year2 = as.factor(df$Year)
ggplot(df, aes(x = Year2, group = 1)) + 
  geom_line(aes(y=var1), color = "black", size=1) + 
  geom_line(aes(y=var2), color="red", size=1)
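
Since the Year column in the question is stored in descending order, it may help to check what order the factor levels end up in. The following is a minimal sketch, assuming a made-up vector `years_desc` that only mirrors the shape of the question's data; it is not the asker's actual data frame.

```r
# Hypothetical Year values in descending order, similar to the question
years_desc <- c(2021, 2020, 2019, 2018, 2017, 2010, 2009)

# as.factor() sorts the unique values, so the levels ascend
# regardless of the row order in the data
year_fct <- as.factor(years_desc)
levels(year_fct)

# To control the axis order explicitly, supply the levels yourself
# (here: newest year first)
year_fct_rev <- factor(
  years_desc,
  levels = sort(unique(years_desc), decreasing = TRUE)
)
levels(year_fct_rev)
```

The x axis follows the level order, so the first version should plot the years left to right in ascending order even when the rows are sorted the other way.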
  


Another approach would be to renumber your axis to make an evenly spaced variable, e.g. "# of year in data", and then to change the labeling of the axis to reflect the underlying year.

df$Yearnum = rep(1:4, 2)
ggplot(df, aes(x = Yearnum, group = 1)) + 
  geom_line(aes(y=var1), color = "black", size=1) + 
  geom_line(aes(y=var2), color="red", size=1) +
  scale_x_continuous(breaks = unique(df$Yearnum), labels = unique(df$Year))
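
Hard-coding the positions only works when you already know how many distinct years there are. As a rough sketch, the position could instead be derived from the data with match(); `year_levels` is a name introduced here for illustration, and the sample data is the same as above.

```r
library(ggplot2)

# Same sample data as in the answer
df <- data.frame(
  Year = rep(c(2015:2017, 2020), 2),
  var1 = 2:5,
  var2 = 5:2
)

# Position of each year among the sorted unique years:
# 2015 -> 1, 2016 -> 2, 2017 -> 3, 2020 -> 4
year_levels <- sort(unique(df$Year))
df$Yearnum <- match(df$Year, year_levels)

p <- ggplot(df, aes(x = Yearnum)) +
  geom_line(aes(y = var1), color = "black") +
  geom_line(aes(y = var2), color = "red") +
  # One break per position, labelled with the year it stands for
  scale_x_continuous(breaks = seq_along(year_levels), labels = year_levels)

print(p)
```

Setting breaks and labels together keeps them the same length, which matters once there are more years than the default axis breaks would show.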


As @camille noted in the comments, the best practice in ggplot2 for multiple series is to reshape the data into long format. This results in much simpler code, because you map the series name to color instead of calling geom_line separately for each series. One potential downside is that colors are no longer set directly in each geom_line call; to choose specific colors you specify them in a scale such as scale_color_manual, which might feel less direct.

df2 <- tidyr::pivot_longer(df, var1:var2)
ggplot(df2, aes(x = Year2, color = name, group = name)) + 
  geom_line(aes(y=value), size = 1) +
  scale_color_manual(values = c("var1" = "black", 
                                "var2" = "red"))
Jon Spring
  • Seems like you're suggesting an anti-pattern. Why recommend making two `geom_line`s with colors hard-coded instead of reshaping and assigning color as an aesthetic? We don't know what the OP's data looks like, but this is going to have them still making unnecessary `geom_line` calls – camille Jul 27 '21 at 20:35
  • Agree with the suggestion to reshape and use a single call. The question was about the x axis so I chose to address that part here. Happy to add that best practice to the answer. – Jon Spring Jul 27 '21 at 20:36