
I have a data set of Standardized Precipitation Index (SPI) values from 1980 to 2005. There is one value for each month, so there are 312 values in total (26 years * 12 months). The SPI values range between -3 and +3. Here is a simple reproducible example, since the exact values are not important for my question:

vec1 <- rep(seq(1980, 2005), each= 12)
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
df <- data.frame(vec1, vec2)
colnames(df) <- c("Year", "SPI")

Now I would like to plot the SPI values with the years being the x-axis.

When I try to plot it using ggplot2:

library(ggplot2)
ggplot(df) +
  geom_line(aes(x = Year, y = SPI))

Something like this comes out: [plot image]

The problem is that there is no continuous line.

I can get a continuous line with base R, for example:

plot(vec2, type="l")

But then the x-axis only shows the index values 1 to 312, whereas I need the years as the x-values.

Does anybody have a hint?

EDIT after marcguery's answer:

It turned out that I cannot use a line plot for my purpose. Instead, I need to make a column plot with many single columns in ggplot2, since I need to color the areas above and below zero.

marcguery's answer works for a geom_line() plot, but unfortunately not for a geom_col() plot. I have no idea why.

Here is the modified code:

vec1 <- seq(as.Date("1980-01-01"), 
            by = "month", 
            to = as.Date("2005-12-01"))
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
vec3 <- 1:312
df <- data.frame(vec1, vec2, vec3)
colnames(df) <- c("Date", "SPI", "ID")


library(data.table)
df <- as.data.table(df)

Unfortunately, this does not work with the dates on the x-axis; the output looks strange:

library(ggplot2)
# with Date as x-axis
ggplot(data= df, aes(x= Date, y= SPI, width= 1)) +
  geom_col(data = df[SPI <= 0], fill = "red") +
  geom_col(data = df[SPI >= 0], fill = "blue") +
  theme_bw()

[plot image: geom_col() with Date on the x-axis]

This is what works with the simple row number on the x-axis:

# with ID as x-axis 
ggplot(data= df, aes(x= ID, y= SPI, width= 1)) +
  geom_col(data = df[SPI <= 0], fill = "red") +
  geom_col(data = df[SPI >= 0], fill = "blue") +
  theme_bw()

[plot image: geom_col() with ID on the x-axis]

I need something like the last example, just with the dates as the x-axis.

climsaver
  • What is the plot you posted showing? Is it one observation for each year? It doesn't look like they're plotting all 12 for each year to me, but I might be wrong – Matt Kaye Mar 25 '21 at 18:26
  • @Matt Kaye: Yes, normally there is a value for every single month of each year. – climsaver Mar 25 '21 at 23:06

1 Answer


All monthly observations within a year have the same value in your Year column, which is why ggplot cannot assign them different x positions: geom_line() connects the points in order of x, so the twelve points of each year end up joined by a vertical segment. Since you are working with dates, you could store your time points in Date format so that each month gets its own x value.

#Seed for reproducibility
set.seed(123)
#Data
vec1 <- seq(as.Date("1980-01-01"), 
            by = "month", 
            to = as.Date("2005-12-01"))
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
df <- data.frame(vec1, vec2)
colnames(df) <- c("Date", "SPI")

#Plot
library(ggplot2)
ggplot(df) +
  geom_line(aes(x = Date, y = SPI))+
  scale_x_date(breaks = "5 years", date_labels = "%Y",
               limits = c(as.Date("1979-12-01"),
                          as.Date("2006-01-01")),
               expand = c(0,0))

[plot image: geom_line() with Date on the x-axis]
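If you would rather keep the numeric Year column from the question instead of switching to Date, one possible alternative (a sketch, not part of the answer above) is to give each month a fractional year value, so the twelve months of a year no longer share one x position. The Month and YearFrac columns are hypothetical names, and the code assumes the rows are in chronological order, January to December within each year.

```r
library(ggplot2)

set.seed(123)
# Same layout as in the question: one row per month, 26 years x 12 months
df <- data.frame(Year = rep(1980:2005, each = 12),
                 SPI  = sample(-3:3, size = 312, replace = TRUE))

# Hypothetical month index 1..12, assuming rows run January to December within each year
df$Month <- rep(1:12, times = 26)

# Fractional year: January -> Year + 0/12, ..., December -> Year + 11/12
df$YearFrac <- df$Year + (df$Month - 1) / 12

# Every month now has a distinct x value, so geom_line() draws one continuous line
p <- ggplot(df, aes(x = YearFrac, y = SPI)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1980, 2005, by = 5))
```

Printing p draws the plot. The Date approach above is usually more convenient because scale_x_date() handles the labels for you, but this variant shows that the only real requirement is a distinct x value per month.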

Edit, after you added your question about coloring the area between your values and 0 according to their sign:

You can definitely use a geom_line plot for that purpose. A geom_col plot is also possible, but you would lose the visual information about how the values change between consecutive x values, which are continuous because they represent dates.

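To draw the colored line plot, I base my approach on the answer here https://stackoverflow.com/a/18009173/14027775. The idea is to add, between each pair of consecutive months where the SPI changes sign, an extra point at SPI = 0 found by linear interpolation, so that the red and blue areas meet exactly on the zero line. For this you have to convert your dates to numeric values, for instance the number of days since a reference date (typically 1970-01-01).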
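As a quick sanity check of that conversion (a small sketch using only base R, with illustrative dates): as.numeric() on a Date already returns the number of days since 1970-01-01, so it gives the same numbers as difftime() with units = "days", and as.Date(..., origin = "1970-01-01") converts the numbers back.

```r
# Two example dates (illustrative values only)
d <- as.Date(c("1970-01-01", "1980-01-01"))

# Days since 1970-01-01 via the internal Date representation
days_numeric <- as.numeric(d)

# The same quantity computed with difftime()
days_difftime <- as.numeric(difftime(d, as.Date("1970-01-01"), units = "days"))

# Converting the day counts back to Date
back <- as.Date(days_numeric, origin = "1970-01-01")

print(days_numeric)  # 0 3652 (1972 and 1976 are leap years)
```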

#Colored plot
#Numerical format for dates (number of days after 1970-01-01);
#as.numeric() on a Date already gives this, so no difftime() step is needed
df$numericDate <- as.numeric(df$Date)

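#Zero crossings: for each pair of consecutive months, interpolate the date at which SPI = 0
#(lapply always returns a list, which is what do.call("rbind", ...) expects)
rx <- do.call("rbind",
              lapply(1:(nrow(df)-1), function(i){
                f <- lm(numericDate~SPI, df[i:(i+1),])
                #Equal SPI values: no crossing between these two months
                if (f$qr$rank < 2) return(NULL)
                r <- unname(predict(f, newdata=data.frame(SPI=0)))
                #Keep r only if it lies strictly between the two dates
                if (df[i,]$numericDate < r && r < df[i+1,]$numericDate)
                  return(data.frame(numericDate=r,SPI=0))
                else return(NULL)
              }))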
#Get back to Date format
rx$Date <- as.Date(rx$numericDate, origin = "1970-01-01")

d2 <- rbind(df,rx)

ggplot(d2,aes(Date,SPI)) + 
  geom_area(data=subset(d2, SPI<=0), fill="red") +
  geom_area(data=subset(d2, SPI>=0), fill="blue") + 
  geom_line()+
  scale_x_date(breaks = "5 years", date_labels = "%Y",
               limits = c(as.Date("1979-12-01"),
                          as.Date("2006-01-01")),
               expand = c(0,0))

[plot image: geom_area() colored red below and blue above zero]
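If the per-pair lm() fit feels heavy, the same crossing points can be computed in one vectorised step with plain linear interpolation. This is a sketch of an alternative, not the method from the linked answer, and zero_crossings() is a hypothetical helper name. For strict sign changes it should give the same points as the loop above, because fitting a line through two points and predicting at SPI = 0 is exactly linear interpolation.

```r
# Hypothetical helper, not part of the linked answer: vectorised linear
# interpolation of the points where y crosses zero between consecutive x values
zero_crossings <- function(x, y) {
  n <- length(y)
  x1 <- x[-n]; x2 <- x[-1]
  y1 <- y[-n]; y2 <- y[-1]
  # Strict sign change only: pairs touching 0 exactly or with equal y are skipped,
  # mirroring the rank check and the strict inequality in the lm() version
  idx <- which(y1 * y2 < 0)
  # Solve y1 + (y2 - y1) * (x0 - x1) / (x2 - x1) = 0 for x0
  x0 <- x1[idx] - y1[idx] * (x2[idx] - x1[idx]) / (y2[idx] - y1[idx])
  data.frame(x = x0, y = rep(0, length(x0)))
}

# Small check: the segment from (0, -1) to (10, 1) crosses zero at x = 5
demo <- zero_crossings(c(0, 10, 20), c(-1, 1, 3))

# Applied to monthly SPI data laid out as in the answer (Date, SPI)
set.seed(123)
df <- data.frame(Date = seq(as.Date("1980-01-01"), as.Date("2005-12-01"), by = "month"),
                 SPI  = sample(-3:3, size = 312, replace = TRUE))
cr <- zero_crossings(as.numeric(df$Date), df$SPI)
rx <- data.frame(Date = as.Date(cr$x, origin = "1970-01-01"), SPI = 0)
d2 <- rbind(df, rx)
```

The resulting d2 has the same Date and SPI columns, so it can be passed to the same geom_area()/geom_line() call as above.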

Now, if you want to keep using geom_col: the reason you don't see all the bars with dates on the x-axis is that on a Date scale one x unit is one day, so with width = 1 each bar is only one day wide over a 26-year period, too thin for its fill to show. If you set the outline color as well as the fill, you should be able to see all of them.

ggplot(data= df, aes(x= Date, y= SPI)) +
  geom_col(data = df[df$SPI <= 0,], 
           fill = "red", color="red", width= 1) +
  geom_col(data = df[df$SPI >= 0,], 
           fill = "blue", color="blue", width= 1) +
  scale_x_date(breaks = "5 years", date_labels = "%Y",
               limits = c(as.Date("1979-12-01"),
                          as.Date("2006-01-01")),
               expand = c(0,0))

[plot image: geom_col() with Date on the x-axis and outlined bars]
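Another option (a sketch, not part of the original answer) is to make each bar roughly a month wide instead of one day, and to map the sign to fill in a single layer instead of drawing two subsets. The width of 28 days and the column name Sign are illustrative choices; zeros are grouped with the red bars here, which is an assumption (they have zero height anyway).

```r
library(ggplot2)

set.seed(123)
df <- data.frame(Date = seq(as.Date("1980-01-01"), as.Date("2005-12-01"), by = "month"),
                 SPI  = sample(-3:3, size = 312, replace = TRUE))

# Sign category for the fill; zeros go with the red bars (assumption, they have zero height)
df$Sign <- ifelse(df$SPI > 0, "positive", "non-positive")

p <- ggplot(df, aes(x = Date, y = SPI, fill = Sign)) +
  # On a Date axis one x unit is one day, so width = 28 gives bars close to a month wide
  geom_col(width = 28) +
  scale_fill_manual(values = c("positive" = "blue", "non-positive" = "red"),
                    guide = "none") +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y", expand = c(0, 0))
```

Printing p draws the plot. With bars this wide, the fill is visible without also setting color, and the single fill mapping avoids drawing the SPI = 0 rows twice.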

marcguery
  • @marcguery: Thanks a lot for the answer, it is already very helpful. However, I am now facing another problem: your answer works great for a geom_line plot, but I need a geom_col plot, and then it gets tricky; I really have no idea why. I will add an EDIT above so you can see what I mean. I hope you have a hint for that, too. – climsaver Mar 29 '21 at 09:41
  • @climdrag Kindly ask one question at a time (please ask a new question), and don't forget to reference this first question when you do. Please also consider accepting this answer if it has resolved your (first) question. – tjebo Mar 29 '21 at 12:50
  • @marcguery: Thank you so much for your help, now it is exactly what I wanted. @tjebo: Sorry for the confusion. – climsaver Mar 29 '21 at 14:22