
I made a dataframe with columns Year, Month, Temp, upper and lower.

upper is the maximum temperature for each year and lower is the minimum.

I have two questions:

First, why are upper and lower computed incorrectly for some rows at the end of the dataframe, when they are fine in the rest of it?

Second, why am I getting weird axes when I plot with ggplot? The dataframe is this:

As you can see, upper and lower for 2017 are wrong:

     Year Month  Temp upper lower
1    1880   Jan  -.29  -.29  -.09
2    1880   Feb  -.18  -.29  -.09
3    1880   Mar  -.11  -.29  -.09
       ......
1655 2017   Nov   .84   .96  1.12
1656 2017   Dec   .88   .96  1.12

The code is:

    newDF <- df %>%
      group_by(Year) %>%
      mutate(upper = max(Temp), # maximum Temp within each Year
             lower = min(Temp)  # minimum Temp within each Year
             ) %>%
      ungroup()

    p <- ggplot(newDF, aes(Month, Temp)) +
      geom_linerange(newDF, mapping = aes(x = Year, ymin = lower, ymax = upper), colour = "wheat2", alpha = .1)
    print(p)

The graph seems fine, but the axes are messed up. [screenshot of the plot not included]
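One possible explanation for the first question, offered only as a guess from the pattern in the table: if Temp was read in as text (values like ".84" stored as strings rather than the number 0.84), then max() and min() compare the values as strings in sort order instead of numerically, which could give per-year results like the ones shown. The same assumption would also fit the odd axis, since ggplot2 treats a character column as discrete. The sketch below uses hypothetical toy data built on that assumption: it checks the column type, converts it with as.numeric(), and recomputes the per-year maximum and minimum in base R with ave(), which plays the role of the group_by() + mutate() step.

```r
# Hypothetical toy data: Temp stored as character is an ASSUMPTION about
# how the original file was read, not something confirmed in the question.
df <- data.frame(
  Year = c(1880, 1880, 1880, 2017, 2017, 2017, 2017),
  Temp = c("-.29", "-.18", "-.11", ".96", "1.12", ".84", ".88"),
  stringsAsFactors = FALSE
)

# If this reports "chr", max()/min() will compare strings, not numbers
str(df$Temp)

# Convert to numeric first; as.numeric() accepts strings like ".84" and "-.29"
df$Temp <- as.numeric(df$Temp)

# Per-year maximum and minimum, repeated on every row of that year
df$upper <- ave(df$Temp, df$Year, FUN = max)
df$lower <- ave(df$Temp, df$Year, FUN = min)

df
```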

    Please share a sample of your data using `dput()` (not `str`, `head`, or a picture/screenshot) so others can help. See more here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – Tung Sep 29 '18 at 21:54
  • Your code for me gave the correct minimum and maximum temperature by year for the data you posted. The result of running your posted code on your posted data was `structure(list(Year = c(1880L, 1880L, 1880L, 2017L, 2017L), Month = structure(c(3L, 2L, 4L, 5L, 1L), .Label = c("Dec", "Feb", "Jan", "Mar", "Nov" ), class = "factor"), Temp = c(-0.29, -0.18, -0.11, 0.84, 0.88 ), upper = c(-0.11, -0.11, -0.11, 0.88, 0.88), lower = c(-0.29, -0.29, -0.29, 0.84, 0.84)), class = c("tbl_df", "tbl", "data.frame" ), row.names = c(NA, -5L))` – duckmayr Sep 29 '18 at 23:20

1 Answer


I think you're very close; it's just the second part that needs a tweak. ggplot can use a date field as the x axis, but the Month field is text (and it doesn't include the Year). Here I make a new column called date that combines them. lubridate is a handy package for that, since it does some smart parsing of date formats.
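If you would rather not add the lubridate dependency, the same "combine Year and Month into a date" step can be sketched in base R. This assumes Month holds English abbreviations matching base R's month.abb (as in the fake data in this answer) and, like the lubridate version, uses the 1st of each month to stand in for the whole month. The Year and Month vectors here are hypothetical sample values.

```r
# Hypothetical sample values; month.abb is base R's c("Jan", ..., "Dec")
Year  <- c(1880L, 1880L, 2017L)
Month <- c("Jan", "Feb", "Dec")

# Month abbreviation -> month number (Jan = 1, ..., Dec = 12).
# Matching against month.abb avoids depending on the locale's month names.
month_num <- match(Month, month.abb)

# Build an ISO "YYYY-MM-01" string and parse it as a Date
date <- as.Date(sprintf("%d-%02d-01", Year, month_num))
date
```

Matching against month.abb rather than parsing with a `%b` format keeps the conversion independent of the session's locale.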

# Fake data
library(dplyr)
df <- tibble(  # tibble() replaces the deprecated data_frame()
  Year = rep(1880:2017, each = 12),
  Month = rep(month.abb, times = (2017-1880+1)),
  Temp = rnorm(n = 1656, mean = 0, sd = 1)
)


newDF <- df %>%
  # This line adds a date field based on Year and Month
  mutate(date = lubridate::ymd(paste(Year, Month, 1))) %>%
  group_by(Year) %>%
  mutate(upper = max(Temp), # maximum Temp within each Year
         lower = min(Temp)  # minimum Temp within each Year
         ) %>%
  ungroup()

library(ggplot2)
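p <- ggplot(newDF, aes(date, Temp)) +
  # x is inherited from aes(date, ...), so the axis is a real date scale;
  # mapping x = Year here would override it with the plain year number
  geom_linerange(aes(ymin = lower, ymax = upper), colour = "wheat2", alpha = .1)
print(p)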
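newDF keeps one row per month, so upper and lower are repeated on all twelve rows of a year and geom_linerange() draws a range for every one of those rows. If exactly one bar per year is wanted (for a plot whose x axis is Year), a collapsed table with one row per year can be built first. A base R sketch using aggregate() and merge(), on hypothetical fake data with three years of twelve months each:

```r
# Hypothetical fake data shaped like the answer's (12 months per year)
set.seed(1)
df <- data.frame(
  Year  = rep(1880:1882, each = 12),
  Month = rep(month.abb, times = 3),
  Temp  = rnorm(36)
)

# One row per year: the smallest and the largest monthly Temp
lows  <- aggregate(Temp ~ Year, data = df, FUN = min)
highs <- aggregate(Temp ~ Year, data = df, FUN = max)
names(lows)[2]  <- "lower"
names(highs)[2] <- "upper"

# Columns: Year, lower, upper (one row per year)
yearly <- merge(lows, highs, by = "Year")
yearly
```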
Jon Spring