
I have a dataframe containing daily observations of various climate measurements for 31 stations (the NAME column is a factor). Each station has many years' worth of daily observations, and each station has a different number of years recorded and therefore a different number of observations.

For example data, I have subset it down to 14 stations with one observation per unique water_year.

    NAME                        DATE         PRCP calendar_year month   day water_year water_date
    <fct>                       <date>      <dbl> <fct>         <int> <int> <fct>      <date>    
102 FLORENCE 0.2 SSE, OR US     2007-12-05  0     2007             12     5 2007       2006-12-05
103 FLORENCE 0.2 SSE, OR US     2008-10-01  0     2008             10     1 2008       2007-10-01
104 FLORENCE 0.2 SSE, OR US     2009-12-16  0.9   2009             12    16 2009       2008-12-16
105 FLORENCE 0.2 SSE, OR US     2010-10-19  0     2010             10    19 2010       2009-10-19
106 FLORENCE 0.2 SSE, OR US     2012-07-10  0     2012              7    10 2012       2012-07-10
107 FLORENCE 0.5 NE, OR US      2007-12-12  0     2007             12    12 2007       2006-12-12
108 FLORENCE 0.5 NE, OR US      2008-01-01  0     2008              1     1 2008       2008-01-01
109 FLORENCE 0.6 E, OR US       2008-01-01  0     2008              1     1 2008       2008-01-01
110 FLORENCE 0.9 NW, OR US      2007-12-22  0.09  2007             12    22 2007       2006-12-22
111 FLORENCE 0.9 NW, OR US      2008-10-01  0     2008             10     1 2008       2007-10-01
112 FLORENCE 0.9 NW, OR US      2009-10-01  0.02  2009             10     1 2009       2008-10-01
113 FLORENCE 0.9 NW, OR US      2010-10-01  0.03  2010             10     1 2010       2009-10-01
114 FLORENCE 0.9 NW, OR US      2011-10-01  0.02  2011             10     1 2011       2010-10-01
115 FLORENCE 0.9 NW, OR US      2012-10-01  0     2012             10     1 2012       2011-10-01
116 FLORENCE 0.9 NW, OR US      2013-10-01  0.92  2013             10     1 2013       2012-10-01
117 FLORENCE 0.9 NW, OR US      2014-10-01  0.01  2014             10     1 2014       2013-10-01
118 FLORENCE 0.9 NW, OR US      2015-10-01  0     2015             10     1 2015       2014-10-01
119 FLORENCE 0.9 NW, OR US      2016-10-01  0.15  2016             10     1 2016       2015-10-01
120 FLORENCE 0.9 NW, OR US      2017-10-01  0.2   2017             10     1 2017       2016-10-01
121 FLORENCE 0.9 NW, OR US      2018-01-01  0     2018              1     1 2018       2018-01-01
122 FLORENCE 1.8 NW, OR US      2007-12-14  0     2007             12    14 2007       2006-12-14
123 FLORENCE 1.8 NW, OR US      2008-10-01  0     2008             10     1 2008       2007-10-01
124 FLORENCE 1.8 NW, OR US      2009-10-25  0     2009             10    25 2009       2008-10-25
125 FLORENCE 1.8 NW, OR US      2010-10-05  0.01  2010             10     5 2010       2009-10-05
126 FLORENCE 1.8 NW, OR US      2011-10-01  0.02  2011             10     1 2011       2010-10-01
127 FLORENCE 1.8 NW, OR US      2012-10-02  0     2012             10     2 2012       2011-10-02
128 FLORENCE 1.8 NW, OR US      2013-10-01  0.570 2013             10     1 2013       2012-10-01
129 FLORENCE 1.8 NW, OR US      2014-10-01  0.02  2014             10     1 2014       2013-10-01
130 FLORENCE 1.8 NW, OR US      2015-10-01  0.02  2015             10     1 2015       2014-10-01
131 FLORENCE 1.8 NW, OR US      2016-10-01  0.08  2016             10     1 2016       2015-10-01
132 FLORENCE 1.8 NW, OR US      2017-10-01  0.23  2017             10     1 2017       2016-10-01
133 FLORENCE 1.8 NW, OR US      2018-01-01  0.01  2018              1     1 2018       2018-01-01
134 FLORENCE 2.1 NNW, OR US     2007-12-17  0.96  2007             12    17 2007       2006-12-17
135 FLORENCE 2.1 NNW, OR US     2008-10-01  0     2008             10     1 2008       2007-10-01
136 FLORENCE 2.1 NNW, OR US     2009-10-01  0     2009             10     1 2009       2008-10-01
137 FLORENCE 2.1 NNW, OR US     2010-10-01  0.03  2010             10     1 2010       2009-10-01
138 FLORENCE 2.1 NNW, OR US     2011-10-01  0     2011             10     1 2011       2010-10-01
139 FLORENCE 2.1 NNW, OR US     2012-10-01  0     2012             10     1 2012       2011-10-01
140 FLORENCE 2.1 NNW, OR US     2013-12-26  0     2013             12    26 2013       2012-12-26
141 FLORENCE 2.1 NNW, OR US     2014-10-07  0     2014             10     7 2014       2013-10-07
142 FLORENCE 2.1 NNW, OR US     2016-05-21  0     2016              5    21 2016       2016-05-21
143 FLORENCE 2.1 NNW, OR US     2017-12-26  0     2017             12    26 2017       2016-12-26
144 FLORENCE 2.9 NNW, OR US     2007-12-16  0.07  2007             12    16 2007       2006-12-16
145 FLORENCE 2.9 NNW, OR US     2008-10-01  0     2008             10     1 2008       2007-10-01
146 FLORENCE 2.9 NNW, OR US     2009-10-01  0.03  2009             10     1 2009       2008-10-01
147 FLORENCE 2.9 NNW, OR US     2010-10-02  0.05  2010             10     2 2010       2009-10-02
148 FLORENCE 2.9 NNW, OR US     2011-10-01  0.02  2011             10     1 2011       2010-10-01
149 FLORENCE 2.9 NNW, OR US     2012-10-02  0     2012             10     2 2012       2011-10-02
150 FLORENCE 2.9 NNW, OR US     2013-10-01  0.580 2013             10     1 2013       2012-10-01
151 FLORENCE 2.9 NNW, OR US     2014-10-01  0.02  2014             10     1 2014       2013-10-01
152 FLORENCE 2.9 NNW, OR US     2015-10-01  0     2015             10     1 2015       2014-10-01
153 FLORENCE 2.9 NNW, OR US     2016-10-04  0.580 2016             10     4 2016       2015-10-04
154 FLORENCE 2.9 NNW, OR US     2017-10-01  0.2   2017             10     1 2017       2016-10-01
155 FLORENCE 2.9 NNW, OR US     2018-01-01  0     2018              1     1 2018       2018-01-01
156 FLORENCE 5.4 N, OR US       2007-12-22  0.03  2007             12    22 2007       2006-12-22
157 FLORENCE 5.4 N, OR US       2008-10-01  0     2008             10     1 2008       2007-10-01
158 FLORENCE 5.4 N, OR US       2009-10-01  0.07  2009             10     1 2009       2008-10-01
159 FLORENCE 5.4 N, OR US       2010-10-01  0.03  2010             10     1 2010       2009-10-01
160 FLORENCE 5.4 N, OR US       2011-10-03  0.65  2011             10     3 2011       2010-10-03
161 FLORENCE 5.4 N, OR US       2012-10-01  0     2012             10     1 2012       2011-10-01
162 FLORENCE 5.4 N, OR US       2013-10-01  0.6   2013             10     1 2013       2012-10-01
163 FLORENCE 5.4 N, OR US       2014-10-01  0     2014             10     1 2014       2013-10-01
164 FLORENCE 5.4 N, OR US       2015-10-01  0     2015             10     1 2015       2014-10-01
165 FLORENCE 5.4 N, OR US       2016-11-01  0.21  2016             11     1 2016       2015-11-01
166 FLORENCE 5.4 N, OR US       2017-11-11  0.9   2017             11    11 2017       2016-11-11
167 FLORENCE 5.4 N, OR US       2018-01-01  0     2018              1     1 2018       2018-01-01
168 FLORENCE 5.4 S, OR US       2007-12-08  0.42  2007             12     8 2007       2006-12-08
169 FLORENCE 5.4 S, OR US       2008-10-01  0     2008             10     1 2008       2007-10-01
170 FLORENCE 5.4 S, OR US       2009-10-01  0     2009             10     1 2009       2008-10-01
171 FLORENCE 5.4 S, OR US       2010-10-01  0.03  2010             10     1 2010       2009-10-01
172 FLORENCE 5.4 S, OR US       2011-10-01  0     2011             10     1 2011       2010-10-01
173 FLORENCE 5.4 S, OR US       2012-10-01  0     2012             10     1 2012       2011-10-01
174 FLORENCE 5.4 S, OR US       2013-10-01  0.6   2013             10     1 2013       2012-10-01
175 FLORENCE 5.4 S, OR US       2014-10-02  0     2014             10     2 2014       2013-10-02
176 FLORENCE 5.4 S, OR US       2015-01-01  0     2015              1     1 2015       2015-01-01
177 FLORENCE 5.8 S, OR US       2007-12-01  0.02  2007             12     1 2007       2006-12-01
178 FLORENCE 5.8 S, OR US       2008-10-01  0     2008             10     1 2008       2007-10-01
179 FLORENCE 5.8 S, OR US       2009-10-01  0.02  2009             10     1 2009       2008-10-01
180 FLORENCE 5.8 S, OR US       2010-10-01  0.01  2010             10     1 2010       2009-10-01
181 FLORENCE 5.8 S, OR US       2011-10-01  0     2011             10     1 2011       2010-10-01
182 FLORENCE 5.8 S, OR US       2012-10-01  0     2012             10     1 2012       2011-10-01
183 FLORENCE 5.8 S, OR US       2013-10-01  0.75  2013             10     1 2013       2012-10-01
184 FLORENCE 5.8 S, OR US       2014-01-01  0     2014              1     1 2014       2014-01-01
185 FLORENCE 5.9 NNE, OR US     2007-11-29  0.41  2007             11    29 2007       2006-11-29
186 FLORENCE 5.9 NNE, OR US     2008-10-03  0.39  2008             10     3 2008       2007-10-03
187 FLORENCE 5.9 NNE, OR US     2009-10-01  0.01  2009             10     1 2009       2008-10-01
188 FLORENCE 5.9 NNE, OR US     2010-10-01  0.05  2010             10     1 2010       2009-10-01
189 FLORENCE 5.9 NNE, OR US     2011-10-01  0.02  2011             10     1 2011       2010-10-01
190 FLORENCE 5.9 NNE, OR US     2012-10-01  0     2012             10     1 2012       2011-10-01
191 FLORENCE 5.9 NNE, OR US     2013-10-01  0.43  2013             10     1 2013       2012-10-01
192 FLORENCE 5.9 NNE, OR US     2014-10-01  0     2014             10     1 2014       2013-10-01
193 FLORENCE 5.9 NNE, OR US     2015-10-10  0.69  2015             10    10 2015       2014-10-10
194 FLORENCE 5.9 NNE, OR US     2016-10-01  0.11  2016             10     1 2016       2015-10-01
195 FLORENCE 5.9 NNE, OR US     2017-01-01  0.24  2017              1     1 2017       2017-01-01
196 FLORENCE 6 N, OR US         2007-11-19  0.04  2007             11    19 2007       2006-11-19
197 FLORENCE 6 N, OR US         2008-10-01  0     2008             10     1 2008       2007-10-01
198 FLORENCE 6 N, OR US         2009-10-01  0     2009             10     1 2009       2008-10-01
199 FLORENCE 6 N, OR US         2010-01-01  0.7   2010              1     1 2010       2010-01-01
200 FLORENCE NUMBER 2, OR US    2006-10-01  0     2006             10     1 2006       2005-10-01
201 FLORENCE NUMBER 2, OR US    2007-10-01  0     2007             10     1 2007       2006-10-01
202 FLORENCE NUMBER 2, OR US    2008-10-01  0     2008             10     1 2008       2007-10-01
203 FLORENCE NUMBER 2, OR US    2009-10-01  0     2009             10     1 2009       2008-10-01
204 FLORENCE NUMBER 2, OR US    2010-10-01  0.04  2010             10     1 2010       2009-10-01
205 FLORENCE NUMBER 2, OR US    2011-10-01  0.9   2011             10     1 2011       2010-10-01
206 FLORENCE NUMBER 2, OR US    2012-10-01  0     2012             10     1 2012       2011-10-01
207 FLORENCE NUMBER 2, OR US    2013-10-01  0.46  2013             10     1 2013       2012-10-01
208 FLORENCE NUMBER 2, OR US    2014-10-01  0     2014             10     1 2014       2013-10-01
209 FLORENCE NUMBER 2, OR US    2015-10-01  0     2015             10     1 2015       2014-10-01
210 FLORENCE NUMBER 2, OR US    2016-10-01  0.77  2016             10     1 2016       2015-10-01
211 FLORENCE NUMBER 2, OR US    2017-10-01  0.06  2017             10     1 2017       2016-10-01
212 FLORENCE NUMBER 2, OR US    2018-01-01  0     2018              1     1 2018       2018-01-01
213 FLORENCE, OR US             1909-10-01  0.580 1909             10     1 1909       1908-10-01
214 FLORENCE, OR US             1910-10-01  0.49  1910             10     1 1910       1909-10-01
215 FLORENCE, OR US             1911-10-01  0.03  1911             10     1 1911       1910-10-01
216 FLORENCE, OR US             1912-10-01  0.07  1912             10     1 1912       1911-10-01
217 FLORENCE, OR US             1913-10-01  0     1913             10     1 1913       1912-10-01
218 FLORENCE, OR US             1914-10-01  0.24  1914             10     1 1914       1913-10-01
219 FLORENCE, OR US             1915-10-01  0.25  1915             10     1 1915       1914-10-01
220 FLORENCE, OR US             1916-10-01  0.03  1916             10     1 1916       1915-10-01
221 FLORENCE, OR US             1917-10-01  0     1917             10     1 1917       1916-10-01
222 FLORENCE, OR US             1918-10-01  0     1918             10     1 1918       1917-10-01
223 FLORENCE, OR US             1919-10-01  0.6   1919             10     1 1919       1918-10-01
224 FLORENCE, OR US             1920-10-01  1.22  1920             10     1 1920       1919-10-01
225 FLORENCE, OR US             1921-10-01  0     1921             10     1 1921       1920-10-01
226 FLORENCE, OR US             1922-10-01  0.03  1922             10     1 1922       1921-10-01
227 FLORENCE, OR US             1949-12-08  0     1949             12     8 1949       1948-12-08
228 FLORENCE, OR US             1950-10-01  0     1950             10     1 1950       1949-10-01
229 FLORENCE, OR US             1951-01-01  0.32  1951              1     1 1951       1951-01-01
230 FLORENCE, OR US             2004-10-01  0     2004             10     1 2004       2003-10-01
231 FLORENCE, OR US             2005-10-01  0.88  2005             10     1 2005       2004-10-01
232 FLORENCE, OR US             2006-10-01  0     2006             10     1 2006       2005-10-01
233 FLORENCE, OR US             2007-10-01  0.33  2007             10     1 2007       2006-10-01
234 FLORENCE, OR US             2008-10-01  0     2008             10     1 2008       2007-10-01
235 FLORENCE, OR US             2009-10-01  0     2009             10     1 2009       2008-10-01
236 FLORENCE, OR US             2010-10-01  0.04  2010             10     1 2010       2009-10-01
237 FLORENCE, OR US             2011-10-01  0.75  2011             10     1 2011       2010-10-01
238 FLORENCE, OR US             2012-10-02  0     2012             10     2 2012       2011-10-02
239 FLORENCE, OR US             2013-10-01  0.63  2013             10     1 2013       2012-10-01
240 FLORENCE, OR US             2014-10-01  0     2014             10     1 2014       2013-10-01
241 FLORENCE, OR US             2015-10-01  0     2015             10     1 2015       2014-10-01
242 FLORENCE, OR US             2016-10-01  0.16  2016             10     1 2016       2015-10-01
243 FLORENCE, OR US             2017-01-01  0.53  2017              1     1 2017       2017-01-01
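To make the table above copy-pasteable, as Tung requests in the comments, a few of its rows can be rebuilt as a data frame and printed with `dput()`. This is a minimal sketch: `srb_small` is a hypothetical name, only four columns are kept, and the values are copied from the FLORENCE 0.5 NE and FLORENCE 5.8 S rows above.

```r
# A few rows copied from the printed table above (only the plotting columns)
srb_small <- data.frame(
  NAME = factor(c(rep("FLORENCE 0.5 NE, OR US", 2),
                  rep("FLORENCE 5.8 S, OR US", 4))),
  PRCP = c(0, 0, 0.02, 0, 0.02, 0.01),
  water_year = factor(c(2007, 2008, 2007, 2008, 2009, 2010)),
  water_date = as.Date(c("2006-12-12", "2008-01-01",
                         "2006-12-01", "2007-10-01", "2008-10-01", "2009-10-01"))
)

# dput() prints R code that recreates the object exactly;
# that output is what helpers can paste straight into their session
dput(srb_small)
```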

My goal is to:

  • per NAME
  • per each extant water_year
  • create a dot/smooth plot of all PRCP values against water_date
  • combine by NAME

So the resulting plots would have PRCP on the y-axis and water_date on the x-axis, with the dots/smooths for each water_year available for that NAME plotted on top of each other. There would be 31 plots in total, one for each NAME.

A simple way to plot PRCP against water_date for a given NAME and a single water_year would be:

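library(dplyr)    # provides %>% and filter()
library(ggplot2)

# One station, one water year: daily PRCP against water_date
ggplot(srb_clean %>% filter(NAME == "made up name" & water_year == "1902"), aes(water_date, PRCP)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(colour = "red", linewidth = 1)   # linewidth replaces size for lines in ggplot2 >= 3.4.0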

This code produces a plot of one year's worth of data, whereas the desired output would have a dot/smooth group for each year of data available for that NAME.


I am looking for a way to automate the creation of these plots, outputting one plot per NAME with PRCP vs. water_date, grouped by water_year.

What is the most elegant, or the most standard, way of doing something like this in R? I am a programming novice and somewhat befuddled about how to approach this programmatically, let alone in R in particular.


UPDATE #1 (improved example data and question)


UPDATE #3 (solution)

Parfait's solution works well. It can be used with code similar to that above to output plots similar to the following:

[Image: multiple water years overplotted on one annual axis]

Clayton Glasser
  • You should have a closer look at how ggplot works. In particular, consider the `group` aesthetic to group by year, and `facet_grid` to plot each station. This is basic ggplot; do a bit more research before posting. Also try to implement something before asking for a solution. Good luck ;) – Hobo Sheep Aug 25 '18 at 20:19
  • This will be a relatively straightforward task in ggplot; a reproducible example would help. The toughest part will be getting the x-axis to range from Jan 1 to Dec 31: you will have to strip the month and day out of water_date and create a new column with the dates within a dummy year, as done in https://stackoverflow.com/questions/33832776/synchronous-x-axis-for-multiple-years-of-sales-with-ggplot . After that, just feed the dataframe into ggplot with the aesthetics x = the new dummy date, y = PRCP, grouping by calendar_year, and facet_wrap by station, i.e. NAME (not facet_grid). – jhchou Aug 25 '18 at 20:58
  • Please share a sample of your data using `dput()` (not `str` or `head` or a picture/screenshot) so others can help. See more here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – Tung Aug 25 '18 at 21:33
  • Can you also post or draw your desired output plot? – Tung Aug 25 '18 at 21:35
  • @HoboSheep You may be surprised: I have invested a lot of time in trying implementations of this :) Post updated with better info. Grouping by water_year and facet-wrapping by NAME has two shortcomings: 1) it results in a grid of plots rather than individual plots, and 2) it does not overplot each water_year onto the same water_date (single-annual) graph. I am still unclear on how to accomplish these. – Clayton Glasser Aug 26 '18 at 21:19
  • @jhchou Thank you for your advice. Please see my comment above to Hobo Sheep. The post is updated with a clearer explanation of the desired output. Using the group aesthetic and facet_wrap does not achieve the crucial step of overplotting each water_year (multiple years) onto the same water_date (single-annual) graph, or the less crucial but desired result of outputting individual graphs for each NAME. – Clayton Glasser Aug 26 '18 at 21:23
  • @Tung Thank you for that link, very helpful. I have edited the post to be more clear, and I hope it is sufficient. – Clayton Glasser Aug 26 '18 at 21:25
  • @ClaytonGlasser: you need to use `dput()` to share data. The above table is not readily usable for helpers. The `NAME` column even contains many spaces. – Tung Aug 26 '18 at 21:38
  • @ClaytonGlasser These edits are super useful, as they provide both a better explanation of what you want to achieve and of what your problem is. I think most of your problems are solved by the answer below by parfait. Sorry if my answer sounded very negative, but we need to fully understand your problem and be sure you put some effort into this to be able to help you. :) – Hobo Sheep Aug 28 '18 at 06:53

1 Answer


Since you require the same x-axis of dates spanning one annual period, consider replacing the year in every water_date with a common dummy year that no existing row uses, such as 2099 (for October through December) and 2100 (for January through September).
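As an illustration of the dummy-year idea on its own, here is a small base-R helper. It is a sketch, not the answer's exact code: the name `to_pseudo_water_date()` is hypothetical, and it assumes `water_date` is already a `Date`. Keeping month and day while swapping in 2099 or 2100 is what lets every water year share the same Oct 1 to Sep 30 axis.

```r
# Hypothetical helper: move every date onto a shared Oct 2099 - Sep 2100 axis
to_pseudo_water_date <- function(d) {
  month <- as.integer(format(d, "%m"))             # month number, 1..12
  dummy_year <- ifelse(month >= 10, 2099L, 2100L)  # Oct-Dec first, then Jan-Sep
  # Keep month and day, swap in the dummy year.
  # Note: Feb 29 becomes NA because 2100 is not a leap year.
  as.Date(sprintf("%d-%s", dummy_year, format(d, "%m-%d")), format = "%Y-%m-%d")
}

to_pseudo_water_date(as.Date(c("2006-12-05", "2008-01-01", "2013-10-01")))
# -> "2099-12-05" "2100-01-01" "2099-10-01"
```

Because the result goes into a separate column used only for the x-axis, the real dates stay untouched for any other analysis.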

Then use by (the function that splits a dataframe into smaller subsets by one or more factors and applies a function to each) to generate a list of plots, one for each distinct NAME. To hide the dummy years, use the scales library to label only the month and day: %b-%d (month name) or %m-%d (month number). Also, map water_year to the colour aesthetic so that each year gets its own legend entry and its own smooth.

library(ggplot2)
library(scales)
...

# TEMP HELPER VARIABLE: water_date AS "YYYY-MM-DD" TEXT
srb_clean$wt_date_char <- as.character(srb_clean$water_date)

# REPLACE EVERY YEAR WITH 2099 OR 2100
# CONDITIONALLY UPDATE YEAR BY MONTH NUMBER:
# MONTHS 01-09 (JAN-SEP) -> 2100, MONTHS 10-12 (OCT-DEC) -> 2099,
# SO THE X-AXIS RUNS FROM OCT 1 TO SEP 30
srb_clean$pseudo_water_date <- ifelse(substr(srb_clean$wt_date_char, 6, 7) %in% paste0("0", as.character(seq(1,9))),
                                      gsub("^(.*?)\\-", "2100-", srb_clean$wt_date_char),
                                      gsub("^(.*?)\\-", "2099-", srb_clean$wt_date_char)
                               )

# CONVERT BACK TO Date (A FEB 29 BECOMES NA, SINCE 2100 IS NOT A LEAP YEAR)
srb_clean$pseudo_water_date <- as.Date(srb_clean$pseudo_water_date, format = "%Y-%m-%d")
srb_clean$wt_date_char <- NULL

# BUILD PLOT LIST: ONE PLOT PER NAME, ONE COLOUR PER water_year
plot_list <- by(srb_clean, srb_clean$NAME, function(sub)
               ggplot(sub, aes(pseudo_water_date, PRCP, colour = factor(water_year))) +
                  geom_point(na.rm = TRUE) +
                  # NO FIXED colour HERE, SO EACH water_year GETS ITS OWN SMOOTH
                  geom_smooth(linewidth = 1, na.rm = TRUE) +
                  labs(title = as.character(sub$NAME[1]), colour = "Water Year",
                       x = "Water Date", y = "Precipitation") +
                  theme(plot.title = element_text(hjust = 0.5)) +
                  scale_x_date(labels = date_format("%b-%d"))
             )

# OUTPUT INDIVIDUAL PLOTS
plot_list[[1]]
plot_list[[2]]
plot_list[[3]]
...

# OUTPUT ALL PLOTS
plot_list
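Since the asker wanted individual plots rather than a facet grid, each element of the by() result (or of any named list of ggplots) can also be written to its own file. A minimal sketch, assuming ggplot2 is installed and a PNG device is available: the helper `save_plots()`, the toy data and the output folder are hypothetical, and file names are derived from the station names by replacing unsafe characters with underscores.

```r
library(ggplot2)

# Hypothetical helper: write each plot in a named list (or by() object) to a PNG
save_plots <- function(plots, out_dir = "plots") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  nms <- names(plots)
  paths <- character(length(plots))
  for (i in seq_along(plots)) {
    if (is.null(plots[[i]])) next  # by() can return NULL for an empty factor level
    # e.g. "FLORENCE 5.4 N, OR US" -> "FLORENCE_5.4_N_OR_US.png"
    file_name <- paste0(gsub("[^A-Za-z0-9.]+", "_", nms[i]), ".png")
    paths[i] <- file.path(out_dir, file_name)
    ggsave(paths[i], plot = plots[[i]], width = 8, height = 5, dpi = 100)
  }
  invisible(paths[nzchar(paths)])  # paths of the files actually written
}

# Usage with a toy named list of plots (hypothetical data)
toy <- data.frame(
  NAME = factor(rep(c("FLORENCE 5.4 N, OR US", "FLORENCE, OR US"), each = 3)),
  day  = rep(1:3, 2),
  PRCP = c(0, 0.2, 0.1, 0.3, 0, 0.5)
)
toy_plots <- lapply(split(toy, toy$NAME), function(d)
  ggplot(d, aes(day, PRCP)) + geom_point() + ggtitle(as.character(d$NAME[1])))
saved <- save_plots(toy_plots, file.path(tempdir(), "station_plots"))
```

With the answer's code, the same call would be `save_plots(plot_list)`.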
Parfait
  • This is excellent, thank you. This works except in one respect. By converting all the water_dates to year=2099 (enabling me to plot them all on the same annual chart), I lose the ability to make the graph run Oct-Sep (since a water_year includes the last three months, Oct-Dec, of the previous calendar year, and thus "starts" in Oct). How would you suggest obliging the graph to run Oct-Sep (for example 10-01-2000 to 09-30-2001) with this approach? – Clayton Glasser Aug 27 '18 at 04:44
  • See updated answer adding a conditional `ifelse` assignment where months 10, 11 and 12 are updated to 2099 and all other months (1-9) are updated to 2100, one year later. For clarity, I use a new variable for graphing purposes: *pseudo_water_date*. – Parfait Aug 27 '18 at 14:46
  • I have been trying to implement this code and have discovered that it has the effect of converting ALL of the years in pseudo_water_date to 2100, not just months 1-9. I have carefully reviewed every aspect of the ifelse/substr/gsub code; it all makes sense and I don't see any errors. Do you have any insight into why it might not work? Is the IF statement not being triggered? – Clayton Glasser Aug 29 '18 at 21:00
  • Whoops! Logic should be adjusted. See edit, changing `seq(10,12)` to `seq(1,9)`. – Parfait Aug 29 '18 at 21:07
  • Ah, yes, true. Combined with reversing the gsub logic (switching 2100 and 2099), this has the desired result. Thanks! @Parfait – Clayton Glasser Aug 29 '18 at 21:46
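The inverted-logic bug discussed in this thread is easy to catch by running the expression on a few dates whose result is already known. The sketch below wraps the answer's `ifelse`/`substr`/`gsub` expression, in the corrected form agreed on above (months 1-9 to 2100, 10-12 to 2099), in a hypothetical function `pseudo_chr()` and asserts the expected years.

```r
# The answer's expression (corrected form) applied to "YYYY-MM-DD" strings
pseudo_chr <- function(wt_date_char) {
  ifelse(substr(wt_date_char, 6, 7) %in% paste0("0", as.character(seq(1, 9))),
         gsub("^(.*?)\\-", "2100-", wt_date_char),   # Jan-Sep -> 2100
         gsub("^(.*?)\\-", "2099-", wt_date_char))   # Oct-Dec -> 2099
}

check <- pseudo_chr(c("2007-10-01", "2008-01-01", "2012-07-10"))
# October should land in 2099, January and July in 2100
stopifnot(identical(check, c("2099-10-01", "2100-01-01", "2100-07-10")))
```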