I have some data gathered from a weather buoy:
station longitude latitude time wd wspd gst wvht dpd apd mwd bar
42001 -89.658 25.888 1975-08-13T22:00:00Z 23 4.1 NaN NaN NaN NaN NaN 1017.4
42001 -89.658 25.888 1975-08-13T23:00:00Z 59 3.1 NaN NaN NaN NaN NaN 1017.3
42001 -89.658 25.888 1975-08-14T00:00:00Z 30 5.2 NaN NaN NaN NaN NaN 1017.4
42001 -89.658 25.888 1975-08-14T01:00:00Z 70 2 NaN NaN NaN NaN NaN 1017.8
42001 -89.658 25.888 1975-08-14T02:00:00Z 87 5.7 NaN NaN NaN NaN NaN 1018.2
42001 -89.658 25.888 1975-08-14T03:00:00Z 105 5.6 NaN NaN NaN NaN NaN 1018.6
42001 -89.658 25.888 1975-08-14T04:00:00Z 116 5.8 NaN NaN NaN NaN NaN 1018.7
42001 -89.658 25.888 1975-08-14T05:00:00Z 116 5 NaN NaN NaN NaN NaN 1018.5
42001 -89.658 25.888 1975-08-14T06:00:00Z 123 4.5 NaN NaN NaN NaN NaN 1018.1
42001 -89.658 25.888 1975-08-14T07:00:00Z 137 4.1 NaN NaN NaN NaN NaN 1017.9
42001 -89.658 25.888 1975-08-14T08:00:00Z 151 3.6 NaN NaN NaN NaN NaN 1017.7
42001 -89.658 25.888 1975-08-14T09:00:00Z 153 3.5 NaN NaN NaN NaN NaN 1017.6
42001 -89.658 25.888 1975-08-14T10:00:00Z 180 3.5 NaN NaN NaN NaN NaN 1017.7
42001 -89.658 25.888 1975-08-14T11:00:00Z 189 2.8 NaN NaN NaN NaN NaN 1018
42001 -89.658 25.888 1975-08-14T12:00:00Z 183 1.7 NaN NaN NaN NaN NaN 1018.3
42001 -89.658 25.888 1975-08-14T13:00:00Z 172 0.7 NaN NaN NaN NaN NaN 1018.8
42001 -89.658 25.888 2001-11-18T11:00:00Z 38 7.3 8.8 1.1 6.67 4.51 69 1021
42001 -89.658 25.888 2001-11-18T12:00:00Z 29 7.9 9.3 1.01 5.88 4.42 57 1021.4
42001 -89.658 25.888 2001-11-18T13:00:00Z 29 7.4 8.3 1.02 7.14 4.42 65 1022.1
42001 -89.658 25.888 2001-11-18T14:00:00Z 23 8 9.5 0.97 5.56 4.48 55 1022.6
42001 -89.658 25.888 2001-11-18T15:00:00Z 16 7.6 8.9 1 6.67 4.5 64 1023.2
42001 -89.658 25.888 2001-11-18T16:00:00Z 26 8.9 10.2 0.94 4.17 4.49 29 1023.1
42001 -89.658 25.888 2001-11-18T17:00:00Z 26 8.5 10.2 0.98 4.55 4.48 36 1022.7
42001 -89.658 25.888 2001-11-18T18:00:00Z 17 7.8 9.1 1.07 4.76 4.56 30 1021.9
42001 -89.658 25.888 2001-11-18T19:00:00Z 24 8.1 9.1 1.07 4.55 4.6 29 1021
42001 -89.658 25.888 2001-11-18T20:00:00Z 18 8.3 11.1 1.21 6.25 4.6 69 1020
42001 -89.658 25.888 2001-11-18T21:00:00Z 30 8 9.4 1.2 6.67 4.72 77 1019.8
42001 -89.658 25.888 2001-11-18T22:00:00Z 39 8.2 9.6 1.32 6.67 4.8 76 1019.8
42001 -89.658 25.888 2001-11-18T23:00:00Z 32 8.5 9.6 1.21 6.67 4.63 71 1019.7
42001 -89.658 25.888 2001-11-19T00:00:00Z 38 8.9 10.3 1.28 6.25 4.6 72 1019.8
42001 -89.658 25.888 2001-11-19T01:00:00Z 48 8.3 9.6 1.26 6.67 4.53 71 1020.2
42001 -89.658 25.888 2001-11-19T02:00:00Z 54 10.1 11.6 1.28 6.67 4.59 65 1021.1
42001 -89.658 25.888 2001-11-19T03:00:00Z 60 3 4.7 1.29 5.88 4.58 72 1021.5
42001 -89.658 25.888 2001-11-19T04:00:00Z 77 0.8 1.7 1.25 6.67 4.92 63 1021.2
42001 -89.658 25.888 2001-11-19T05:00:00Z 153 2.1 3 1.21 6.67 4.91 64 1021
42001 -89.658 25.888 2001-11-19T06:00:00Z 20 2.2 5.5 1.18 6.25 4.92 65 1020.6
42001 -89.658 25.888 2001-11-19T07:00:00Z 158 6.2 9.7 1.31 6.67 5.22 67 1020.3
42001 -89.658 25.888 2001-11-19T08:00:00Z 162 7.4 9 1.26 6.67 5.42 73 1020.1
42001 -89.658 25.888 2001-11-19T09:00:00Z 218 4.8 6.2 1.2 7.69 4.98 65 1019.9
How could I create a data frame from aggregating the data (using the mean) on a monthly basis while leaving out the NaN
values? The start of the data has numerous rows with NaN
, but for several years there are values in those rows.
I've tried:
DF2 <- transform(buoy1, time = substring(time, 1, 7))
aggregate(as.numeric(wd) ~ time, DF2[-1,], mean, na.rm=TRUE))
Which generates
401 2010-09 109.20556
402 2010-10 107.42473
403 2010-11 130.67222
404 2010-12 135.75000
405 2011-01 156.11306
406 2011-02 123.33931
407 2011-03 137.29744
408 2011-04 119.85139
409 2011-05 148.65276
410 2011-06 104.74722
411 2011-07 88.16393
412 2011-09 106.60229
413 2011-10 93.32527
414 2011-11 149.52712
415 2011-12 123.09005
416 2012-01 145.38731
417 2012-02 115.40288
418 2012-03 127.44415
419 2012-04 133.02503
420 2012-05 122.34683
421 2012-06 146.95265
422 2012-07 133.58199
423 2012-08 149.08356
Is there a more efficient way to aggregate across all the columns at once?
Something like
DF2[,5:20] <- sapply(DF2[,5:20], as.numeric, na.rm=TRUE)
monthAvg <- aggregate(DF2[, 5:20], cut(time, "month"),mean)
But then I get:
Error in cut.default(time, "month") : 'x' must be numeric