0

I have some data gathered from a weather buoy:

station longitude   latitude    time    wd  wspd    gst wvht    dpd apd mwd bar
42001   -89.658 25.888  1975-08-13T22:00:00Z    23  4.1 NaN NaN NaN NaN NaN 1017.4
42001   -89.658 25.888  1975-08-13T23:00:00Z    59  3.1 NaN NaN NaN NaN NaN 1017.3
42001   -89.658 25.888  1975-08-14T00:00:00Z    30  5.2 NaN NaN NaN NaN NaN 1017.4
42001   -89.658 25.888  1975-08-14T01:00:00Z    70  2   NaN NaN NaN NaN NaN 1017.8
42001   -89.658 25.888  1975-08-14T02:00:00Z    87  5.7 NaN NaN NaN NaN NaN 1018.2
42001   -89.658 25.888  1975-08-14T03:00:00Z    105 5.6 NaN NaN NaN NaN NaN 1018.6
42001   -89.658 25.888  1975-08-14T04:00:00Z    116 5.8 NaN NaN NaN NaN NaN 1018.7
42001   -89.658 25.888  1975-08-14T05:00:00Z    116 5   NaN NaN NaN NaN NaN 1018.5
42001   -89.658 25.888  1975-08-14T06:00:00Z    123 4.5 NaN NaN NaN NaN NaN 1018.1
42001   -89.658 25.888  1975-08-14T07:00:00Z    137 4.1 NaN NaN NaN NaN NaN 1017.9
42001   -89.658 25.888  1975-08-14T08:00:00Z    151 3.6 NaN NaN NaN NaN NaN 1017.7
42001   -89.658 25.888  1975-08-14T09:00:00Z    153 3.5 NaN NaN NaN NaN NaN 1017.6
42001   -89.658 25.888  1975-08-14T10:00:00Z    180 3.5 NaN NaN NaN NaN NaN 1017.7
42001   -89.658 25.888  1975-08-14T11:00:00Z    189 2.8 NaN NaN NaN NaN NaN 1018
42001   -89.658 25.888  1975-08-14T12:00:00Z    183 1.7 NaN NaN NaN NaN NaN 1018.3
42001   -89.658 25.888  1975-08-14T13:00:00Z    172 0.7 NaN NaN NaN NaN NaN 1018.8
42001   -89.658 25.888  2001-11-18T11:00:00Z    38  7.3 8.8 1.1 6.67    4.51    69  1021
42001   -89.658 25.888  2001-11-18T12:00:00Z    29  7.9 9.3 1.01    5.88    4.42    57  1021.4
42001   -89.658 25.888  2001-11-18T13:00:00Z    29  7.4 8.3 1.02    7.14    4.42    65  1022.1
42001   -89.658 25.888  2001-11-18T14:00:00Z    23  8   9.5 0.97    5.56    4.48    55  1022.6
42001   -89.658 25.888  2001-11-18T15:00:00Z    16  7.6 8.9 1   6.67    4.5 64  1023.2
42001   -89.658 25.888  2001-11-18T16:00:00Z    26  8.9 10.2    0.94    4.17    4.49    29  1023.1
42001   -89.658 25.888  2001-11-18T17:00:00Z    26  8.5 10.2    0.98    4.55    4.48    36  1022.7
42001   -89.658 25.888  2001-11-18T18:00:00Z    17  7.8 9.1 1.07    4.76    4.56    30  1021.9
42001   -89.658 25.888  2001-11-18T19:00:00Z    24  8.1 9.1 1.07    4.55    4.6 29  1021
42001   -89.658 25.888  2001-11-18T20:00:00Z    18  8.3 11.1    1.21    6.25    4.6 69  1020
42001   -89.658 25.888  2001-11-18T21:00:00Z    30  8   9.4 1.2 6.67    4.72    77  1019.8
42001   -89.658 25.888  2001-11-18T22:00:00Z    39  8.2 9.6 1.32    6.67    4.8 76  1019.8
42001   -89.658 25.888  2001-11-18T23:00:00Z    32  8.5 9.6 1.21    6.67    4.63    71  1019.7
42001   -89.658 25.888  2001-11-19T00:00:00Z    38  8.9 10.3    1.28    6.25    4.6 72  1019.8
42001   -89.658 25.888  2001-11-19T01:00:00Z    48  8.3 9.6 1.26    6.67    4.53    71  1020.2
42001   -89.658 25.888  2001-11-19T02:00:00Z    54  10.1    11.6    1.28    6.67    4.59    65  1021.1
42001   -89.658 25.888  2001-11-19T03:00:00Z    60  3   4.7 1.29    5.88    4.58    72  1021.5
42001   -89.658 25.888  2001-11-19T04:00:00Z    77  0.8 1.7 1.25    6.67    4.92    63  1021.2
42001   -89.658 25.888  2001-11-19T05:00:00Z    153 2.1 3   1.21    6.67    4.91    64  1021
42001   -89.658 25.888  2001-11-19T06:00:00Z    20  2.2 5.5 1.18    6.25    4.92    65  1020.6
42001   -89.658 25.888  2001-11-19T07:00:00Z    158 6.2 9.7 1.31    6.67    5.22    67  1020.3
42001   -89.658 25.888  2001-11-19T08:00:00Z    162 7.4 9   1.26    6.67    5.42    73  1020.1
42001   -89.658 25.888  2001-11-19T09:00:00Z    218 4.8 6.2 1.2 7.69    4.98    65  1019.9

How could I create a data frame from aggregating the data (using the mean) on a monthly basis while leaving out the NaN values? The start of the data has numerous rows with NaN, but for several years there are values in those rows.

I've tried:

DF2 <- transform(buoy1, time = substring(time, 1, 7))
aggregate(as.numeric(wd) ~ time, DF2[-1,], mean, na.rm=TRUE))

Which generates

401 2010-09      109.20556
402 2010-10      107.42473
403 2010-11      130.67222
404 2010-12      135.75000
405 2011-01      156.11306
406 2011-02      123.33931
407 2011-03      137.29744
408 2011-04      119.85139
409 2011-05      148.65276
410 2011-06      104.74722
411 2011-07       88.16393
412 2011-09      106.60229
413 2011-10       93.32527
414 2011-11      149.52712
415 2011-12      123.09005
416 2012-01      145.38731
417 2012-02      115.40288
418 2012-03      127.44415
419 2012-04      133.02503
420 2012-05      122.34683
421 2012-06      146.95265
422 2012-07      133.58199
423 2012-08      149.08356

Is there a more efficient way to aggregate across all the columns at once?

Something like

DF2[,5:20] <- sapply(DF2[,5:20], as.numeric, na.rm=TRUE)
monthAvg <- aggregate(DF2[, 5:20], cut(time, "month"),mean)

But then I get:

Error in cut.default(time, "month") : 'x' must be numeric

UseR10085
  • 7,120
  • 3
  • 24
  • 54
Jim T
  • 65
  • 5

2 Answers2

1

Here is a base R solution

d <- within(buoy1[-1:-3], time <- format(as.POSIXct(time), "%Y-%m"))
aggregate(. ~ time, d, mean, na.rm = TRUE, na.action = NULL)
# "." means anything other than the RHS, which is `time` in this case

Output

     time        wd     wspd      gst     wvht      dpd      apd      mwd      bar
1 1975-08 118.37500 3.806250      NaN      NaN      NaN      NaN      NaN 1018.000
2 2001-11  58.04348 6.882609 8.452174 1.157391 6.186957 4.690435 61.04348 1021.043
ekoam
  • 8,744
  • 1
  • 9
  • 22
  • Strange...I'm getting a `Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format` error even though I've formatted the time column to be `"%Y-%m"` – Jim T Dec 01 '20 at 16:51
  • Exported the data to a text file then re-imported into Excel and now it's working...thank you for this! – Jim T Dec 01 '20 at 17:08
0

You could create a new column with year and month information and take mean of multiple columns using across.

library(dplyr)

df %>%
  group_by(time = format(as.POSIXct(time), '%Y-%m')) %>%
  summarise(across(gst:bar, mean, na.rm = TRUE)) -> result

result
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Getting the error `Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "function"` when I try this – Jim T Dec 01 '20 at 16:39
  • @JimT Please change `df` to your dataframe name i.e `buoy1`. – Ronak Shah Dec 01 '20 at 23:07