I have a fairly large historical weather-station CSV dataset (daily wind speed data from a set of weather stations in one region), and I need to compute the average number of days per month on which wind speed exceeds 6 m/s, for each station. The stations do not all cover the same number of years. An example of the dataset is shown below.

head(windspeed_PR)

  STN    Year Month Day WDSP WDSP.ms
1 860110 1974     6  19  9.3   4.784
2 860110 1974     7  13 19.0   9.774
3 860110 1974     7  22  9.9   5.093
4 860110 1974     8  20  9.5   4.887
5 860110 1974     9  10  3.3   1.698
6 860110 1974    10  10  6.6   3.395

So I basically need to count how many WDSP.ms values are higher than 6 for each Month of each Year at each station (STN), and then calculate the average number of such days per month for each station.

Could I please have suggestions on how to compute this value (preferably in R)?

Xavier de Lamo
  • Please provide a minimal reproducible example. The example doesn't have to be your real data, but you need to provide a reproducible example. Please see [how to make an R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Jota May 20 '15 at 22:34

1 Answer

This is fairly straightforward.

Using dplyr:

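library(dplyr)
windspeed_PR %>%
    # one group per station, year and month
    group_by(STN, Year, Month) %>%
    summarize(n_days = n(),               # daily measurements in the group
              n_gt6 = sum(WDSP.ms > 6),   # measurements above 6 m/s
              p_gt6 = n_gt6 / n_days)     # proportion above 6 m/s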

This will return, for each combination of station, year, and month, the number of measurements, the number of measurements greater than 6, and their quotient (the proportion of measurements greater than 6).

It's not clear to me from your question if you want this further summarized (say, collapsing years), but it should form a good starting place for any additional work.
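If collapsing years is what you want, one possible sketch is below. It first counts the days above 6 m/s per station, year, and month, then averages those counts over the years available for each station and month. The `toy` data frame and its values are made up for illustration, the column names `n_years` and `mean_days_gt6` are my own, and `na.rm = TRUE` assumes missing wind speeds should simply be ignored. The `.groups` argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up toy data in the same shape as windspeed_PR (illustration only)
toy <- data.frame(
  STN     = c(1, 1, 1, 1, 1, 2, 2),
  Year    = c(2000, 2000, 2001, 2001, 2001, 2000, 2000),
  Month   = c(1, 1, 1, 1, 1, 1, 1),
  WDSP.ms = c(7, 5, 8, 9, 2, 6.5, 3)
)

monthly_avg <- toy %>%
  # step 1: days above 6 m/s for each station, year and month
  group_by(STN, Year, Month) %>%
  summarize(n_gt6 = sum(WDSP.ms > 6, na.rm = TRUE), .groups = "drop") %>%
  # step 2: average those counts over the years each station has
  group_by(STN, Month) %>%
  summarize(n_years = n(),
            mean_days_gt6 = mean(n_gt6),
            .groups = "drop")

monthly_avg
```

Note that a station-year-month with no rows at all in the data does not enter the average as a zero; it is just absent. If such gaps should count as zero days, the missing combinations would have to be added explicitly before averaging.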

Gregor Thomas
  • Yes, above is better. When I first read this question it was very open ended and ambiguous. It appears to have been edited about 6 times in the last 20 minutes. – iSkore May 20 '15 at 22:58
  • Yes, looking at the edit history it's had some rapid improvement and clarity. – Gregor Thomas May 20 '15 at 23:02
  • Indeed haha. Thank you for your response too, haven't heard of dplyr. Definitely use that in the future. – iSkore May 20 '15 at 23:05
  • Thanks for your help Gregor and iSkore! Indeed, I'm new to Stackoverflow and I'm still learning about how to post questions in the most clear and specific way. – Xavier de Lamo May 21 '15 at 00:30