I have a dataset (precipitation) with four columns. I want to summarize (in table format) the amount of rain that occurred on a monthly basis for each month in 2019 and 2020 (sum and difference between two years). I am struggling how to summarize heaps of daily data to give me a monthly summary AND filtering it out for quality that is "Good".
Columns in Dataset:
colnames(rain_file)
"ID" "deviceID" "remarks" "date" "amount_rain" "quality"
Date (The date column is formatted as follows and there are multiple readings for each date)
head(rain_file$date)
2018-01-01 2018-01-01 2018-01-01 2018-01-01 2018-01-01 2018-01-01 1096 Levels: 2018-01-01 2018-01-02 2018-01-03 2018-01-04 2018-01-05 ... 2020-12-31
Quality (5 types of Quality, I only want to filter for "Good")
head(rain_file$quality)
Good Good Good Good Good Good... Levels: Absent Good Lost Poor Snow Trace
I have this so far but it's not correct and I'm not sure what to do next...
data=read.table("rain_file.csv", header=TRUE, sep=",", fill=T, quote="\"")
dates=apply(data,1, function(x) {strsplit(x["date"],"-")})
data=cbind(data, t(as.data.frame(dates, row.names=c("year", "month", "day"))))
m_rain_df=tapply(data$amount_rain, data[,c("year","month")], mean, na.rm=T)
data=data.table(data)
m_rain_dt=data[, list(month_rain=mean(amount_rain, na.rm=T)), by=list(year, month)]