0

I have daily rainfall data in Excel (which I can save as a CSV or txt file) that I would like to manipulate and load into R. I'm very new to R.

The format of the data is such that I have I have the following columns

Year; Month; Rain on day 1 of Month, Rain on Day 2, ... , Rain on day 31;

This means that I have a large array/table. Some data is missing because it wasn't recorded, and some because February 31st, June 31st, etc do not exist.

I would like to analyse things like monthly totals, and their distributions.

What is the best way to input data so it can be easily manipulated, and that I can distinguish between missing data and NULL data (31st Feb)?

Thanks a lot in advance

user1778351
  • 143
  • 7
  • Side note: It's always a good idea to provide example data and code that you tried, with which people can play - this will give you more answers and you will likely receive them faster. Otherwise every community member would have to build an example for herself/himself. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – lukeA Mar 10 '16 at 13:13

1 Answers1

0

Several things for you to have a look at. E.g. readxl::read_excel() for reading excel files or Hmisc::monthDays(dates) for determining the number of days for each month in a dates vector.

Anyway, here's one idea as a starter:

# create sample data
set.seed(1)
mat <- matrix(rbinom(5*31, 31, .5), nrow=5)
mat[sample(1:length(mat), 10)] <- NA
df <- data.frame(year=2016, month=1:5, mat)

#  reshape data from wide to long format
library(reshape2)
dflong <- melt(df, id.vars = 1:2, variable.name = "day")

# add date column (will be NA if conversion is not possible, i.e. if date does not exists)
dflong$date <- as.Date(with(dflong, paste(year, month, day, sep="-")), format = "%Y-%m-X%e")

# Select only existing dates
dflong <- subset(dflong[order(dflong$month), ], !is.na(date))

# Aggregate: means per month and year (missing values removed)
aggregate(value~year+month, dflong, mean, na.rm=TRUE)
#   year month    value
# 1 2016     1 15.93548
# 2 2016     2 15.26923
# 3 2016     3 15.10345
# 4 2016     4 15.74074
# 5 2016     5 16.16667
lukeA
  • 53,097
  • 5
  • 97
  • 100
  • Thank you, and I'm sorry for my slow reply. I will look into your possible solution and see if it gets me going. Cheers – user1778351 Mar 10 '16 at 18:05