I have written code to calculate the RMSE between observed and simulated data, but I want to calculate it for the month of January only. The text file has the date in the first column, the simulated data in the second column and the observed data in the third column.

The data are formatted as follows:

DATE    cout    rout    coub    cinf
UNITS   m3/s    m3/s    m3/s    m3/s
1981-01-01  292.234 305 0   292.234
1981-01-02  293.152 320 0   293.152
1981-01-03  293.985 324 0   293.985
1981-01-04  295.115 308 0   295.115
1981-01-05  296.579 326 0   296.579
1981-01-06  298.266 344 0   298.266
1981-01-07  300.084 342 0   300.084
1981-01-08  301.945 329 0   301.945
1981-01-09  303.747 357 0   303.747
1981-01-10  305.437 351 0   305.437
1981-01-11  306.967 352 0   306.967
1981-01-12  308.281 382 0   308.28

The code below calculates the RMSE for the entire dataset, irrespective of date:

# Compute the Root Mean Squared Error (RMSE) between simulated and observed data

# set the working directory
setwd("D:\\Results\\")

# Read the column names from the first line of the file
header <- scan("4001968.txt", nlines = 1, what = character())

# Read the data, skipping the first two lines (column names and units)
y <- read.table("4001968.txt", skip = 2, header = FALSE, sep = "\t")

# Add the character vector header on as the names component
names(y) <- header

#Function for calculating RMSE
rmse <- function(error)
{
  sqrt(mean(error^2))
}

# Convert character to numeric
y$cout <- as.numeric(as.character(y$cout))
y$rout <- as.numeric(as.character(y$rout))
# cout (2nd column) is the simulated series, rout (3rd column) the observed one
simulated <- y$cout
observed <- y$rout

# Calculate error
error <- simulated - observed

# Invocation of functions
rmse(error)

The desired output is a single RMSE value for the month of January only.

user6985
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Where exactly are you getting stuck? Are you getting an error message or something? – MrFlick Jan 21 '19 at 21:54
  • I have edited the question. I don't have any errors. The current code gives single RMSE value for the entire dataset but I want the RMSE error to be calculated if the month is January. – user6985 Jan 21 '19 at 22:09

1 Answer

I find the data.table and lubridate packages very useful for this kind of problem:

# libraries
library(data.table)
library(lubridate)

# Compute the RMSE for January only

# set the working directory
setwd("D:\\Results\\")

# Read the column names from the first line of the file
header <- scan("4001968.txt", nlines = 1, what = character())

# Read the data, skipping the first two lines (column names and units)
y <- read.table("4001968.txt", skip = 2, header = FALSE, sep = "\t")

# Add the character vector header on as the names component
names(y) <- header

#Function for calculating RMSE
rmse <- function(error)
{
  sqrt(mean(error^2))
}

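# Convert character to numeric
y$cout <- as.numeric(as.character(y$cout))
y$rout <- as.numeric(as.character(y$rout))
# Parse DATE explicitly so month() does not depend on how read.table typed the column
y$DATE <- as.Date(as.character(y$DATE))
y <- as.data.table(y)

# Calculate the error for January rows only (month == 1)
error <- y[month(DATE) == 1, cout - rout]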

# Invocation of functions
rmse(error)
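
If you would rather not add packages, the same January filter can be sketched in base R. This is only a minimal illustration: the data frame below holds the first three rows from the question plus one made-up February row (the values 100 and 900 are hypothetical, added only to show that non-January rows are excluded), and the names is_jan, jan_rmse and monthly_rmse are my own.

```r
# Toy data: three January rows from the question plus one hypothetical February row
y <- data.frame(
  DATE = c("1981-01-01", "1981-01-02", "1981-01-03", "1981-02-01"),
  cout = c(292.234, 293.152, 293.985, 100),  # simulated (last value is made up)
  rout = c(305, 320, 324, 900),              # observed (last value is made up)
  stringsAsFactors = FALSE
)

# Parse the date strings so the month can be extracted reliably
y$DATE <- as.Date(y$DATE)

# Same RMSE function as in the question
rmse <- function(error) sqrt(mean(error^2))

# Logical mask selecting January rows ("%m" gives the zero-padded month number)
is_jan <- format(y$DATE, "%m") == "01"

# RMSE over January rows only
jan_rmse <- rmse(y$cout[is_jan] - y$rout[is_jan])
print(jan_rmse)

# Optional: RMSE for every month at once, as a vector named by month number
monthly_rmse <- tapply(y$cout - y$rout, format(y$DATE, "%m"), rmse)
print(monthly_rmse)
```

With the real file, you would keep the scan/read.table steps from above and apply only the as.Date, is_jan and rmse lines to y. Note that the filter pools January rows from every year in the file into a single value.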
LocoGris