
Right now I have a large data set in which the temperature goes up and down all the time. I want to smooth my data and plot a best-fit line through all the temperature values.

Here is the data:

weather.data  
    date        mtemp   
1   2008-01-01  12.9        
2   2008-01-02  12.9        
3   2008-01-03  14.5        
4   2008-01-04  15.7            
5   2008-01-05  17.0        
6   2008-01-06  17.8    
7   2008-01-07  20.2        
8   2008-01-08  20.8        
9   2008-01-09  21.4        
10  2008-01-10  20.8        
11  2008-01-11  21.4        
12  2008-01-12  22.0        

and so on, until 31 December 2009.

My current graph looks like this, and I would like to fit a smoother such as a running average or LOESS to my data:

[Image: current plot of the daily temperatures]

However, when I tried to fit it with the running average, it became like this:

[Image: plot after applying the running average]

Here is my code.

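weather.data$date <- as.Date(weather.data$date)  # make sure the dates are Date objects, not text
plot(weather.data$date, weather.data$mtemp, ylim=c(0,30), type='l', col="orange")
par(new=TRUE)  # the next plot() is drawn over this one, with its own axes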

Could anyone give me a hand?

Uwe
londwwq1

1 Answer


There are various options, depending on your actual data, how you want to smooth it, and why you want to smooth it.

I am showing you examples with linear regression (first and second order) and local regression (LOESS). These may or may not be appropriate statistical models for your data, but it is difficult to tell without seeing it. In any case:

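set.seed(1)  # make the simulated noise reproducible
time <- 0:100
# Simulated data: quadratic trend plus Gaussian noise (sd = 5)
temp <- 20 + 0.01 * time^2 + 0.8 * time + rnorm(101, 0, 5)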

# Generate first order linear model
lin.mod <- lm(temp~time)

# Generate second order linear model
lin.mod2 <- lm(temp~I(time^2)+time)

# Calculate local regression
ls <- loess(temp~time)

# Predict the data (passing only the model runs the prediction 
# on the data points used to generate the model itself)
pr.lm <- predict(lin.mod)
pr.lm2 <- predict(lin.mod2)
pr.loess <- predict(ls)

par(mfrow=c(2,2))  # 2 x 2 grid of panels
plot(time, temp, type="l", las=1, xlab="Time", ylab="Temperature")
lines(pr.lm~time, col="blue", lwd=2)     # first-order fit

plot(time, temp, type="l", las=1, xlab="Time", ylab="Temperature")
lines(pr.lm2~time, col="green", lwd=2)   # second-order fit

plot(time, temp, type="l", las=1, xlab="Time", ylab="Temperature")
lines(pr.loess~time, col="red", lwd=2)   # LOESS fit
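To judge whether the second-order term is worth keeping, the two nested linear models can be compared directly. This is a sketch added for illustration, not part of the original answer: it regenerates the same kind of simulated data (the seed `set.seed(1)` is an assumption) and uses base R's `anova()` F-test for nested `lm` fits and `AIC()`.

```r
# Assumed: same simulated data as above, with a fixed seed for reproducibility
set.seed(1)
time <- 0:100
temp <- 20 + 0.01 * time^2 + 0.8 * time + rnorm(101, 0, 5)

lin.mod  <- lm(temp ~ time)              # first-order model
lin.mod2 <- lm(temp ~ I(time^2) + time)  # second-order model (adds a quadratic term)

# F-test for nested models: a small Pr(>F) means the quadratic term
# reduces the residual sum of squares by more than noise alone would
cmp <- anova(lin.mod, lin.mod2)
print(cmp)

# Akaike information criterion: lower values indicate a better trade-off
# between goodness of fit and number of parameters
aic <- AIC(lin.mod, lin.mod2)
print(aic)
```

With real temperature data the outcome may differ; the test only tells you whether the extra curvature is supported by the data, not whether a polynomial is the right model at all.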

Another option would be to use a moving average.

For instance:

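library(zoo)
# Centred 5-point moving average; fill=NA pads the ends so the result has the same length as time
mov.avg <- rollmean(temp, 5, fill=NA)
plot(time, temp, type="l")
lines(time, mov.avg, col="orange", lwd=2)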

[Image: examples of smoothing]
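For daily data with a `date` column, like the `weather.data` in the question, the same idea works with a wider window. This is a hedged sketch: the data frame is a simulated, hypothetical stand-in for `weather.data`, the 31-day window is an arbitrary choice, and it swaps in base R's `stats::filter()` (a centred moving average) for `zoo::rollmean()` so no extra package is needed. It adds the smoothed curve with `lines()` on the existing axes rather than using `par(new=TRUE)`, which starts a second plot with its own axis scaling.

```r
# Hypothetical stand-in for weather.data: one made-up value per day, 2008-01-01 to 2009-12-31
set.seed(42)
weather.data <- data.frame(date = seq(as.Date("2008-01-01"), as.Date("2009-12-31"), by = "day"))
n <- nrow(weather.data)
# Seasonal cycle plus noise (illustrative numbers only)
weather.data$mtemp <- 15 + 8 * sin(2 * pi * seq_len(n) / 365.25) + rnorm(n, 0, 3)

# Centred 31-day moving average (sides = 2); the first and last 15 days become NA
k <- 31
weather.data$ma <- as.numeric(stats::filter(weather.data$mtemp, rep(1 / k, k), sides = 2))

# Plot the raw series, then add the smoothed curve to the same axes
plot(mtemp ~ date, data = weather.data, type = "l", col = "orange",
     xlab = "Date", ylab = "Mean temperature")
lines(ma ~ date, data = weather.data, col = "blue", lwd = 2)
```

A larger `k` smooths more but leaves more `NA` days at each end; `zoo::rollmean(x, k, fill = NA)` gives the same centred average.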

nico
  • http://i.stack.imgur.com/KZWWx.png – londwwq1 Aug 13 '14 at 19:18
  • My current graph looks like this, and I would like to fit either a running average or LOESS to my data... – londwwq1 Aug 13 '14 at 19:19
  • However, when I tried to fit it with the running average, it came out like this: http://i.stack.imgur.com/d9LMc.png – londwwq1 Aug 13 '14 at 19:20
  • It didn't fit very well, and when I tried LOESS I got error messages: `ls <- loess(weather.data$mtemp~weather.data$date)` gives `Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : NA/NaN/Inf in foreign function call (arg 2)`, in addition to `Warning message: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : NAs introduced by coercion` – londwwq1 Aug 13 '14 at 19:21
  • Is it because I have actual dates for time? – londwwq1 Aug 13 '14 at 19:39
  • Try changing the span parameter; it may be as simple as that (see `?loess` for details). – nico Aug 13 '14 at 22:40
  • What is the span parameter? – londwwq1 Aug 14 '14 at 09:31
  • @londwwq1: the help page for loess (accessible using `?loess`) describes it in detail. Essentially it controls the degree of smoothing: larger values give a smoother curve. – nico Aug 16 '14 at 08:35
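The error in the comments is consistent with `loess()` receiving a non-numeric predictor: it converts the predictors to a numeric matrix, and a date column (whether stored as text or as a `Date`) does not survive that conversion, hence "NAs introduced by coercion". A minimal sketch of a workaround, using a simulated, hypothetical stand-in for `weather.data`, is to fit on the day number from `as.numeric(date)` and plot against the original dates. It also compares two values of `span` (0.75 is the `loess()` default; 0.1 is an arbitrary smaller value) to show how the parameter changes the smoothness.

```r
# Hypothetical stand-in for weather.data (simulated daily values, illustration only)
set.seed(42)
weather.data <- data.frame(date = seq(as.Date("2008-01-01"), as.Date("2009-12-31"), by = "day"))
n <- nrow(weather.data)
weather.data$mtemp <- 15 + 8 * sin(2 * pi * seq_len(n) / 365.25) + rnorm(n, 0, 3)

# loess() needs a numeric predictor: use the day count since 1970-01-01
weather.data$day <- as.numeric(weather.data$date)

# span = fraction of the data used in each local fit; smaller values give a wigglier curve
fit.default <- loess(mtemp ~ day, data = weather.data, span = 0.75)
fit.narrow  <- loess(mtemp ~ day, data = weather.data, span = 0.1)

# predict() without newdata returns the fitted values at the original days
weather.data$smooth.default <- predict(fit.default)
weather.data$smooth.narrow  <- predict(fit.narrow)

# Plot against the real dates so the x axis shows calendar dates
plot(mtemp ~ date, data = weather.data, type = "l", col = "orange",
     xlab = "Date", ylab = "Mean temperature")
lines(smooth.default ~ date, data = weather.data, col = "red", lwd = 2)
lines(smooth.narrow ~ date, data = weather.data, col = "blue", lwd = 2)
legend("topright", legend = c("span = 0.75", "span = 0.1"),
       col = c("red", "blue"), lwd = 2)
```

On real data, choosing `span` is a judgement call; trying a few values and comparing the curves against the raw series is a reasonable starting point.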