3

I have time-series data for roughly the last 20 years. The variable has been measured once a year, so I have 22 values (1991 to 2012). I have a tab-delimited file whose first column is the year and whose second column is the value. Here is what it looks like:

1991    438
1992    408
1993    381
1994    361
1995    338
1996    315
1997    289
1998    261
1999    229
2000    206
2001    190
2002    173
2003    151
2004    141
2005    126
2006    108
2007    99
2008    93
2009    85
2010    77
2011    71
2012    67

I want to extrapolate the value in the second column for the coming years. The rate at which the values in the second column decrease is itself slowing down, so I don't think linear regression is appropriate. I would like to know in which year the second column will approach zero. I have never used R, so it would be great if you could also help me with the code to read the data from a tab-delimited file.

Thanks

smci
user1985425
  • -1. There are thousands of resources how to read data into R. The other part of the question is better suited on stats.stackexchange.com. – EDi Mar 20 '13 at 22:34
  • Not to mention [an entire manual just for data input/output](http://cran.r-project.org/doc/manuals/R-data.html) – Dirk Eddelbuettel Mar 20 '13 at 22:47
  • 4
    Try this: `library(zoo); library(forecast); z <- read.zoo("file.dat"); f <- forecast(z); print(f); plot(f)` and read the 5 vignettes (PDF documents) here: http://cran.r-project.org/web/packages/zoo/index.html – G. Grothendieck Mar 20 '13 at 23:08
  • 1
    @EDi: no, basic questions on how to program extrapolation are very much on-topic here. I actually had to research it myself recently, and found it's a gap in SO's knowledgebase, so I posted my answer. Frankly, R is a confused mess on interpolation/extrapolation. – smci May 18 '15 at 02:31
  • 1
    But yeah, mixing this with a question on reading in the file was not conducive to getting a good response. – smci May 18 '15 at 02:33

2 Answers

10

The following is a sketch that may help you get started.

## get the data
tmp <- read.table(text="1991    438
1992    408
1993    381
1994    361
1995    338
1996    315
1997    289
1998    261
1999    229
2000    206
2001    190
2002    173
2003    151
2004    141
2005    126
2006    108
2007    99
2008    93
2009    85
2010    77
2011    71
2012    67", col.names=c("Year", "value"))

library(ggplot2)

## develop a model
tmp$pred1 <- predict(lm(value ~ poly(Year, 2), data=tmp))

## look at the data
p1 <- ggplot(tmp, aes(x = Year, y=value)) +
  geom_line() +
  geom_point() +
  geom_hline(aes(yintercept=0))

print(p1)

## check the model
p1 +
  geom_line(aes(y = pred1), color="red")

## extrapolate based on model
pred <- data.frame(Year=1990:2050)
pred$value <- predict(lm(value ~ poly(Year, 2), data=tmp),newdata=pred)

p1 +
  geom_line(color="red", data=pred)

In this case our model says the line will never cross zero. If that makes no sense then you'll want to pick a different model. Whatever model you pick, graph the result along with the data so you can see how well you're doing.
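As one example of "a different model", here is a minimal sketch that assumes the values decay roughly exponentially, so that log(value) is linear in Year. That assumption is mine and the data do not guarantee it. Since an exponential never reaches zero exactly, the sketch looks for the first year below a cutoff instead; the cutoff of 1 is a hypothetical stand-in for "approximately zero".

```r
## same data as in the question
tmp <- data.frame(
  Year  = 1991:2012,
  value = c(438, 408, 381, 361, 338, 315, 289, 261, 229, 206, 190,
            173, 151, 141, 126, 108, 99, 93, 85, 77, 71, 67)
)

## assumption: exponential decay, i.e. log(value) is linear in Year
fit_log <- lm(log(value) ~ Year, data = tmp)

## extrapolate on the log scale, then transform back
pred <- data.frame(Year = 1990:2100)
pred$value <- exp(predict(fit_log, newdata = pred))

## hypothetical cutoff for "approximately zero"; pick your own
threshold <- 1
first_below <- pred$Year[which(pred$value < threshold)[1]]
first_below
```

As with the quadratic fit, plot `pred` over the data before trusting the answer. The year you get depends heavily on the model and on the cutoff.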

Ista
  • I think in line 40 it should be geom_line(aes(y = tmp$pred1), color="red") instead of geom_line(aes(y = pred1), color="red") – Tungurahua Jan 14 '15 at 19:04
  • @Tungurahua either that or define `tmp$pred1` before creating `p1`, or make geom_line read the data again with `geom_line(aes(y = pred1), color="red", data = tmp)` – Ista Jan 15 '15 at 20:20
  • @Tungurahua I've edited the answer, defining `tmp$pred1` first. – Ista Jan 15 '15 at 20:28
6

To read in the data from a tab-delimited file:

## 'utils' ships with R and is attached by default, so no install or require() is needed
data <- read.table('<filename>', header=FALSE, sep='\t', col.names=c('Year','Value'))

See the read.table manpage (?read.table) for the other options.
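A self-contained way to try the reading step, as a sketch: it writes three rows of the question's data to a temporary file (a stand-in for your real file) and reads them back with `read.delim`, which is `read.table` with tab as the default separator. The name `dat` and the `.tsv` extension are only illustrative.

```r
## write a few rows to a temporary file so the example runs anywhere
path <- tempfile(fileext = ".tsv")   # stand-in for your real file path
writeLines(c("1991\t438", "1992\t408", "1993\t381"), path)

## read.delim() assumes tab-separated input; the file has no header row
dat <- read.delim(path, header = FALSE, col.names = c("Year", "Value"))

str(dat)   # inspect the column names and types
```

With your own file, replace `path` with its location and drop the `writeLines` step.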

To extrapolate the data:

As EDi and Dirk said, you need to do a little reading. Decide what sort of extrapolation function you want: linear (Hmisc::approxExtrap extrapolates linearly; approxfun interpolates but does not extrapolate, and outside the data range it returns NA or the value at the closest data extreme), spline (stats::splinefun or the splines package), etc. splinefun is probably OK for your case. Specifically for forecasting time series, see the forecast package (you should also browse related SO questions). After you skim those manpages, try something out, post some code and tell us where you're stuck, and we can respond in more detail. Otherwise you'll get flamed mercilessly and your question will likely be closed as 'Give me teh codez' ;-)
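To make the difference between these functions concrete, here is a rough sketch using only base R. It uses the data from the question; the names `dat`, `f_lin`, `f_lin_const` and `f_spl` are just for illustration, and the choice of `method = "natural"` is an assumption on my part (a natural spline continues linearly beyond the data, using the slope at the last point).

```r
dat <- data.frame(
  Year  = 1991:2012,
  Value = c(438, 408, 381, 361, 338, 315, 289, 261, 229, 206, 190,
            173, 151, 141, 126, 108, 99, 93, 85, 77, 71, 67)
)

## approxfun(): linear interpolation only; NA outside the data range by default
f_lin <- approxfun(dat$Year, dat$Value)
f_lin(2013)

## rule = 2 returns the value at the closest data extreme instead of NA
f_lin_const <- approxfun(dat$Year, dat$Value, rule = 2)
f_lin_const(2013)

## natural cubic spline: outside the data it extends as a straight line
f_spl <- splinefun(dat$Year, dat$Value, method = "natural")
future <- 2013:2050
extrap <- f_spl(future)

## first future year where the extrapolated value is at or below zero (NA if none)
future[which(extrap <= 0)[1]]
```

Because the natural spline just extends the final slope, it is really a linear extrapolation of the last few points, so treat the crossing year as a rough guess rather than a forecast.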

Community
smci
  • thanks smci for pointing me out in the correct direction. let me work on it and come up with a code. thanks a lot. – user1985425 Mar 20 '13 at 23:01
  • 1
    Small correction: `approxfun` interpolates linearly, but does not extrapolate linearly - rather it either returns NA, or "the value at the closest data extreme". – jbaums Apr 11 '14 at 10:14
  • 1
    @jbaums true, but `Hmisc::approxExtrap` does – RockScience Oct 15 '14 at 08:10