Why is linear regression taking a very long time to run in R?

Question

I'm running a linear regression on a TIFF image. The image dimensions are:

ncol=6350, nrow=2077, nlayers=26

Before running the calculation, I simply read the TIFF image into R using

ndvi2000<-raster("img2000.tif")

Then I ran the following script in the R console. The calculation has been running for more than 20 minutes and still has not finished. Is it normal for it to take this long on a big image? The regression script is:

time<-sort(sample(97:297, nlayers(ndvi2000)))
t.lm.pred<-function(x) {if (is.na(x[1])) {NA} else{predict(lm(x~time))}}
f.pred<-calc(ndvi2000,t.lm.pred)

How can you run a linear regression on a tiff image? Can you provide a reproducible example? Never heard of such a thing... — David Arenburg, May 01 '14 at 07:28
@DavidArenburg it is a GeoTIFF image, which results in a matrix of values once it is read using `raster`. But the OP needs to make the example reproducible. — Paul Hiemstra, May 01 '14 at 07:42
I wrote the procedure above. Now I'm wondering if I did something wrong in these steps. — user1769107, May 01 '14 at 07:49
By "reproducible" we don't mean "procedures". See here http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — David Arenburg, May 01 '14 at 07:50
If `img2000.tif` is a multiband GeoTiff, and you want to treat the bands as separate layers, you might need to read it in as a `stack`: `ndvi2000 <- stack("img2000.tif")`. Otherwise, I think you'll find that `nlayers(ndvi2000)` is `1`. Or does `calc` treat the multiple bands at each cell as a vector of values? — jbaums, May 01 '14 at 08:46
@jbaums, you are right. I have corrected the image. Now the image has 26 bands. — user1769107, May 01 '14 at 10:50
What do you hope to get by running a linear fit to a random sample of your data? If you can explain what you're looking for, we might be able to suggest a much easier/faster way to get there. (Aggregating your data so as to downsize the array size comes to mind) — Carl Witthoft, May 01 '14 at 11:30
I'm doing a phenological study. My final goal is to find the dates when the growing season starts and ends in a specific year. I actually plan to fit a polynomial regression; the linear regression is just practice for working in R. I want to fit NDVI as a function of Julian day to represent its seasonal changes. The NDVI time series image has 26 bands (8-day images) covering April to October, Julian days 97-297. My expected output is a set of images at the same time interval through the year. — user1769107, May 01 '14 at 11:52
I want to emphasize spatial distribution of phenological events rather than graphical information extracted from a single pixel. That is why I'm trying to use whole images without excluding spatial and temporal information of NDVI images. — user1769107, May 01 '14 at 12:07

score 3 · Answer 1 · answered May 01 '14 at 07:56

The number of values you have is very large, so I'm not in the least surprised that it takes a very long time. Consider simply creating a vector of random numbers the size of your TIFF file:

x = runif(6350 * 2077 * 26)
object.size(x) / (1024 * 1024)
# 2616.216

That is over 2.5 GB, and that is just to store one variable. A rule of thumb is that you need roughly three times as much RAM as the size of your dataset. So, assuming you load some more images, you'll need more than 10-20 GB of RAM. If you don't have enough RAM, your operating system will start swapping memory to disk, which makes your analysis very slow.
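The same estimate can be reached by arithmetic alone, without allocating the vector. This is a minimal sketch that assumes every cell value is held as an 8-byte double; the variable names are illustrative and not from the question.

```r
# Estimate the in-memory size of the image without allocating it.
# Dimensions are taken from the question; names are illustrative.
ncol_img    <- 6350
nrow_img    <- 2077
nlayers_img <- 26

n_values <- ncol_img * nrow_img * nlayers_img  # total number of cell values
bytes    <- n_values * 8                       # assumption: 8 bytes per double
size_mb  <- bytes / 1024^2
size_gb  <- bytes / 1024^3

# Rule of thumb from the answer: roughly three times the dataset size in RAM
ram_needed_gb <- 3 * size_gb

cat(sprintf("values: %.0f\n", n_values))
cat(sprintf("size:   %.3f MB (%.2f GB)\n", size_mb, size_gb))
cat(sprintf("RAM for one image, rule of thumb: %.1f GB\n", ram_needed_gb))
```

This covers a single image only; every additional image or intermediate result of the same size adds to the total.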

I think it would be a good idea to rethink your analysis, or else rent a 64 GB RAM EC2 instance. You could look only at the temporal average or the spatial average, or only at specific locations, and so on. Simply brute-forcing through all the values in your data might not be the best approach here.
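The reductions suggested above might look roughly like the sketch below, using the `raster` package. It runs on a small synthetic brick rather than the real image, and the aggregation factor, the crop extent, and the evenly spaced `jday` vector are assumptions chosen for illustration (the question draws its time values with `sample()` instead).

```r
library(raster)

# Small synthetic stand-in for the real image (hypothetical data):
# 40 x 60 cells, 26 layers, default global extent.
set.seed(1)
b <- brick(nrows = 40, ncols = 60, nl = 26)
b <- setValues(b, matrix(runif(ncell(b) * 26), ncol = 26))

# Spatial reduction: average 5 x 5 blocks of cells in every layer,
# leaving 25 times fewer cells to fit.
b_coarse <- aggregate(b, fact = 5, fun = mean)

# Temporal reduction: collapse the 26 layers into one mean layer.
b_tmean <- calc(b, fun = mean)

# Specific locations: keep only the cells inside a smaller extent.
b_sub <- crop(b, extent(-180, 0, 0, 90))

# The per-cell regression from the question, applied to the coarse brick.
# jday is an assumed, evenly spaced stand-in for the question's time vector.
jday <- seq(97, 297, by = 8)
t.lm.pred <- function(x) {
  if (is.na(x[1])) NA else predict(lm(x ~ jday))
}
f.pred <- calc(b_coarse, t.lm.pred)
```

Trying the regression on a reduced or cropped brick first is also a cheap way to check that the function works before committing to the full image.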

How can I integrate a land cover image with this equation? I'd like to calculate it for each land cover type separately. — user1769107, May 02 '14 at 04:46
That is a very different question. I suggest you ask a new question, including a reproducible example (code + example data). — Paul Hiemstra, May 02 '14 at 07:42
