
A continuation from this SO question

I have a very large data frame and, for each lat/long point, I want to sum the `value` column over all points that lie within a given radius of it.

set.seed(1)
radius<-10000 # In meters
lat<-runif(10,-90,90)
long<-runif(10,-180,180)
value<- runif(10,200,7000)
id<-1:10
dat<-cbind(id,lat,long, value)

Is there a RAM-efficient way of doing this?

The original post suggested the following to count occurrences within the radius. Can I sum a column in a similar way?

library(geosphere)
cbind(dat, X=rowSums(distm (dat[,3:2],
      fun = distHaversine) / 1000 <= 10000)) # number of points within distance 10000 km
Davis

2 Answers


Naive way:

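library(geosphere)
# distHaversine returns meters; 1000*radius treats radius as km (10000 km), as in the question's snippet
m <- distm(dat[, 3:2], fun = distHaversine) <= 1000*radius
X <- rowSums(m)          # number of points within the radius (each point counts itself)
Y <- colSums(value * m)  # sum of value over those points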

cbind(dat, X, Y)
#       id       lat         long     value X         Y
#  [1,]  1 -42.20844 -105.8491530 6555.9956 5 18843.936
#  [2,]  2 -23.01770 -116.4395691 1642.5691 5 19627.074
#  [3,]  3  13.11361   67.3282248 4631.3816 5 10818.887
#  [4,]  4  73.47740  -41.7226614 1053.7747 6 17715.922
#  [5,]  5 -53.69725   97.1429112 2017.1005 4 15718.851
#  [6,]  6  71.71014   -0.8282728 2825.5758 6 17715.922
#  [7,]  7  80.04155   78.3426630  291.0543 6 17715.922
#  [8,]  8  28.94360  177.0861941 2800.2381 5  8613.212
#  [9,]  9  23.24053  -43.1873354 6113.8978 6 18482.867
# [10,] 10 -78.87847   99.8802797 2514.3732 4 12730.038

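But this will not work if your data is really large, because `distm` builds the full n × n distance matrix and memory use grows quadratically with the number of rows. In that case you'll have to avoid computing all the distances. This article could be a good read.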
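One way to bound memory, sketched here under assumptions, is to compute the distances for a block of points at a time instead of all at once. The helper names `haversine_matrix` and `sum_within_chunked` and the `chunk_size` parameter are made up for this illustration; the haversine formula is written in base R, with the Earth radius set to the default used by `distHaversine`, so the block does not depend on geosphere.

```r
# Haversine distance in meters between every point in set 1 and every point in set 2.
# Returns a length(lat1) x length(lat2) matrix.
haversine_matrix <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  p1 <- lat1 * to_rad
  p2 <- lat2 * to_rad
  dphi <- outer(p1, p2, "-")
  dlam <- outer(lon1 * to_rad, lon2 * to_rad, "-")
  h <- sin(dphi / 2)^2 + outer(cos(p1), cos(p2)) * sin(dlam / 2)^2
  2 * r * asin(pmin(sqrt(h), 1))  # matrix first so pmin keeps the dimensions
}

# Count and sum within radius_m, holding only an n x chunk_size matrix at a time.
sum_within_chunked <- function(lat, lon, value, radius_m, chunk_size = 100) {
  n <- length(lat)
  count <- numeric(n)
  total <- numeric(n)
  for (start in seq(1, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1, n)
    # rows: all points, columns: the points of the current chunk
    within <- haversine_matrix(lat, lon, lat[idx], lon[idx]) <= radius_m
    count[idx] <- colSums(within)
    total[idx] <- colSums(within * value)  # value is recycled down each column
  }
  data.frame(count = count, total = total)
}

# Usage with the question's sample data (radius treated as km, as above):
# res <- sum_within_chunked(dat[, "lat"], dat[, "long"], dat[, "value"],
#                           radius_m = 1000 * radius)
```

This only caps memory at roughly `chunk_size` × n values per intermediate matrix. Every pair of points is still compared, so run time stays quadratic, and for millions of rows some form of pre-filtering is likely still needed.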

Scarabee
  • Thanks, but the `distm` matrix is too large, so your suggestion didn't work (`cannot allocate vector of size 49142.8 Gb`). I have approximately 2.5 million rows. I'll have a look at the article, but let me know if you have any other suggestions. – Davis Apr 20 '17 at 08:22
  • Are some of your actual points close to the poles or to the 180th meridian? If not, the method of the article gets a little simpler to implement. – Scarabee Apr 20 '17 at 23:36
  • No, all locations are within the UK. I'm trying to count the number of postcodes (locations) by lon/lat within a certain radius. – Davis Apr 24 '17 at 08:29
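For points confined to a region such as the UK, one way to avoid computing all the distances is a bounding-box pre-filter: sort by latitude, keep only candidates inside a latitude/longitude window around each point, and compute the exact haversine distance for those candidates only. The sketch below is an illustration under that assumption, not necessarily the article's method; it ignores wrap-around at the 180th meridian, and the names `haversine_pair` and `sum_within_bbox` are hypothetical.

```r
# Haversine distance in meters from one point to a vector of points.
haversine_pair <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  dphi <- (lat2 - lat1) * to_rad
  dlam <- (lon2 - lon1) * to_rad
  h <- sin(dphi / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlam / 2)^2
  2 * r * asin(pmin(sqrt(h), 1))
}

# Assumption: no point's search circle crosses the 180th meridian.
sum_within_bbox <- function(lat, lon, value, radius_m, r = 6378137) {
  n <- length(lat)
  to_rad <- pi / 180
  ord <- order(lat)                      # sort once by latitude
  lat_s <- lat[ord]; lon_s <- lon[ord]; val_s <- value[ord]
  delta <- radius_m / r                  # angular radius in radians
  dlat <- delta / to_rad                 # latitude half-width in degrees
  # first and last sorted index whose latitude lies inside the window
  lo <- findInterval(lat_s - dlat, lat_s, left.open = TRUE) + 1L
  hi <- findInterval(lat_s + dlat, lat_s)
  count <- numeric(n)
  total <- numeric(n)
  for (i in seq_len(n)) {
    cand <- lo[i]:hi[i]
    # longitude half-width of the circle at this latitude (all longitudes near a pole)
    ratio <- sin(delta) / cos(lat_s[i] * to_rad)
    dlon <- if (ratio >= 1) 360 else asin(ratio) / to_rad
    cand <- cand[abs(lon_s[cand] - lon_s[i]) <= dlon]
    # exact distance check on the few remaining candidates
    keep <- haversine_pair(lat_s[i], lon_s[i], lat_s[cand], lon_s[cand]) <= radius_m
    count[ord[i]] <- sum(keep)
    total[ord[i]] <- sum(val_s[cand][keep])
  }
  data.frame(count = count, total = total)  # rows in the original input order
}
```

Memory use is linear in the number of points, since no distance matrix is stored. Run time depends on how many candidates fall inside each latitude band, so a small radius relative to the extent of the data helps.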

Below is a solution using the spatialrisk package. The key functions in this package are written in C++ (via Rcpp) and are therefore very fast.

First, load the data:

set.seed(1)
radius<-10000 # In meters
lat<-runif(10,-90,90)
long<-runif(10,-180,180)
value<- runif(10,200,7000)
id<-1:10
dat<-data.frame(id,lat,long, value)

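Then (the radius is in meters here, so with these widely scattered sample points each point only captures its own value):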

spatialrisk::concentration(sub = dat, full = dat, 
                           value = value, lon_sub = long, 
                           lon_full = long, radius = 10000)

    id       lat         long     value concentration
 1   1 -42.20844 -105.8491530 6555.9956     6555.9956
 2   2 -23.01770 -116.4395691 1642.5691     1642.5691
 3   3  13.11361   67.3282248 4631.3816     4631.3816
 4   4  73.47740  -41.7226614 1053.7747     1053.7747
 5   5 -53.69725   97.1429112 2017.1005     2017.1005
 6   6  71.71014   -0.8282728 2825.5758     2825.5758
 7   7  80.04155   78.3426630  291.0543      291.0543
 8   8  28.94360  177.0861941 2800.2381     2800.2381
 9   9  23.24053  -43.1873354 6113.8978     6113.8978
 10 10 -78.87847   99.8802797 2514.3732     2514.3732
mharinga