1

I am very new to raster data and the use of R for spatial data analysis, so apologies if there's an obvious solution or process for this I've missed.

I have a raster file of population data from WorldPop, and a set of latitude / longitude location points that overlay onto that. I am trying to determine what portion of the population is (according to the WorldPop estimates) within a given distance of these points of interest, and also what portion is not.

I understand that using raster::extract, I should be able to get the sum of population values from (for example) a 1-kilometer buffer around each of these points. (Although my points and raster data are both in lat/lon projection, so I gather I need to first correct for this by changing the projection to utm as done here.)

However, because some number of these points will be less than 1 km apart, I am concerned that this total sum is double-counting the population of some cells where buffers overlap. Does buffering automatically correct for this, or is there an efficient way to ensure that this is not the case, and also to get the values from the inverse of the buffered point area selection?

cjsc
  • 95
  • 5

2 Answers2

1

Here is a minimal self-contained reproducible example,

library(raster)
r <- raster(system.file("external/rlogo.grd", package="raster"))
d <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85, 
   66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31, 
   22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
p <- SpatialPoints(d, proj4string=crs(r))     

A simple workflow, with points p and raster r would be

b <- buffer(p, 10)
m <- mask(r, b)
ms <- cellStats(m, "sum")
rs <- cellStats(r, "sum")
ms/rs
#[1] 0.4965083

Or you can use terra, to make this go faster, like this

library(terra)
r <- rast(system.file("ex/logo.tif", package="terra")) [[1]]
p <- vect(d, crs=crs(r))

b <- buffer(p, 10)
m <- mask(r, b)
ms <- global(m, "sum", na.rm=TRUE)
rs <- global(r, "sum")
ms/rs

By the way, with the raster package your assertion about needing to transform lon/lat data is not correct for extract or buffer. In contrast, with terra you need to do that (to be fixed).

Nimantha
  • 6,405
  • 6
  • 28
  • 69
Robert Hijmans
  • 40,301
  • 4
  • 55
  • 63
  • Thank you for this. Am I right in understanding that the units in b are meters, and buffer() will automatically adjust this as necessary for the projection in p? I was unsure from the documentation and hesitant to proceed as such given my limited experience working on geospatial / projected data. – cjsc Jun 01 '20 at 16:54
  • Yes, in the raster pacakge the unit of the `width` argument is m if the CRS is longitude/latitude. Otherwise, the unit is whatever is used by the CRS (m in most cases). You should be able to see that by creating them both ways and plotting. There will be some differences introduced by using a projection (and it is therefore better *not* to project and unproject). I still have to implement this for `terra` – Robert Hijmans Jun 01 '20 at 17:17
  • Thanks much for the clarification! – cjsc Jun 01 '20 at 22:17
-1

Well, thanks to a suggestion via Twitter and this guide to creating SpatialPolygons around points, I've been able to find an answer for this. This is probably not the most efficient means of doing so — it's proving to be very slow on large polygons - but it's workable for my purposes.

Here's sample code:

library(tabularaster)
library(raster)
library(tidyverse)
library(geos)

# -----------------------

# load point data ---

p <- read_csv("points_of_interest.csv")
p_df <- p %>% rename(x = lat, y = lon)
p_coords <- p_df[, c("y","x")]

p_spdf <- SpatialPointsDataFrame(
   coords = pc_coords,
   data = p_df,
   proj4string = CRS("+init=epsg:4326"))

# convert projection to metric units

p_mrc <- spTransform(
   p_spdf,
   CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 
       +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs")
   )

# buffer to 1000 meters

p_mrc_1k_mrc <- gBuffer(
   p_mrc, byid = TRUE, width = 1000)

# switch back to lat/lon
p_mrc_1k <- spTransform(p_mrc_1k_mrc, CRS("+init=epsg:4326"))

# load raster data -------

r <- raster("pop.tif")
r_tib <- tabularaster::as_tibble(r)

# get intersection of cells and polygons
cell_df_1k <- cellnumbers(r, p_mrc_1k)

# get list of cells where there is intersection
target_cell_1k <- cell_df_1k$cell_

# add cell values to df listing all cells covered by polys
target_cells_extract_1k <- cell_df_1k %>%
  rename(cellindex = cell_) %>%
  left_join(r_tib)

# calculate the sum of population within 1k radius for each object 
# (this includes overlapping population cells shared between polys)
cell_sum_1k <- target_cells_extract_1k %>%
  group_by(object_) %>%
  summarize(pop_1k = sum(cellvalue, na.rm = T))

# get only unique cell values for total overlapping coverage of all polys
target_cells_unique_1k <- r_tib %>% filter(cellindex %in% target_cell_1k)

total_coverage_pop <- sum(target_cells_unique_1k$cellvalue, na.rm = T)
outside_coverage_pop <- sum(r_tib$cellvalue) - total_coverage_pop

cjsc
  • 95
  • 5
  • 1
    It would be much more helpful for others to use your accepted answer as a resource if you can provide some sample data. Seeing `points_of_interest.csv` won't be helpful for others trying to solve the same type of problem in the future. – Matt Jun 01 '20 at 14:55