
I recently attempted a regionalization analysis on a set of geographic regions, each of which has multiple attributes (A1, A2, A3, ...). The goal differs from a regular regionalization problem (such as K-means), in which you form groups with minimal within-group dissimilarity and maximal between-group dissimilarity.

My regionalization is the opposite: I want the groups to be as similar as possible to one another in terms of means, variances, and other statistics (within-group dissimilarity does not have to be maximized, but that is of less concern). I came across the minDiff package and its successor, the anticlust package, in R, and they do the job wonderfully except for one problem: since this is a regionalization problem, I want the final groups to be geographically contiguous. The results from minDiff/anticlust, however, show the groups mixed with one another all over the map. Here is sample code:

A data frame containing the geographic units and their attributes is read from a shapefile and stored in geo.df:

library(sf)
library(anticlust)

geo.df <- as.data.frame(read_sf(dsn = getwd(), layer = "geolayer", stringsAsFactors = FALSE))

geo.df$class <- anticlustering(geo.df[, c("A1", "A2", "A3", "A4", ..., "An")], K = 5, objective = "variance", standardize = TRUE)
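To check how similar the groups returned by anticlustering() actually are, one can compare per-group means and variances of each attribute. This is a minimal base-R sketch; the data frame x and the labels in groups are made up and only stand in for the attribute columns of geo.df and for geo.df$class.

```r
# Hypothetical data standing in for geo.df: 50 units, three attributes
set.seed(42)
x <- data.frame(A1 = rnorm(50), A2 = runif(50), A3 = rpois(50, 3))
# Hypothetical stand-in for geo.df$class: units assigned in turn to 5 groups
groups <- rep(1:5, length.out = 50)

# Mean and variance of every attribute within each group
group_means <- aggregate(x, by = list(group = groups), FUN = mean)
group_vars  <- aggregate(x, by = list(group = groups), FUN = var)

# Range of the group means per attribute: smaller values mean more similar groups
mean_spread <- sapply(group_means[, -1], function(m) diff(range(m)))

group_means
group_vars
mean_spread
```

With real data, replace x with the attribute columns and groups with geo.df$class.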

I've tried adding the coordinates, and also pairwise distances, to the list of attributes (A1, A2, ..., An), but neither worked. I always end up with well-separated groups that are nonetheless mixed with one another in geographic space.
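One way to make "mixed all over the map" measurable is to test whether each group forms a single connected piece. The sketch below is a hypothetical helper, group_is_contiguous(), written in base R; it assumes a neighbour list where nb[[i]] holds the indices of unit i's neighbours, which is the layout of the "nb" object returned by spdep::poly2nb() (0L marks a unit without neighbours). The four-unit toy list is made up for illustration.

```r
# Toy neighbour list: four units in a row, 1 - 2 - 3 - 4
nb <- list(2L, c(1L, 3L), c(2L, 4L), 3L)

# Hypothetical helper: TRUE for each group whose units form one connected piece
group_is_contiguous <- function(groups, nb) {
  ids <- sort(unique(groups))
  result <- vapply(ids, function(g) {
    members <- which(groups == g)
    visited <- members[1]
    frontier <- members[1]
    # Breadth-first search that only steps onto units of the same group
    while (length(frontier) > 0) {
      nbrs <- unlist(lapply(frontier, function(i) nb[[i]]))
      nbrs <- unique(nbrs[nbrs %in% members & !(nbrs %in% visited)])
      visited <- c(visited, nbrs)
      frontier <- nbrs
    }
    # Contiguous if the search reached every member of the group
    length(visited) == length(members)
  }, logical(1))
  names(result) <- ids
  result
}

group_is_contiguous(c(1, 1, 2, 2), nb)  # both groups contiguous
group_is_contiguous(c(1, 2, 1, 2), nb)  # interleaved groups, neither contiguous
```

With real data this would be called as group_is_contiguous(geo.df$class, poly2nb(read_sf(...))), keeping the same layer arguments used above.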

Any pointers on how to proceed from here? Any hints will be greatly appreciated.

Thank you all in advance.

1 Answer


This is a classic regionalization problem, and you can solve it with the SKATER algorithm. Since you haven't provided a reproducible example, I can't give you code for your own data, but here is an example using a shapefile that ships with spdep.

Use the skater() function from the spdep package:

library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(spdep)
#> Loading required package: sp
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`

bh <- st_read(system.file("etc/shapes/bhicv.shp",
                          package="spdep")[1], quiet=TRUE)

### standardize the attributes in columns 5 to 8
dpad <- data.frame(scale(as.data.frame(bh)[,5:8]))

### neighbourhood list (areas sharing a boundary)
bh.nb <- poly2nb(bh)

### edge costs: attribute dissimilarity between neighbouring areas
lcosts <- nbcosts(bh.nb, dpad)

### spatial weights list using the costs as weights
nb.w <- nb2listw(bh.nb, lcosts, style="B")


### find a minimum spanning tree
mst.bh <- mstree(nb.w,5)

### five groups (4 cuts) with no restriction
res1 <- skater(mst.bh[,1:2], dpad, 4)

plot(st_geometry(bh), col = res1$groups)

Created on 2022-08-18 by the reprex package (v2.0.1)

thus__
  • Thank you for the help. I was able to recreate the result with the skater algorithm. However, I forgot to mention an additional restriction: the number of regions in each group should be approximately the same. I can't seem to figure that out. Any pointers? Thank you very much. – OpenSource Guy Aug 25 '22 at 01:50
  • Basically, I am trying to do what the package *anticlust* does, but with geographic contiguity as one of the restrictions. I can't seem to find a straightforward solution. Any help would be greatly appreciated. – OpenSource Guy Aug 25 '22 at 01:59
  • @OpenSourceGuy I would recommend checking out the Max-P algorithm then. The GeoDa documentation will likely be your best bet for spatially constrained clustering algorithms https://geodacenter.github.io/workbook/9d_spatial4/lab9d.html#principle-1 – thus__ Aug 26 '22 at 12:20
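On the question of approximately equal group sizes: spdep's skater() also accepts crit and vec.crit. When crit is a two-element vector it gives minimum and maximum limits on the sum of vec.crit within each group, so setting vec.crit to 1 for every area bounds the number of areas per group. The sketch below reuses the answer's example; the bounds of 20% either side of n/k are an arbitrary assumption. Note that skater() still minimizes within-group dissimilarity, so this adds contiguity and size balance but not the anticlust-style similarity between groups, and with tight bounds it may not manage all the requested cuts, so check the group sizes afterwards.

```r
library(sf)
library(spdep)

# Same data and minimum spanning tree as in the answer above
bh <- st_read(system.file("etc/shapes/bhicv.shp", package = "spdep")[1], quiet = TRUE)
dpad <- data.frame(scale(as.data.frame(bh)[, 5:8]))
bh.nb <- poly2nb(bh)
lcosts <- nbcosts(bh.nb, dpad)
nb.w <- nb2listw(bh.nb, lcosts, style = "B")
mst.bh <- mstree(nb.w, 5)

n <- nrow(dpad)
k <- 5                 # desired number of groups
target <- n / k        # ideal number of areas per group
# Assumed tolerance: between 80% and 120% of the ideal size
size_bounds <- c(floor(0.8 * target), ceiling(1.2 * target))

# vec.crit = 1 per area, so crit bounds the count of areas in each group
res2 <- skater(mst.bh[, 1:2], dpad, ncuts = k - 1,
               crit = size_bounds, vec.crit = rep(1, n))

table(res2$groups)     # areas per group
plot(st_geometry(bh), col = res2$groups)
```

The same mechanism works with a population column in vec.crit if groups should be balanced by population rather than by number of areas.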