3

I'm trying to replicate approximately a map like this. desc

It depicts a small number of items (schools) spread across an area. For input I have the map of areas with a number for each of them. I would like to lay that out into a that many points around the area. It would be even better if they wouldn't diffuse across area boundaries, but simply distributing them is enough. Some nice repel points within area might work.

Beeswarm plots do something quite similar, could this be done on a map. Bonus question - in fact I've been looking to animate this, but can only think of very complicated ways to do this, so that new points are added as sum nrs increase in time.

The code below places the points in centroids on the map, and takes the number as a size. (I was unable to export the map properly as a single file, so coordinates are a bit messed up, but principle is the same.)

places = st_read("https://gist.githubusercontent.com/peeter-t2/9646a4169e993948fa97f6f503a0688b/raw/cb4e910bf153e51e3727dc9d1c73dd9ef86d2556/kih1897m.geojson", stringsAsFactors = FALSE)

schools <- read_tsv("https://gist.github.com/peeter-t2/34467636b3c1017e89f33284d7907b42/raw/6ea7dd6c005ef8577b36f5e84338afcb6c76b707/school_nums.tsv")
schools_geo <- merge(places,schools,by.x="KIHELKOND",by.y="Kihelkond") #94 matches

p<- schools_geo %>% 
  ggplot()+
  geom_sf(data=schools_geo)+
  geom_sf(data=st_centroid(schools_geo),aes(size=value))+
  theme_bw()
p

Thanks!

puslet88
  • 1,288
  • 15
  • 25
  • your `places` data is being read in as lon/lat but your geometry column looks like a different crs. What crs should the data be? – Chris Sep 08 '19 at 16:28
  • I wasn't sure but you found a solution. I've been using some very old .shp shapefiles and didn't find ways to save it well. In fact originally, the crs said no EPSG code, even though it showed properly. Rereading from geojson with `st_crs() <- 3301` as you suggest, fixed a few other errors too. `Although coordinates are longitude/latitude, st_intersects assumes that they are planar`, `Error in CPL_geos_is_empty(st_geometry(x)) : Evaluation error: IllegalArgumentException: Points of LinearRing do not form a closed linestring.` But truly not sure what was going on, so grateful for the fix. – puslet88 Sep 08 '19 at 19:35

2 Answers2

6

As I noted in my comments, the when I read in the file it is setting the crs to lat/lon (epsg: 4326) while the geometry column is a different crs. I have guessed that the correct crs is espg: 3301 and proceeded on that basis which seems to work fine.

st_crs(schools_geo) <- 3301

We can use st_sample to get a sample of points within the polygons in relation to our 'value' column:

# we can set type = 'hexagonal', 'regular' or 'random'
school_pts <- schools_geo %>% st_sample(size = .$value, type = 'hexagonal')


schools_geo %>% 
  ggplot()+
  geom_sf()+
  geom_sf(data=school_pts, size = .8)+
  theme_bw()

This produces the following plot which I think looks messy due to the fact st_sample spreads the points out to the extents of the polygons.

enter image description here

It might look nicer to have the points more centered in each of the polygons like in the example you posted. To do that we could rescale the polygons depending on the number of points we want to plot within them. In the code below, I shrink the polygons by 90% if they have the least points inside (1) and by 20% if they have the most points (27).

# put values on scale between 0 and 1
scale_fact <- (max(schools_geo$value) -  schools_geo$value) / (max(schools_geo$value) - min(schools_geo$value)) 
# re-scale between 0.2 and 0.9
scale_fact <- scale_fact * (0.9 - 0.2) + 0.2
# reverse the scale 
scale_fact <-  max(scale_fact) + min(scale_fact) - scale_fact 

# apply the scale factor
schools_centroid <- st_geometry(st_centroid(schools_geo))
schools_geo_rescaled <- (st_geometry(schools_geo) - schools_centroid) * scale_fact + schools_centroid

school_pts <- schools_geo_rescaled %>% 
  st_sf(crs = 3301) %>% 
  bind_cols(value = schools_geo$value) %>%
  st_sample(size = .$value, type = 'hexagonal')


# plot
schools_geo %>% 
  ggplot()+
  geom_sf()+
  geom_sf(data=school_pts, size = .8)+
  theme_bw()

enter image description here

Chris
  • 3,836
  • 1
  • 16
  • 34
  • This is amazing, and beautiful. I should play around with the scaling a bit. Currently, the areas that are narrow clump them very together, and wide areas, distribute them wide apart, while in between it works just right. Any idea why this could be? But a very smart solution indeed! PS commented on the crs above. – puslet88 Sep 08 '19 at 19:31
  • 1
    Yes, play around with the scaling as well as the size of the points in the plot. The reasoning for the difference in spacing is that `st_sample` will fill the space it is given. If there are a lot of points in a small space, they will be distributed much closer together. Some of that can be improved by adjusting the way the scaling is done, but some is just the data (lots of points in a small space). – Chris Sep 08 '19 at 19:47
  • @puslet88 also thanks for the edit, I had missed a few lines there somehow! – Chris Sep 08 '19 at 19:49
3

it is not an easy problem. I decided to simplify it choosing only one area not all. In theory, the solution is reproductible for all your areas.

We first import our library

library(rgdal)
library(sf)
library(readr)
library(ggplot2)

We use the proposed data :

places <- st_read("https://gist.githubusercontent.com/peeter-t2/9646a4169e993948fa97f6f503a0688b/raw/cb4e910bf153e51e3727dc9d1c73dd9ef86d2556/kih1897m.geojson", stringsAsFactors = FALSE)

schools <- read_tsv("https://gist.github.com/peeter-t2/34467636b3c1017e89f33284d7907b42/raw/6ea7dd6c005ef8577b36f5e84338afcb6c76b707/school_nums.tsv")
schools_geo <- merge(places,schools,by.x="KIHELKOND",by.y="Kihelkond") #94 matches

We select one state

one <- places$geometry[[1]]

We split the polygon in several sub-polygon thanks to a grid

grid <- st_make_grid(one, n = c(10, 10))
area <- st_area(grid)
grid <- st_as_sf(data.frame(ID=c(1:length(area)),
                            area=area,
                            geometry=grid))
tmp <- st_intersection(grid, one)
tmp$area <- st_area(tmp)

We display all centroids of our grid made by small square

plot(st_geometry(tmp['area']))
plot(st_geometry(st_centroid(tmp['area'])),
     pch = 16, col = 'red', add = TRUE)

At the end we want to keep only the number of points you want which is nbr an equivalent of value (number of schools) in your example.

nbr <- 20
plot(st_geometry(one))
plot(st_geometry(st_centroid(tmp[order(tmp$area, decreasing = T),][1:nbr,])),
     pch = 16, col = 'red', add = TRUE)

I hope it will help you. enter image description here

Rémi Coulaud
  • 1,684
  • 1
  • 8
  • 19
  • Thanks a lot! That is indeed a neat solution. @Chris wrote a very elegant one above so I accepted that one. I may look to combine the two for nicety even. st_sample() is a nice function! – puslet88 Sep 08 '19 at 19:39
  • I absolutly agree that `st_sample()` is the right solution. – Rémi Coulaud Sep 09 '19 at 06:51