
Merging extra data (frames) into spatial objects in R can be tricky (as explained here, or here).

While searching for how to do this correctly, I found this SO question listing several methods. dplyr's left_join was not among them, but I spotted it being used in Robin's tutorial.

My question is: is this a correct method to use? Are there any use cases (different number of rows, different row names, different sorting, etc.) where this solution would fail?


Here is some reproducible code illustrating the methods I found / came across:

# libraries
library("spdep"); library("sp"); library("dplyr")
library("maptools") # provides readShapePoly()

# spatial data
c <- readShapePoly(system.file("etc/shapes/columbus.shp", package="spdep")[1])
m <- c@data
c@data <- subset(c@data, select = c("POLYID", "INC"))
c@data$INC2 <- c@data$INC
c@data$INC <- NULL
ex <- subset(c, c$POLYID <= 2) # polygons with messed up data in merged df
c <- subset(c, c$POLYID < 49) # remove one polygon from shape so that df has one poly too many

# messing up merge data
m <- subset(m, POLYID != 1) # exclude polygon
m <- subset(m, select = c("POLYID", "INC")) # only two vars

rownames(m) <- m$POLYID - 2 # change rownames
m$POLYID[m$POLYID == 2] <- 0  # wrong ID
m <- m[order(m$INC),] # different sort
m$POLYID2 <- m$POLYID # duplicated to check dplyr

# left_join solution
s1 <- c
s1@data <- left_join(s1@data, m)

plot(c)
plot(s1, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)
View(s1@data)

# match solution
s2 <- c
s2@data = data.frame(s2@data, m[match(s2@data[,"POLYID"], m[,"POLYID"]),])

plot(c)
plot(s2, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)

View(s2@data)

# sp solution
s3 <- c
s3 <- sp::merge(s3, m, by="POLYID")


plot(c)
plot(s3, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)

View(s3@data)

# inner join solution
s4 <- c
s4@data <- inner_join(s4@data, m)


plot(c)
plot(s4, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)

View(s4@data)

# rebuild solution???
s5 <- c

s5.df <- as(s5, "data.frame")
s5.df1 <- merge(s5.df, m, sort=FALSE, by.x="POLYID", by.y="POLYID", all.x=TRUE, all.y=TRUE)
s51 <- SpatialPolygonsDataFrame(as(s5, "SpatialPolygons"), data=s5.df1)

plot(c)
plot(s51, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)

Left join seems to do the job, as do sp::merge and match (I do hope the order is not being messed up, so that, for instance, plotted polygons end up associated with different values after the merge?). None of the solutions actually removes the two polygons with missing data, but I presume this is correct behaviour in R?
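
The ordering worry can be checked on plain data frames, without any spatial objects. The following is only a minimal sketch under stated assumptions: `left`, `right` and `right_dup` are small made-up tables standing in for `s1@data` and `m` (they are not the columbus data), and the checks only cover row count and key order, not everything that could go wrong when writing into the `@data` slot.

```r
library(dplyr)

# Hypothetical stand-ins for s1@data (left) and m (right); not the columbus data
left  <- data.frame(POLYID = c(3, 1, 4, 2, 5), INC2 = c(30, 10, 40, 20, 50))
# Right table is shuffled, lacks ID 1 and has ID 2 miscoded as 0
right <- data.frame(POLYID = c(5, 0, 4, 3), INC = c(55, 22, 44, 33))

joined <- left_join(left, right, by = "POLYID")

# Row count and key order of the left table are unchanged
stopifnot(nrow(joined) == nrow(left))
stopifnot(identical(joined$POLYID, left$POLYID))

# Unmatched rows are kept, with NA in the joined column
stopifnot(identical(is.na(joined$INC), c(FALSE, TRUE, FALSE, TRUE, FALSE)))

# A duplicated key in the right table adds a row, so the table
# would no longer line up one-to-one with the polygons
right_dup  <- rbind(right, data.frame(POLYID = 4, INC = 99))
joined_dup <- left_join(left, right_dup, by = "POLYID")
stopifnot(nrow(joined_dup) == nrow(left) + 1)

# inner_join drops unmatched rows, giving fewer rows than polygons
inner <- inner_join(left, right, by = "POLYID")
stopifnot(nrow(inner) == 3)
```

If checks like these pass for the real tables (same number of rows as polygons, same key order as before the join), assigning the result back to `@data` should keep each row attached to its polygon. Note that dplyr joins do not keep the original row names, which matters for any later code that matches on them.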

radek
  • 2
    The best way to get help with your question is to post the necessary things in the question itself. It is hard to follow four or five different links. The `dplyr` package's `left_join` does a specific thing: it joins the two tables on common key(s) and retains all rows in the left table. If that is what you want with your data, that can be the right solution. – Gopala Apr 28 '16 at 12:25
  • @Gopala Thanks. Is there any sorting of 'left' data going on when using `left_join`? – radek Apr 28 '16 at 12:29
  • 1
    Not hard to experiment and find out. Example: `df1 <- data.frame(x = 10:1, y = 10:1); df2 <- data.frame(x = 1:5, z = 1:5); left_join(df1, df2);` produces result in same order as original. If you want sorted order of some kind, you can use `arrange` from `dplyr`. – Gopala Apr 28 '16 at 13:00
  • 2
    I put methods for dplyr verbs into https://github.com/mdsumner/spdplyr - I haven't thought enough about joins at all, so use at your own risk - but I'd appreciate feedback. (I'm more interested in getting away from sp as much as possible, providing methods to drive it rather than need to use it directly any more - it's extremely painfully limiting IMO). – mdsumner May 17 '16 at 14:49
