
I have two spatial dataframes: df_2016 and df_2020. I want to join them on a non-spatial ID column, which should be mostly consistent across the two. I used this code:

df_complete <- merge(x=as.data.frame(df_2016 ), y=as.data.frame(df_2020), by="ID", all=TRUE)

There are 1,372,613 observations in df_2016, and 1,423,781 in df_2020. There are 1,440,175 observations in df_complete.

Of the originals, there are 16,720 observations in df_2016 that are not in df_2020, and 71,620 observations that are in df_2020 that are not in df_2016. I want to keep the geometry for df_2020 as long as it is there, and then fill in the geometry from df_2016 for the few that are missing. So I used this:

df_complete$geometry <- ifelse(is.na(df_complete$geometry.y), df_complete$geometry.x, df_complete$geometry.y)

Now I want to drop the df_complete$geometry.x and df_complete$geometry.y columns, but get this error:

df_complete= subset(df_complete, select = -c(df_complete$geometry.y, df_complete$geometry.x) )
Error in Ops.sfc(c(df_complete$geometry.y, df_complete$geometry.x)) : 
  argument "e2" is missing, with no default

Additionally, the class of df_complete is now just a dataframe, and I'd really like it to keep its spatial properties if possible. Any advice on how to resolve this would be greatly appreciated!

EDIT: the solution I ended up with, adapted from Skaqqs's answer below:

library(sf)

# Drop the 2016 geometry so the merge keeps only the df_2020 geometry
df_2016_flat <- st_drop_geometry(df_2016)

df_complete <- merge(
  x = df_2020,
  y = df_2016_flat,
  by = "ID", all.x = TRUE)

# Get IDs from 2016 that aren't in 2020
df_2016_not_in_2020 <- setdiff(df_2016$ID, df_2020$ID)

# bind_rows() from dplyr is what worked here (rbind() did not; see comments below)
df_complete2 <- dplyr::bind_rows(
 df_complete,
 df_2016[df_2016$ID %in% df_2016_not_in_2020,])

rm(df_2016_flat, df_2016_not_in_2020)
tchoup
  • [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. It seems like you're working with `sf`, but that isn't clear. It's also hard to do more than guess without having a sample of your data or seeing any of your output – camille Sep 28 '21 at 16:31

1 Answer


Because you haven't shared any sample data, I can't test this. But see below for my general approach as I understand your question:

# Merge by ID
# all.x = TRUE keeps every row of df_2020, matched in 2016 or not
# Keep geometry from df_2020
df_complete <- merge(
  x = df_2020,
  y = as.data.frame(df_2016),
  by = "ID", all.x = TRUE)

# Get IDs from 2016 that aren't in 2020
df_2016_not_in_2020 <- setdiff(df_2016$ID, df_2020$ID)

df_complete2 <- rbind(
 df_complete,
 df_2016[df_2016$ID %in% df_2016_not_in_2020,])
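
On the `subset()` error from the question: `select` expects bare column names, not column values. Writing `-c(df_complete$geometry.y, ...)` applies unary minus to the geometry columns themselves, which is most likely why `Ops.sfc` complains about a missing `e2`. Below is a minimal sketch of dropping the two columns by name. The toy `df_complete` is an assumption for illustration only: plain character columns stand in for the real geometry columns so the snippet runs without `sf`.

```r
# Toy stand-in for the merged table (hypothetical data); character columns
# play the role of the geometry.x / geometry.y columns.
df_complete <- data.frame(
  ID = 1:3,
  geometry.x = c("A", "B", "C"),
  geometry.y = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# subset(): pass bare column names, without the df_complete$ prefix
dropped_subset <- subset(df_complete, select = -c(geometry.x, geometry.y))

# Equivalent base R indexing by column name
drop_cols <- c("geometry.x", "geometry.y")
dropped_index <- df_complete[, !(names(df_complete) %in% drop_cols), drop = FALSE]

print(names(dropped_subset))
print(names(dropped_index))
```

With the merge above you shouldn't need this step at all, since only one geometry column is carried through, but it may help if you end up with leftover `.x`/`.y` columns.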

I'd be happy to update my answer if you would like more specific advice and are able to share data!

Skaqqs
  • Hey, so this had all the pieces I needed! I had to make a couple of quick changes; for whatever reason y=as.data.frame(df_2016) went wonky, so I made a temp file where I dropped the geometry instead. I also had to use bind_rows instead of rbind. Will edit my post to have the answer. Thanks again! – tchoup Sep 28 '21 at 18:03
  • Glad you were able to figure it out, and thanks for being willing to update your answer with your solution. Instead of `as.data.frame()`, you could try `st_drop_geometry()`. – Skaqqs Sep 28 '21 at 18:06