I am working with the R programming language.
I have the following map (shapefile):
library(sf)
library(leaflet)
nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"), quiet = TRUE) %>%
  st_transform(st_crs(4326)) %>%
  st_cast('POLYGON')
Now, suppose I have a dataset with information for different polygons within this map (I made some areas missing on purpose):
set.seed(123)
unemployement_rate = rnorm(nrow(nc), 50,5)
n <- nrow(nc)
n_NA <- round(n * 0.1)
idx <- sample(n, n_NA)
unemployement_rate[idx] <- NA  # set the sampled entries to missing
my_df = data.frame(nc$NAME, unemployement_rate)
My Question: Assume that both of the above files already exist.
I would like to bring the unemployment rate into the "nc" file. I am trying to merge the two in such a way that the number of rows in "nc" does not change.
In the past, I used the match() function, as suggested in a previous question (Merging a Shapefile and a dataframe). However, when I did this, the NAs were removed.
Thus, I tried to solve this problem a different way:
names(my_df) <- c("NAME", "unemployement_rate")
nc_merged <- merge(nc, my_df, by = "NAME", all.x = TRUE)
# optional : replace the NA with 9999
# nc_merged$unemployement_rate[is.na(nc_merged$unemployement_rate)] <- 9999
However, there now appear to be more rows in nc_merged than in the original file:
> dim(nc)
[1] 108 15
> dim(my_df)
[1] 108 2
> dim(nc_merged)
[1] 128 16
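My guess is that duplicated NAME values are involved, since st_cast('POLYGON') can split a multi-part county into several rows that share one name. Below is a minimal sketch on toy data that reproduces this kind of row inflation. The data frames `left` and `right` and the `row_id` column are hypothetical stand-ins, not the real objects, and the workaround assumes both tables are in the same row order (which should hold here, because my_df was built from nc$NAME).

```r
# Toy stand-ins (hypothetical): "B" appears twice on each side, like a county
# that was split into two polygons.
left  <- data.frame(NAME = c("A", "B", "B", "C"), area = c(10, 4, 6, 8))
right <- data.frame(NAME = c("A", "B", "B", "C"), rate = c(51.2, NA, NA, 47.9))

# With duplicated keys on both sides, merge() pairs every "B" row on the left
# with every "B" row on the right.
merged_by_name <- merge(left, right, by = "NAME", all.x = TRUE)
nrow(merged_by_name)  # 6 rows: 1 ("A") + 2 * 2 ("B") + 1 ("C")

# Diagnostic: list the keys that occur more than once.
name_counts <- table(left$NAME)
name_counts[name_counts > 1]

# Possible workaround (assumes identical row order in both tables):
# merge on a unique row id instead of the non-unique name.
left$row_id  <- seq_len(nrow(left))
right$row_id <- seq_len(nrow(right))
merged_by_id <- merge(left, right[, c("row_id", "rate")],
                      by = "row_id", all.x = TRUE)
nrow(merged_by_id)  # 4 rows, same as `left`; the NA rates are kept
```

If the same thing is happening with the real data, table(nc$NAME) should show a handful of names that occur two or three times.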
Can someone please show me why this is happening and how I can fix this?
Thanks!