
I am working with the R programming language.

I have the following map (shapefile):

library(sf)  
library(leaflet)

nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"), quiet = TRUE) %>% 
  st_transform(st_crs(4326)) %>% 
  st_cast('POLYGON')

Now, suppose I have a dataset with information for different polygons within this map (I made some areas missing on purpose):

set.seed(123)
unemployement_rate = rnorm(nrow(nc), 50,5)
n <- nrow(nc)
n_NA <- round(n * 0.1)
idx <- sample(n, n_NA)
unemployement_rate[idx] <- NA  # set about 10% of the values to NA

my_df = data.frame(nc$NAME, unemployement_rate)
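Before merging on NAME, it may be worth checking whether NAME is actually a unique key. Since the nc GeoPackage contains 100 counties but the cast object has 108 rows, st_cast('POLYGON') must repeat some county names (once per polygon of a multipart county), and my_df inherits those repeats. A minimal sketch of the check on a hypothetical toy vector standing in for nc$NAME:

```r
# Hypothetical stand-in for nc$NAME after st_cast("POLYGON"):
# county "A" was split into two polygons, so its name appears twice
keys <- c("A", "A", "B")

# anyDuplicated() returns the index of the first duplicate, or 0 if all keys are unique
first_dup <- anyDuplicated(keys)

# table() counts how many rows share each key; keep only the repeated ones
key_counts <- table(keys)
print(key_counts[key_counts > 1])
```

On the real data the same check would be `anyDuplicated(nc$NAME)` or `table(nc$NAME)`.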

My Question: Assume that both of the above files already exist.

I would like to bring the unemployment rate into the "nc" object, merging the two so that the number of rows in "nc" does not change.

In the past, I used the match() function, as suggested in a previous question (Merging a Shapefile and a dataframe). However, when I did this, the NAs were removed.
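For reference, match() by itself does not drop rows or NA values: it returns one position per element of its first argument, with NA where no match exists, so assigning through it leaves nrow() unchanged. A minimal sketch with hypothetical toy data frames (not the question's data); one caveat is that when the lookup key is duplicated, match() silently uses the first occurrence:

```r
# Hypothetical toy data: 'polys' plays the role of nc, 'rates' the role of my_df
polys <- data.frame(NAME = c("A", "B", "C"))
rates <- data.frame(NAME = c("A", "C"), rate = c(48, NA))

# For each NAME in polys, the row of rates with that NAME (NA if absent)
pos <- match(polys$NAME, rates$NAME)

# Indexing with NA yields NA, so unmatched rows and missing rates both stay NA
polys$rate <- rates$rate[pos]
print(polys)
```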

Thus, I tried to solve this problem a different way:

names(my_df) <- c("NAME", "unemployement_rate")
nc_merged <- merge(nc, my_df, by = "NAME", all.x = TRUE)

# optional: replace the NAs with 9999
# nc_merged$unemployement_rate[is.na(nc_merged$unemployement_rate)] <- 9999

However, nc_merged now has more rows than the original nc:

> dim(nc)
[1] 108  15
> dim(my_df)
[1] 108   2

> dim(nc_merged)
[1] 128  16
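The extra rows are consistent with how merge() treats a non-unique key: each row of x is paired with every row of y that has the same key, so a name that appears k times in both tables produces k times k rows. A minimal sketch with hypothetical toy data:

```r
# Hypothetical toy data: county "A" has two polygons, so its name appears twice in both tables
polys <- data.frame(NAME = c("A", "A", "B"), poly = 1:3)
rates <- data.frame(NAME = c("A", "A", "B"), rate = c(48, NA, 52))

merged <- merge(polys, rates, by = "NAME", all.x = TRUE)

# Each "A" row in polys matches both "A" rows in rates: 2 * 2 = 4 rows, plus 1 for "B"
print(nrow(merged))

# Size of the join when both tables contain each key the same number of times
expected <- sum(table(polys$NAME)^2)
```

Because my_df is built from nc$NAME, both tables repeat each name equally often, so nrow(nc_merged) should equal `sum(table(nc$NAME)^2)`, which exceeds nrow(nc) whenever any name repeats.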

Can someone please show me why this is happening and how I can fix this?

Thanks!

stats_noob

1 Answer


I misunderstood at first. You can just use the merge function without aggregating:

library(sf)

# Read the shapefile
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE) %>%
  st_transform(st_crs(4326)) %>%
  st_cast("POLYGON")

# Generate the dataset with unemployment rate
set.seed(123)
unemployment_rate <- rnorm(nrow(nc), 50, 5)
n_NA <- round(nrow(nc) * 0.1)
idx <- sample(nrow(nc), n_NA)
unemployment_rate[idx] <- NA
my_df <- data.frame(NAME = nc$NAME, unemployment_rate)

# Merge the datasets by NAME
nc_merged <- merge(nc, my_df, by = "NAME", all.x = TRUE)

# View the dimensions of the merged dataset
dim(nc) # Original nc dataset
dim(nc_merged) # Merged dataset
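If dim(nc_merged) still reports more rows than dim(nc), the repeated NAME values are the likely cause. One possible fix, sketched under the assumption that each row of my_df corresponds to exactly one polygon (true here, since my_df is built from nc row by row): give every polygon a unique id and join on that instead of NAME. The column name poly_id is hypothetical, and toy data frames are used so the sketch runs without sf.

```r
# Hypothetical toy data standing in for nc (after st_cast) and my_df
polys <- data.frame(NAME = c("A", "A", "B"), poly = 1:3)

# Hypothetical unique key: one id per polygon row, created before building the rate table
polys$poly_id <- seq_len(nrow(polys))
rates <- data.frame(poly_id = polys$poly_id, rate = c(48, NA, 52))

# Option 1: merge on the unique key (at most one match per row, so no extra rows)
merged <- merge(polys, rates, by = "poly_id", all.x = TRUE)

# Option 2: a match() lookup keeps the original row order and the NA values
polys$rate <- rates$rate[match(polys$poly_id, rates$poly_id)]

print(nrow(merged) == nrow(polys))
```

With the real objects, the same idea would be `nc$poly_id <- seq_len(nrow(nc))` before building my_df; on an sf object, merge() should dispatch to sf's method and keep the geometry, while the match() option leaves nrow(nc) unchanged by construction.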
Alex Gordon