I am trying to create several chloropleths/heat maps of NYC in R. I downloaded the shapefile from NYC OpenData here: https://catalog.data.gov/dataset/zip-code-boundaries. And the demographic/other data I am attempting to merge it with and create maps from here: https://github.com/NSAPH/National-Causal-Analysis/blob/master/Confounders/census/processed_data/census_interpolated_zips.zip. The first heat map is plotting PM2.5 (air pollution measure) within each zip code. I first subsetted the demographic dataset (nyczip2016_1) into a new df (zipmap) with only the columns of interest:
zipmap <- nyczip2016_1
zipmap$ZIPCODE <- zipmap$ZIP
zipmap <- subset(zipmap, select=c("black_pop", "hisp_pop", "asian_pop", "white_pop", "medhouseholdincome", "ZIPCODE", "pm25"))
str(zipmap)
and get this:
> str(zipmap)
'data.frame': 230 obs. of 7 variables:
$ black_pop : num 2235 6638 2493 31 245 ...
$ hisp_pop : num 3589 20754 4470 250 596 ...
$ asian_pop : num 5000 33352 7348 744 1625 ...
$ white_pop : num 14814 24591 44001 2201 6425 ...
$ medhouseholdincome: num 85168 35594 100791 123056 130116 ...
$ ZIPCODE : num 10001 10002 10003 10004 10005 ...
$ pm25 : num 9.7 8.85 9 9.6 10.15 ...
Then I read in the shapefile, changed ZIPCODE to numeric rather than character, and merged the two datasets by ZIPCODE:
nycshp <- readOGR("ZIP_CODE_040114.shp")
summary(nycshp@data)
nycshp@data$ZIPCODE <- as.numeric(nycshp@data$ZIPCODE)
nycshp2 <- merge(nycshp, zipmap, duplicateGeoms = T)
And I did a test map without any filling (because I am new to this and want to make sure it worked), which gave me a basic b&w map with the outlines of the zip codes:
testmap <- ggplot() + geom_polygon(data = nycshp2, aes(x = long, y = lat, group = group), colour = "black", fill = NA)
testmap + theme_void()
I also created a third df (nycshp3) and omitted any missing values (basically just ZIP codes with populations of 0, which in turn did not have PM2.5 measurements)
nycshp3@data <- na.omit(nycshp2@data)
But when I attempt to fill the map with PM2.5 values:
nycshp3 %>%
ggplot() +
geom_sf(aes(fill = pm25)) +
scale_fill_viridis_c(option = "plasma", trans = "sqrt")
I consistently get the error:
Regions defined for each Polygons
Error in `geom_sf()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `FUN()`:
! object 'pm25' not found
Even though I checked and it is part of the nycshp3@data. Here is the output of "summary(nycshp3@data)":
> summary(nycshp3@data)
ZIPCODE BLDGZIP PO_NAME POPULATION AREA
Min. :10001 Length:183 Length:183 Min. : 0 Min. : 47809
1st Qu.:10290 Class :character Class :character 1st Qu.: 25270 1st Qu.: 17956679
Median :11201 Mode :character Mode :character Median : 40591 Median : 35889597
Mean :10812 Mean : 45118 Mean : 43939234
3rd Qu.:11363 3rd Qu.: 63036 3rd Qu.: 55733418
Max. :11694 Max. :109069 Max. :473985727
STATE COUNTY ST_FIPS CTY_FIPS
Length:183 Length:183 Length:183 Length:183
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
URL SHAPE_AREA SHAPE_LEN black_pop hisp_pop asian_pop
Length:183 Min. :0 Min. :0 Min. : 31 Min. : 55 Min. : 0
Class :character 1st Qu.:0 1st Qu.:0 1st Qu.: 1360 1st Qu.: 3978 1st Qu.: 1326
Mode :character Median :0 Median :0 Median : 4079 Median : 8145 Median : 3443
Mean :0 Mean :0 Mean :11491 Mean :13833 Mean : 6377
3rd Qu.:0 3rd Qu.:0 3rd Qu.:14596 3rd Qu.:16866 3rd Qu.: 7599
Max. :0 Max. :0 Max. :87209 Max. :83843 Max. :60445
white_pop medhouseholdincome pm25
Min. : 16 Min. : 21553 Min. : 6.943
1st Qu.: 6994 1st Qu.: 45660 1st Qu.: 7.460
Median :15643 Median : 59190 Median : 7.967
Mean :20161 Mean : 65679 Mean : 8.231
3rd Qu.:27369 3rd Qu.: 80392 3rd Qu.: 8.875
Max. :85096 Max. :234958 Max. :11.675
Does anyone know what could be causing this error? I have never worked with shapefiles before in R, nor am I particularly familiar with what types of variables are involved in this kind of data. Any insight would be very much appreciated!
UPDATE:
In case anyone is looking for an answer, I did end up figuring it out with help from a YouTube video. I'm really not sure what changed but this is the code that worked:
library(sf)
library(tmap)
library(tmaptools)
library(leaflet)
library(ggplot2)
library(dplyr)
options(scipen = 999)
str(zipmap)
nycsf <- st_read("ZIP_CODE_040114.shp", stringsAsFactors = FALSE)
str(nycsf)
nycsfjoin <- inner_join(nycsf, zipmap)
str(nycsfjoin)
pm25map <- ggplot(nycsfjoin) +
geom_sf(aes(fill = pm25)) +
scale_fill_gradient(low = "#FFF0F9", high = "#5D1440") +
theme_void()
Here is my demographic df:
> str(zipmap)
'data.frame': 230 obs. of 7 variables:
$ black_pop : num 2235 6638 2493 31 245 ...
$ hisp_pop : num 3589 20754 4470 250 596 ...
$ asian_pop : num 5000 33352 7348 744 1625 ...
$ white_pop : num 14814 24591 44001 2201 6425 ...
$ medhouseholdincome: num 85168 35594 100791 123056 130116 ...
$ ZIPCODE : Factor w/ 230 levels "10001","10002",..: 1 2 3 4 5 6 7 8 9 10 ...
$ pm25 : num 9.7 8.85 9 9.6 10.15 ...
>
And my shapefile:
> str(nycsf)
Classes ‘sf’ and 'data.frame': 263 obs. of 13 variables:
$ ZIPCODE : chr "11436" "11213" "11212" "11225" ...
$ BLDGZIP : chr "0" "0" "0" "0" ...
$ PO_NAME : chr "Jamaica" "Brooklyn" "Brooklyn" "Brooklyn" ...
$ POPULATION: num 18681 62426 83866 56527 72280 ...
$ AREA : num 22699295 29631004 41972104 23698630 36868799 ...
$ STATE : chr "NY" "NY" "NY" "NY" ...
$ COUNTY : chr "Queens" "Kings" "Kings" "Kings" ...
$ ST_FIPS : chr "36" "36" "36" "36" ...
$ CTY_FIPS : chr "081" "047" "047" "047" ...
$ URL : chr "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" ...
$ SHAPE_AREA: num 0 0 0 0 0 0 0 0 0 0 ...
$ SHAPE_LEN : num 0 0 0 0 0 0 0 0 0 0 ...
$ geometry :sfc_POLYGON of length 263; first list element: List of 1
..$ : num [1:159, 1:2] 1038098 1038142 1038171 1038280 1038521 ...
..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:12] "ZIPCODE" "BLDGZIP" "PO_NAME" "POPULATION" ...
before joining. Maybe this will help someone in the future lol