0

I am trying to create several chloropleths/heat maps of NYC in R. I downloaded the shapefile from NYC OpenData here: https://catalog.data.gov/dataset/zip-code-boundaries. And the demographic/other data I am attempting to merge it with and create maps from here: https://github.com/NSAPH/National-Causal-Analysis/blob/master/Confounders/census/processed_data/census_interpolated_zips.zip. The first heat map is plotting PM2.5 (air pollution measure) within each zip code. I first subsetted the demographic dataset (nyczip2016_1) into a new df (zipmap) with only the columns of interest:

zipmap <- nyczip2016_1
zipmap$ZIPCODE <- zipmap$ZIP
zipmap <- subset(zipmap, select=c("black_pop", "hisp_pop", "asian_pop", "white_pop", "medhouseholdincome", "ZIPCODE", "pm25"))
str(zipmap)

and get this:

> str(zipmap)
'data.frame':   230 obs. of  7 variables:
 $ black_pop         : num  2235 6638 2493 31 245 ...
 $ hisp_pop          : num  3589 20754 4470 250 596 ...
 $ asian_pop         : num  5000 33352 7348 744 1625 ...
 $ white_pop         : num  14814 24591 44001 2201 6425 ...
 $ medhouseholdincome: num  85168 35594 100791 123056 130116 ...
 $ ZIPCODE           : num  10001 10002 10003 10004 10005 ...
 $ pm25              : num  9.7 8.85 9 9.6 10.15 ...

Then I read in the shapefile, changed ZIPCODE to numeric rather than character, and merged the two datasets by ZIPCODE:

nycshp <- readOGR("ZIP_CODE_040114.shp")
summary(nycshp@data)
nycshp@data$ZIPCODE <- as.numeric(nycshp@data$ZIPCODE)
nycshp2 <- merge(nycshp, zipmap, duplicateGeoms = T)

And I did a test map without any filling (because I am new to this and want to make sure it worked), which gave me a basic b&w map with the outlines of the zip codes:

testmap <- ggplot() + geom_polygon(data = nycshp2, aes(x = long, y = lat, group = group), colour = "black", fill = NA)
testmap + theme_void()

I also created a third df (nycshp3) and omitted any missing values (basically just ZIP codes with populations of 0, which in turn did not have PM2.5 measurements)

nycshp3@data <- na.omit(nycshp2@data)

But when I attempt to fill the map with PM2.5 values:

nycshp3 %>%
  ggplot() +
  geom_sf(aes(fill = pm25)) +
  scale_fill_viridis_c(option = "plasma", trans = "sqrt")

I consistently get the error:

Regions defined for each Polygons
Error in `geom_sf()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `FUN()`:
! object 'pm25' not found

Even though I checked and it is part of the nycshp3@data. Here is the output of "summary(nycshp3@data)":

> summary(nycshp3@data)
    ZIPCODE        BLDGZIP            PO_NAME            POPULATION          AREA          
 Min.   :10001   Length:183         Length:183         Min.   :     0   Min.   :    47809  
 1st Qu.:10290   Class :character   Class :character   1st Qu.: 25270   1st Qu.: 17956679  
 Median :11201   Mode  :character   Mode  :character   Median : 40591   Median : 35889597  
 Mean   :10812                                         Mean   : 45118   Mean   : 43939234  
 3rd Qu.:11363                                         3rd Qu.: 63036   3rd Qu.: 55733418  
 Max.   :11694                                         Max.   :109069   Max.   :473985727  
    STATE              COUNTY            ST_FIPS            CTY_FIPS        
 Length:183         Length:183         Length:183         Length:183        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
     URL              SHAPE_AREA   SHAPE_LEN   black_pop        hisp_pop       asian_pop    
 Length:183         Min.   :0    Min.   :0   Min.   :   31   Min.   :   55   Min.   :    0  
 Class :character   1st Qu.:0    1st Qu.:0   1st Qu.: 1360   1st Qu.: 3978   1st Qu.: 1326  
 Mode  :character   Median :0    Median :0   Median : 4079   Median : 8145   Median : 3443  
                    Mean   :0    Mean   :0   Mean   :11491   Mean   :13833   Mean   : 6377  
                    3rd Qu.:0    3rd Qu.:0   3rd Qu.:14596   3rd Qu.:16866   3rd Qu.: 7599  
                    Max.   :0    Max.   :0   Max.   :87209   Max.   :83843   Max.   :60445  
   white_pop     medhouseholdincome      pm25       
 Min.   :   16   Min.   : 21553     Min.   : 6.943  
 1st Qu.: 6994   1st Qu.: 45660     1st Qu.: 7.460  
 Median :15643   Median : 59190     Median : 7.967  
 Mean   :20161   Mean   : 65679     Mean   : 8.231  
 3rd Qu.:27369   3rd Qu.: 80392     3rd Qu.: 8.875  
 Max.   :85096   Max.   :234958     Max.   :11.675 

Does anyone know what could be causing this error? I have never worked with shapefiles before in R, nor am I particularly familiar with what types of variables are involved in this kind of data. Any insight would be very much appreciated!

UPDATE:

In case anyone is looking for an answer, I did end up figuring it out with help from a YouTube video. I'm really not sure what changed but this is the code that worked:

library(sf)
library(tmap)
library(tmaptools)
library(leaflet)
library(ggplot2)
library(dplyr)
options(scipen = 999)
str(zipmap)
nycsf <- st_read("ZIP_CODE_040114.shp", stringsAsFactors = FALSE)
str(nycsf)
nycsfjoin <- inner_join(nycsf, zipmap)
str(nycsfjoin)
pm25map <- ggplot(nycsfjoin) +
  geom_sf(aes(fill = pm25)) +
  scale_fill_gradient(low = "#FFF0F9", high = "#5D1440") +
  theme_void()

Here is my demographic df:

> str(zipmap)
'data.frame':   230 obs. of  7 variables:
 $ black_pop         : num  2235 6638 2493 31 245 ...
 $ hisp_pop          : num  3589 20754 4470 250 596 ...
 $ asian_pop         : num  5000 33352 7348 744 1625 ...
 $ white_pop         : num  14814 24591 44001 2201 6425 ...
 $ medhouseholdincome: num  85168 35594 100791 123056 130116 ...
 $ ZIPCODE           : Factor w/ 230 levels "10001","10002",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ pm25              : num  9.7 8.85 9 9.6 10.15 ...
> 

And my shapefile:

> str(nycsf)
Classes ‘sf’ and 'data.frame':  263 obs. of  13 variables:
 $ ZIPCODE   : chr  "11436" "11213" "11212" "11225" ...
 $ BLDGZIP   : chr  "0" "0" "0" "0" ...
 $ PO_NAME   : chr  "Jamaica" "Brooklyn" "Brooklyn" "Brooklyn" ...
 $ POPULATION: num  18681 62426 83866 56527 72280 ...
 $ AREA      : num  22699295 29631004 41972104 23698630 36868799 ...
 $ STATE     : chr  "NY" "NY" "NY" "NY" ...
 $ COUNTY    : chr  "Queens" "Kings" "Kings" "Kings" ...
 $ ST_FIPS   : chr  "36" "36" "36" "36" ...
 $ CTY_FIPS  : chr  "081" "047" "047" "047" ...
 $ URL       : chr  "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" "http://www.usps.com/" ...
 $ SHAPE_AREA: num  0 0 0 0 0 0 0 0 0 0 ...
 $ SHAPE_LEN : num  0 0 0 0 0 0 0 0 0 0 ...
 $ geometry  :sfc_POLYGON of length 263; first list element: List of 1
  ..$ : num [1:159, 1:2] 1038098 1038142 1038171 1038280 1038521 ...
  ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:12] "ZIPCODE" "BLDGZIP" "PO_NAME" "POPULATION" ...

before joining. Maybe this will help someone in the future lol

0 Answers0