9

So I have a data frame in R called obesity_map which basically gives me the state, county, and obesity rate per county. It looks more or less like this:

obesity_map = data.frame(state, county, obesity_rate)

I'm trying to visualize this on the map by showing various obesity rates per county throughout the US with this:

us.state.map <- map_data('state')
head(us.state.map)
states <- levels(as.factor(us.state.map$region))
df <- data.frame(region = states, value = runif(length(states), min=0, max=100),stringsAsFactors = FALSE)

map.data <- merge(us.state.map, df, by='region', all=T)
map.data <- map.data[order(map.data$order),]
head(map.data)

map.county <- map_data('county')
county.obesity <- data.frame(region = obesity_map$state, subregion = obesity_map$county, value = obesity_map$obesity_rate)
map.county <- merge(county.obesity, map.county, all=TRUE)
ggplot(map.county, aes(x = long, y = lat, group=group, fill=as.factor(value))) + geom_polygon(colour = "white", size = 0.1)

And it basically creates an image that looks like this:

img

As you can see, the US is divided into strange shapes, the colors aren't one consistent color in varying gradients, and you can't make much from it. But what I really want is something like this below but with each county filled in:

img2

I'm fairly new to this so I'd appreciate any and all help!


Edit:

Here's the output of the dput:

dput(obesity_map)
structure(list(X = 1:3141, FIPS = c(1L, 3L, 5L, 7L, 9L, 11L, 
13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 33L, 35L, 37L, 
39L, 41L, 43L, 45L, 47L, 49L, 51L, 53L, 55L, 57L, 59L, 61L, 63L, 
65L, 67L, 69L, 71L, 73L, 75L, 77L, 79L, 81L, 83L, 85L, 87L, 89L, 
91L, 93L, 95L, 97L, 99L, 101L, 103L, 105L, 107L, 109L, 111L, 
113L, 115L, 117L, 119L, 121L, 123L, 125L, 127L, 129L, 131L, 133L, 
13L, 16L, 20L, 50L, 60L, 68L, 70L, 90L, 100L, 110L, 122L, 130L, 
150L, 164L, 170L, 180L, 185L, 188L, 201L, 220L, 232L, 240L, 261L, 
270L, 280L, 282L, 290L, 1L, 3L, 5L, 7L, 9L, 11L, 12L, 13L, 15L, 
17L, 19L, 21L, 23L, 25L, 27L, 1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 
17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 33L, 35L, 37L, 39L, 41L, 

It's a huge amount of numbers because it's for every US county so I abbreviated the results and put in the first couple lines.

Basically, the data frame looks like this though:

print(head(obesity_map))


  X FIPS state_names county_names obesity
1 1    1     Alabama      Autauga    24.5
2 2    3     Alabama      Baldwin    23.6
3 3    5     Alabama      Barbour    25.6
4 4    7     Alabama         Bibb     0.0
5 5    9     Alabama       Blount    24.2
6 6   11     Alabama      Bullock     0.0

I also tried using ggcounty by following the example put up but I keep getting an error. I'm not entirely sure what I've done wrong:

library(ggcounty)

# breaks
obesity_map$obese <- cut(obesity_map$obesity, 
                  breaks=c(0, 5, 10, 15, 20, 25, 30), 
                  labels=c("1", "2", "3", "4", 
                           "5", "6"),
                  include.lowest=TRUE)

# get the US counties map (lower 48)
us <- ggcounty.us()

# start the plot with our base map
gg <- us$g

# add a new geom with our population (choropleth)
gg <- gg + geom_map(data=obesity_map, map=us$map,
                aes(map_id=FIPS, fill=obesity_map$obese), 
                color="white", size=0.125)

But I always end up getting an error saying: "Error: Argument must be coercible to non-negative integer"

Any idea? Thanks again for all your help! I appreciate it so much.

UseR10085
  • 7,120
  • 3
  • 24
  • 54
user3648073
  • 91
  • 1
  • 1
  • 3
  • I'm afraid you cannot map "various obesity rates" to a single county (geometry). You'll have to find a measure (mean?) or category (or interval) and map it to each administrative area. Would this suffice? – Paulo E. Cardoso May 17 '14 at 17:19
  • I'm not entirely sure if I'm understanding you? The data in the data frame is an average for each US county so I'm trying to get each rate to show on the map. So for example, if LA County had a 40% obesity rate and Orange County had a 50% obesity rate, it'd show the two different counties shaded in a varying gradient for their percentage. I apologize if my original post lacked clarity! – user3648073 May 17 '14 at 17:23
  • I'll put forth a shameless plug for my [ggcounty](https://github.com/hrbrmstr/ggcounty) package (heavy alpha/beta but there are examples there). If you can do a `dput` of your obesity data frame, I'll be happy to toss up an example here. – hrbrmstr May 17 '14 at 17:55
  • I see. You used '"various obesity rates per county". Could you add a sample of your data? – Paulo E. Cardoso May 17 '14 at 18:31
  • @PauloCardoso, sure thing! Sorry about the horrendous formatting. Still figuring things out! `%%R obesity = read.csv('./data/cleaned/obesitymap.csv', header = T)` `print(head(obesity))` ` X state_names county_names obesity 1 1 Alabama Autauga 24.5 2 2 Alabama Baldwin 23.6 3 3 Alabama Barbour 25.6 4 4 Alabama Bibb -1111.1 5 5 Alabama Blount 24.2 6 6 Alabama Bullock -1111.1` – user3648073 May 17 '14 at 19:09
  • 1
    perhaps paste output of a `dput(obesity)` in the original post :-) – hrbrmstr May 17 '14 at 19:17
  • @hrbrmstr Up! And I've got some questions for you about ggcounty! Thank you so much by the way. I appreciate all the help! – user3648073 May 17 '14 at 20:57

6 Answers6

18

Maybe a little late for another answer, but still worthwhile to share I think.

The reading and preprocessing of the data is similar to jlhoward's answer, with some differences:

library(tmap)      # package for plotting
library(readxl)    # for reading Excel
library(maptools)  # for unionSpatialPolygons

# download data
download.file("http://www.ers.usda.gov/datafiles/Food_Environment_Atlas/Data_Access_and_Documentation_Downloads/Current_Version/DataDownload.xls", destfile = "DataDownload.xls", mode="wb")
df <- read_excel("DataDownload.xls", sheet = "HEALTH")

# download shape (a little less detail than in the other scripts)
f <- tempfile()
download.file("http://www2.census.gov/geo/tiger/GENZ2010/gz_2010_us_050_00_20m.zip", destfile = f)
unzip(f, exdir = ".")
US <- read_shape("gz_2010_us_050_00_20m.shp")

# leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
US <- US[!(US$STATE %in% c("02","15","72")),]  

# append data to shape
US$FIPS <- paste0(US$STATE, US$COUNTY)
US <- append_data(US, df, key.shp = "FIPS", key.data = "FIPS")

When the correct data is attached to the shape object, a choropleth can be drawn with one line of code:

qtm(US, fill = "PCT_OBESE_ADULTS10")

enter image description here

This could be enhanced by adding state borders, a better projection, and a title:

# create shape object with state polygons
US_states <- unionSpatialPolygons(US, IDs=US$STATE)

tm_shape(US, projection="+init=epsg:2163") +
  tm_polygons("PCT_OBESE_ADULTS10", border.col = "grey30", title="") +
tm_shape(US_states) +
  tm_borders(lwd=2, col = "black", alpha = .5) +
tm_layout(title="2010 Adult Obesity by County, percent", 
          title.position = c("center", "top"),
          legend.text.size=1)

enter image description here

Martijn Tennekes
  • 1,951
  • 13
  • 19
  • Do you know why this error would occur? I didn't make any changes to your code: Error in readOGR(dir, base, verbose = FALSE, ...) : no features found – PBD10017 May 28 '16 at 02:19
  • I do not get that error. Was the download successful? You could try to use readOGR directly. – Martijn Tennekes May 29 '16 at 10:36
  • `library(tmap)` fails with error `Error: package or namespace load failed for ‘tmap’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): there is no package called ‘units’`. @MartijnTennekes are you still able to run the code? – miguelmorin Jun 21 '18 at 12:26
16

So this is a similar example but attempts to accommodate the format of your obesity_map dataset. It also uses a data table join which is much faster than merge(...), especially with large datasets like yours.

library(ggplot2)
# this creates an example formatted as your obesity.map - you have this already...
set.seed(1)    # for reproducible example
map.county <- map_data('county')
counties   <- unique(map.county[,5:6])
obesity_map <- data.frame(state_names=counties$region, 
                          county_names=counties$subregion, 
                          obesity= runif(nrow(counties), min=0, max=100))

# you start here...
library(data.table)   # use data table merge - it's *much* faster
map.county <- data.table(map_data('county'))
setkey(map.county,region,subregion)
obesity_map <- data.table(obesity_map)
setkey(obesity_map,state_names,county_names)
map.df      <- map.county[obesity_map]

ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) + 
  geom_polygon()+coord_map()

Also, if your dataset has the FIPS codes, which it seems to, I'd strongly recommend you use the US Census Bureau's TIGER/Line county shapefile (which also has these codes), and merge on that. This is much more reliable. For example, in your extract of the obesity_map data frame, the states and counties are capitalized, whereas in the built-in counties dataset in R, they are not, so you would have to deal with that. Also, the TIGER file is up to date, whereas the internal dataset is not.

So this is kind of an interesting question. Turns out the actual obesity data is on the USDA website and can be downloaded here as an MSExcel file. There's also a shapfile of US counties on the Census Bureau website, here. Both the Excel file and the shapefile have FIPS information. In R this can be put together relatively simply:

library(XLConnect)    # for loadWorkbook(...) and readWorksheet(...)
library(rgdal)        # for readOGR(...)
library(RcolorBrewer) # for brewer.pal(...)
library(data.table)

setwd(" < directory with all your files > ")
wb <- loadWorkbook("DataDownload.xls")   # from the USDA website
df <- readWorksheet(wb,"HEALTH")         # this sheet has the obesity data

US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m")
#leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),]  
county.data <- US.counties@data
county.data <- cbind(id=rownames(county.data),county.data)
county.data <- data.table(county.data)
county.data[,FIPS:=paste0(STATE,COUNTY)] # this is the state + county FIPS code
setkey(county.data,FIPS)      
obesity.data <- data.table(df)
setkey(obesity.data,FIPS)
county.data[obesity.data,obesity:=PCT_OBESE_ADULTS10]

map.df <- data.table(fortify(US.counties))
setkey(map.df,id)
setkey(county.data,id)
map.df[county.data,obesity:=obesity]

ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) +
  scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+
  geom_polygon()+coord_map()+
  labs(title="2010 Adult Obesity by Country, percent",x="",y="")+
  theme_bw()

to produce this:

jlhoward
  • 58,004
  • 7
  • 97
  • 140
  • Oh wow, thank you so much! I appreciate it! This really did the trick. One final question. Do you know how I might go around setting 0 values to be a separate color? I set all NA values to 0 and I'd like to have it show up as grey. Again I appreciate the help! – user3648073 May 18 '14 at 05:35
  • Actually, if you leave them as NA, they should show up as grey by default. Since you're new to SO, [please read this](http://stackoverflow.com/help/no-one-answers). – jlhoward May 18 '14 at 06:29
  • +1 for this great example with data.table. I just still trying to understand how data.table is working in your first map, particularly with `map.county[obesity_map]`. This is for merging? Thanks for sharing this. – Paulo E. Cardoso May 18 '14 at 23:55
  • @jlhoward the second part of the code with FIPS works in a script and fails inside a function, even just a wrapper, with this error: `Error in :=(FIPS, paste0(STATE, COUNTY)) : Check that is.data.table(DT) == TRUE. Otherwise, := and :=(...) are defined for use in j, once only and in particular ways. See help(":=").` Do you have any idea why? – miguelmorin Jun 21 '18 at 15:23
8

This is something I can get working managing the mapping variable. Renaming it to 'region'.

library(ggplot2)
library(maps)
m.usa <- map_data("county")
m.usa$id <- m.usa$subregion
m.usa <- m.usa[ ,-5]
names(m.usa)[5] <- 'region'


df <- data.frame(region = unique(m.usa$region),
                 obesity = rnorm(length(unique(m.usa$region)), 50, 10),
                 stringsAsFactors = F)

head(df)
region  obesity
1 autauga 44.54833
2 baldwin 68.61470
3 barbour 52.19718
4    bibb 50.88948
5  blount 42.73134
6 bullock 59.93515

ggplot(df, aes(map_id = region)) +
  geom_map(aes(fill = obesity), map = m.usa) + 
  expand_limits(x = m.usa$long, y = m.usa$lat) +
  coord_map()

geom_map

Paulo E. Cardoso
  • 5,778
  • 32
  • 42
  • This works! The problem is that `map_data(county)` uses county and state names as identifiers, which is a problem for counties with different spellings, like "Ste Genevieve"/"Ste Genevieve" in Louisiana. [This R-bloggers post](https://www.r-bloggers.com/mapping-us-counties-in-r-with-fips/) suggests using a shapefile with FIPS codes provided by the Census Bureau. – miguelmorin Jun 21 '18 at 14:39
1

I think all you needed to do was reorder the map.county variable like you had for the map.data variable previously.

....
map.county <- merge(county.obesity, map.county, all=TRUE)

## reorder the map before plotting
map.county <- map.county[order(map.data$county),] 

## plot
ggplot(map.county, aes(x = long, y = lat, group=group, fill=as.factor(value))) + geom_polygon(colour = "white", size = 0.1)
Scott Worland
  • 1,352
  • 1
  • 12
  • 15
1

Building on @jlhoward's answer: the code with data.table fails for me in a mysterious way:

 Error in `:=`(FIPS, paste0(STATE, COUNTY)) : 
  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":="). 

This error happened to me several times, but only when the code was inside a function, even just a minimal wrapper. It worked fine in a script. Although now I cannot reproduce the error, I adapted his/her code with merge() instead of data.table for completeness:

library(rgdal)        # for readOGR(...)
library(ggplot2)      # for fortify() and plot()
library(RColorBrewer) # for brewer.pal(...)

US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m")
#leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),]
county.data <- US.counties@data

county.data <- cbind(id=rownames(county.data),county.data)
county.data$FIPS <- paste0(county.data$STATE, county.data$COUNTY) # this is the state + county FIPS code

df <- data.frame(FIPS=county.data$FIPS,
                 PCT_OBESE_ADULTS10= runif(nrow(county.data), min=0, max=100))

# Merge county.data to obesity
county.data <- merge(county.data,
                     df,
                     by.x = "FIPS",
                     by.y = "FIPS")

map.df <- fortify(US.counties)

# Merge the map to county.data
map.df <- merge(map.df, county.data, by.x = "id", by.y = "id")

ggplot(map.df, aes(x=long, y=lat, group=group, fill=PCT_OBESE_ADULTS10)) +
  scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+
  geom_polygon()+coord_map()+
  labs(title="2010 Adult Obesity by Country, percent",x="",y="")+
  theme_bw()
miguelmorin
  • 5,025
  • 4
  • 29
  • 64
0

I'm a little new at using TMAP and Spatial data, but figured I would post as a follow up to Martijn Tennekes. Using his advice I ran into an error in the second map (with the state borders). When running this line of code:

US_state <- unionSpatialPolygons(US,US$STATE)

I kept getting this error: "Error in unionSpatialPolygons(US, US$STATE) : not a SpatialPolygons object"

In order to rectify I had to use a different variable and run it as a Spatial Polygon Data Frame:

US <- read_shape("gz_2010_us_050_00_20m.shp")
US2<-readShapeSpatial("gz_2010_us_050_00_20m.shp")

US <- US[!(US$STATE %in% c("02","15","72")),]  

US$FIPS <- paste0(US$STATE, US$COUNTY)
US <- append_data(US, med_inc_df, key.shp = "FIPS", key.data = "GEOID")

#the difference is here:
US_states <- unionSpatialPolygons(US2, US2$STATE)

tm_shape(US, projection="+init=epsg:2163") +
  tm_polygons("estimate", border.col = "grey30", title="") +
  tm_shape(US_states) +
  tm_borders(lwd=2, col = "black", alpha = .5) +
  tm_layout(title="2016 Median Income by County", 
            title.position = c("center", "top"),
            legend.text.size=1)

My Map

Andy White
  • 81
  • 1
  • 4