I have a shapefile of school districts in Texas and am trying to use ggplot2 to highlight 10 in particular. I've tinkered with it and gotten everything set up, but when I spot-checked it I realized the 10 highlighted districts are not in fact the ones I want highlighted.

The shapefile can be downloaded from this link to the Texas Education Agency Public Open Data Site.

#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())

#setwd("path")

# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# extract from shapefile data just the name and ID, then subset to only the districts of interest
dist_info <- data.frame(cbind(as.character(tex@data$NAME2), as.character(tex@data$FID)), stringsAsFactors=FALSE)
names(dist_info) <- c("name", "id")
dist_info <- dist_info[dist_info$name %in% districts, ]

# turn shapefile into df
tex_df <- fortify(tex)

# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% dist_info$id, 1, 0))


# plot the graph
ggplot(data=tex_df) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") + 
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")

As you'll see, when the plot gets created it looks like it's done exactly what I want. The problem is, the ten districts highlighted are not the ones in the districts vector above. I've re-run everything clean numerous times, double-checked that I wasn't having a factor/character conversion issue, and double-checked within the web data explorer that the IDs I get from the shapefile are indeed the ones that should match my list of names. I really have no idea where this issue could be coming from.

This is my first time working with shapefiles and rgdal so if I had to guess there's something simple about the structure that I don't understand and hopefully one of you can quickly point it out for me. Thanks!

Here's the output:

[Image: map of Texas school districts with ten districts highlighted]

cparmstrong
  • Did my answer solve your question? – mpalanco Mar 16 '18 at 16:20
  • @mpalanco I briefly tried to implement it and had some hiccups. I'm away from my desk and can't get to it right now, but when I get it figured out I'll approve your answer. – cparmstrong Mar 16 '18 at 21:58
  • Ok. I just ran the code below pointing to my working directory without any problem. I assume you're implementing something more complex, and this was just a mock-up. If you think we could help, come here with your questions. Thanks. – mpalanco Mar 16 '18 at 22:11
  • @mpalanco I had an odd error that I couldn't figure out. Found a sketchy workaround that solved my problem and posted it as an answer, but still gave yours the check. Thanks for the help! – cparmstrong Mar 19 '18 at 14:58
  • I do not know why you got that error. I tried with broom::tidy() as well without any problem. Many solutions recommend reinstalling the packages used or R (maybe you have an old R version), but it seems you already tried that. I will try to look at your solution tomorrow and see if the 'sketchy' section can be improved. Thank you for accepting my answer. – mpalanco Mar 19 '18 at 21:51
  • I added an Alternative 2 to my answer, addressing the problem you had when passing the region argument to the `fortify` function. It seems cleaner to me than creating IDs with `seq`, building two data frames and merging back. – mpalanco Mar 21 '18 at 23:06

2 Answers

Alternative 1

Pass region = "NAME2" to the fortify function so that the id column contains your district names, then create your dummy fill variable from that column. Without region, the id column holds the internal polygon IDs (the row names of tex@data), which need not equal the values in the FID column. I am not familiar with Texas districts, but I assume the result is right.

tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# turn shapefile into df
tex_df <- fortify(tex, region = "NAME2")

# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% districts, 1, 0))

# plot the graph
ggplot(data=tex_df) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none")
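
Since the matching is now done on names, a quick spot check is to compare the requested names against the names present in the id column. The sketch below uses a small hypothetical vector `ids` as a stand-in for `tex_df$id` (the values are made up for illustration, not taken from the shapefile); with the real data you would substitute `unique(tex_df$id)`.

```r
# Hypothetical stand-in for the id column of fortify(tex, region = "NAME2")
ids <- c("Aldine", "Aldine", "Laredo", "Bryan", "Austin")

# Districts we want to highlight (shortened list for the example)
districts <- c("Aldine", "Laredo", "Bryan", "San Felipe-Del Rio Cons")

# Requested names that do not appear in the data
# (for example because of spelling or abbreviation differences)
missing <- setdiff(districts, unique(ids))
print(missing)

# Requested names that will actually be highlighted
matched <- intersect(districts, unique(ids))
print(matched)
```

If `missing` is not empty, those districts will silently stay grey in the plot, so it is worth checking before trusting the map.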

[Image: resulting map]

Alternative 2

This alternative does not pass the region argument to the fortify function, which addresses the issue seeellayewhy had when implementing the previous alternative. We plot two layers instead, so there is no need to create a dummy variable or merge any data frames.

tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")

# Subset the shapefile into highlighted and non-highlighted districts
tex1 <- subset(tex, NAME2 %in% districts)
tex2 <- subset(tex, !(NAME2 %in% districts))

# Create two data frames
tex_df1 <- fortify(tex1)
tex_df2 <- fortify(tex2)

# Plot two geom_polygon layers, one for each data frame.
# The fill is constant within each layer, so it is set outside aes()
# and no fill scale is needed.
ggplot() +
  geom_polygon(data = tex_df1,
               aes(x = long, y = lat, group = group),
               fill = cols[2], color = "#CCCCCC") +
  geom_polygon(data = tex_df2,
               aes(x = long, y = lat, group = group),
               fill = cols[1]) +
  theme_void() +
  theme(legend.position = "none")
mpalanco
  • When I try to add the additional argument to `fortify()` I get the following error: `Error: isTRUE(gpclibPermitStatus()) is not TRUE`. The top results provide some solutions like re-install rgdal/gpclib/rgeos but I wasn't able to make any of them work. Also saw that fortify is deprecated and broom::tidy is recommended instead but it gives the same error. Any idea what the problem may be? – cparmstrong Mar 19 '18 at 14:25

When trying to implement @mpalanco's solution of adding the "region" argument to the fortify() function, I got an error that I couldn't solve despite trying numerous other Stack Overflow posts (Error: isTRUE(gpclibPermitStatus()) is not TRUE). I also tried using broom::tidy(), which is the non-deprecated equivalent of fortify(), and had the same error.

Ultimately, I ended up implementing @luchanocho's solution from here. I don't like the fact that it uses seq() to generate the ID because it's not necessarily preserving the proper order, but my case was simple enough that I could go through every district and confirm that the correct ones were highlighted.

My code is below. Output is the same as @mpalanco's answer. Since he obviously got the right result and used something that's not shaky the way the implemented solution is, I'm going to give him the answer assuming it works. The solution below can be considered a workaround if others experience the same error I got.

#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())

#setwd("path")

# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))

# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")


# convert shapefile to a df
tex_df <- fortify(tex)

# generate temp df with IDs to merge back in
names_df <- data.frame(tex@data$NAME2)
names(names_df) <- "NAME2"
names_df$id <- seq(0, nrow(names_df)-1)  # this is the part I felt was sketchy
final <- merge(tex_df, names_df, by="id")

# dummy out districts of interest
final$yes <- as.factor(ifelse(final$NAME2 %in% districts, 1, 0))


ggplot(data=final) +
  geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") + 
  scale_fill_manual(values=cols) +
  theme_void() +
  theme(legend.position = "none") 
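
One possible way to make the seq() step less dependent on ordering is to take the IDs from the row names of the attribute table instead of generating them. This is only a sketch on mock data: `attrs` is a hypothetical stand-in for tex@data and `verts` for the output of fortify(tex), and it assumes that fortify() without region fills id with the polygon IDs, which for a SpatialPolygonsDataFrame should match the row names of tex@data. The FID values below are made up.

```r
# Mock stand-in for tex@data: the row names play the role of the polygon IDs
attrs <- data.frame(
  NAME2 = c("Aldine", "Laredo", "Donna"),
  FID   = c(101, 57, 212),           # made-up attribute values, unrelated to row names
  row.names = c("0", "1", "2"),
  stringsAsFactors = FALSE
)

# Mock stand-in for fortify(tex): one row per vertex, id = polygon ID
verts <- data.frame(
  long = c(-95.4, -95.3, -99.5, -99.4, -98.1),
  lat  = c(29.9, 30.0, 27.5, 27.6, 26.2),
  id   = c("0", "0", "1", "1", "2"),
  stringsAsFactors = FALSE
)

# Build the lookup from the row names rather than from seq() or the FID column
names_df <- data.frame(
  id    = rownames(attrs),
  NAME2 = attrs$NAME2,
  stringsAsFactors = FALSE
)

# Merge the names onto the vertices and flag the districts of interest
final <- merge(verts, names_df, by = "id")
final$yes <- as.factor(ifelse(final$NAME2 %in% c("Laredo"), 1, 0))
```

With the real shapefile, `rownames(tex@data)` would take the place of `rownames(attrs)`. Note that none of the mock FID values appear in `verts$id`, which is the same kind of mismatch that caused the wrong districts to be highlighted in the question.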
cparmstrong