0

I have two data sets, one for crime in Chicago, labeled with longitude and latitude coords and a shapefile of census blocks also in Chicago. Is it possible in R to aggregate crimes within census blocks, given these two files? The purpose is to be able to map out the crimes by census block.

Location for download of Chicago census tract data: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Census-Blocks-2000/uktd-fzhd

Location for download of crime data: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

Some code that I have pruned down from another project. When it is through there is a spatial object for census tract information and a dataframe containing crime data, including lon/lat coords:

library(rgdal)
library(dplyr)

#Helper function to reduce crime data to single year and limit to variables of interest
yearReduce <- function(rawData=NULL,year=NULL) {
        datout <- data.frame(year = numeric(0), community = numeric(0), type = numeric(0), arrest = numeric(0),
                             Latitude = numeric(0), longitude = numeric(0))
        dat <- rawData[rawData$Year==year,]
        datout <- data.frame(year = dat$Year, community = as.numeric(dat$Community.Area), type = dat$Primary.Type, arrest = dat$Arrest,
                             latitude = dat$Latitude, longitude = dat$Longitude)
        datout
}

#Load crime data
crimedata <- '~/Documents/data/Crimes_-_2001_to_present.csv'
mydata_crime <- read.csv(crimedata,na.strings = c("", " ", "NA"), stringsAsFactors=F)
mydata_crime$Primary.Type <- tolower(mydata_crime$Primary.Type)

#Set cwd to location of the census tract shape file
setwd('~/Documents/data/Boundaries_-_Census_Blocks_-_2010/')
#Create spatial vector object and transform projection
tract = readOGR(".","CensusBlockTIGER2010") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))

### Process crime data to narrow to single year###
crime2010 <- yearReduce(mydata_crime,'2010')

# further select specific crime(s). Fairly limited for testing purposes
violent_crimes <- subset(crime2010,
                         type == "homicide")

violent_crimes <- violent_crimes[complete.cases(violent_crimes),] #Clean data a little bit

Thank you for any help you may be able to provide.

Patrick

Patrick Williams
  • 694
  • 8
  • 22
  • yep. it is. and at least a bit of the data wld be useful to show you how. – hrbrmstr Apr 26 '15 at 00:25
  • Sure. Posted from mobile. Will edit ASAP. – Patrick Williams Apr 26 '15 at 00:29
  • What exactly do you mean by "aggregate"? In other words these two shapefiles and associated data go into the "aggregate" step, what do you want coming out? – shekeine Apr 26 '15 at 20:52
  • Hypothetically the output would be a two column data frame. The first column would be the identity of the census block, and the second column would be the number of crimes that took place in that census block. The crime dataset already has something like this for neighborhood, and in the past I have made a map of crime by neighborhood (using mapbox: (http://goo.gl/2IjWk4). I'd like to do something similar by census block, as well as to have a dataset that I can match up with other census data. – Patrick Williams Apr 26 '15 at 23:32

1 Answers1

1
#Load libraries
library(rgdal)
library(sp)
library(raster)

First, a few improvements to your code above

#Set my wd
setwd('~/Dropbox/rstats/r_blog_home/stack_o/')

#Load crime data
my_crime <- read.csv(file='spat_aggreg/Crimes_2001_to_present.csv',stringsAsFactors=F)`
my_crime$Primary.Type <- tolower(my_crime$Primary.Type)

#Select variables of interest and subset by year and type of crime
#Note, yearReduce() not necessary at all: check R documentation before creating own functions
my_crime <- data.frame(year=my_crime$Year, community=my_crime$Community.Area, 
         type=my_crime$Primary.Type, arrest=my_crime$Arrest, 
         latitude=my_crime$Latitude, longitude=my_crime$Longitude)
vc <- subset(my_crime, year==2010, type=="homicide")

#Keep only complete cases
vc <- vc[complete.cases(vc), ]

#Load census tract data
#Note: function `shapefile` is a neater than `readOGR()`
#Note: The usage of `@` to access attribute data tables associated with spatial objects in R
tract <- shapefile('spat_aggreg/census_blocks_2000/Census_Blocks.shp')
tract <- spTransform(x=tract, CRSobj=CRS("+proj=longlat +datum=WGS84"))
names(tract@data) <- tolower(names(tract@data))

And now, to answer your question...

#Convert crime data to a spatial points object
vc <- SpatialPointsDataFrame(coords=vc[, c("longitude", "latitude")],
          data=vc[, c("year", "community", "type", "arrest")],
          proj4string=CRS("+proj=longlat +datum=WGS84"))

#Each row entry represents one homicide, so, add count column
vc@data$count <- 1

#Spatial overlay to identify census polygon in which each crime point falls
#The Result `vc_tract` is a dataframe with the tract data for each point
vc_tract <- over(x=vc, y=tract)

#Add tract data to crimePoints
vc@data <- data.frame(vc@data, vc_tract)

#Aggregate homicides by tract (assuming the column "census_tra" is the unique ID for census tracts)
hom_tract <- aggregate(formula=count~census_tra, data=vc@data, FUN=length)

#Add number of homicides to tracts object
m <- match(x=tract@data$census_tra, table=hom_tract$census_tra)
tract@data$homs_2010 <- hom_tract$count[m]

Now, your census tracts (in the spatialPolygonDataframe object named tract) contain a column named homs_2010 that has the number of homicides for each tract. From there, it should be a breeze to map it.

shekeine
  • 1,445
  • 10
  • 22
  • Thank you. Your answer does a great deal to both reach my intended goal and to increase my R/GIS coding knowledge. While I have not yet been able to get the data to map (haven't worked with the raster package before), this solution does what I require. – Patrick Williams Apr 27 '15 at 20:53