
I am using a crime data set from Kaggle and I am trying to create a simple barplot with the crime type on the x axis and how frequently each crime was committed on the y axis. However, because there are a lot of crime types, the x axis is very messy. I did some research on how to make the labels vertical, but they are still unreadable.

Are there any other ways I can clean this plot up without editing all of the crime names? My current code is:

tab2 <- table(crimeData$Crime.Code.Description)
barplot(tab2,main="Crime Areas",las=2)


d.b
Craig P H
  • This looks like a classic case of [overplotting](https://www.displayr.com/what-is-overplotting/). There are a couple of ways to deal with this: either group your data into related topics (e.g. speeding, failure to stop, etc. all fall under traffic) and then break the plot up into smaller sub-plots, or plot only the categories that are significant for your point (set a cut-off point, maybe 25000 in your case). Check out ggplot2; it can help you a lot with axis labels. – bob1 Nov 30 '18 at 17:01
  • cheers for the reply. how would i set a cut off point of 25000 for example? – Craig P H Nov 30 '18 at 17:04
  • Something like `tab2 <- table(crimeData$Crime.Code.Description >=25000)` should work – bob1 Nov 30 '18 at 17:07
  • thanks again for the reply. i tried this and got the following error message: Warning message: In Ops.factor(crimeData$Crime.Code.Description, 25000) : ‘>=’ not meaningful for factors – Craig P H Nov 30 '18 at 17:14
  • It's a little hard to troubleshoot without a minimal data set, but factors in R are stored as integer codes with character labels, so numeric comparisons on them are not meaningful - check [here](https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information) for a solution. – bob1 Nov 30 '18 at 17:20
  • i'm using this data set: https://www.kaggle.com/cityofLA/crime-in-los-angeles – Craig P H Nov 30 '18 at 17:27
  • In that case you can convert the code description into counts as you did for tab2: `tab2 <- table(crimeData$Crime.Code.Description)`, then `tab3 <- subset(tab2, tab2 >= 25000)`, then barplot as above using `tab3`. – bob1 Nov 30 '18 at 18:03
  • thanks bob, worked perfect! last question; would it be possible for me to rename the values of tab3? as the labels are still very long and need to be narrowed down to fit on x axis i.e. 'VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114' need to be narrowed down to 'VANDALISM' – Craig P H Nov 30 '18 at 18:11
  • Look at the `names.arg` argument in `?barplot`. You should be able to provide a character vector of replacement labels to do this (`cex.names` only scales the label text size). Another option is possibly to use the `substr` command to shorten the names, or a regular expression to capture only the first word, or perhaps to break the name over one or more columns, so that the first word from the name is all that is left in one column. – bob1 Nov 30 '18 at 18:21
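
For anyone landing here later, the steps discussed in the comments (count, apply a cut-off, shorten the labels, plot) can be sketched roughly as below. This is only a sketch under stated assumptions: the small `crimeData` data frame is made-up stand-in data rather than the Kaggle file, `cutoff` is a hypothetical threshold (the thread used 25000 on the full data), and splitting labels at the first `" - "` is just one guess at how the long descriptions could be shortened.

```r
# Toy stand-in for the Kaggle data; the descriptions and counts are made up.
crimeData <- data.frame(
  Crime.Code.Description = factor(rep(
    c("VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 0114",
      "BATTERY - SIMPLE ASSAULT",
      "BURGLARY FROM VEHICLE",
      "BIKE - STOLEN"),
    times = c(5, 8, 6, 1)
  ))
)

cutoff <- 5  # hypothetical threshold; the thread used 25000 on the full data

# Count occurrences per crime type, then keep only the frequent ones,
# sorted from most to least common.
tab2 <- table(crimeData$Crime.Code.Description)
tab3 <- sort(tab2[tab2 >= cutoff], decreasing = TRUE)

# Shorten each label to the text before the first " - " separator (if any).
short_names <- trimws(sub(" - .*$", "", names(tab3)))

# Widen the bottom margin so the rotated labels fit, then plot.
old_par <- par(mar = c(10, 4, 4, 2) + 0.1)
barplot(tab3, names.arg = short_names, las = 2, cex.names = 0.8,
        main = "Crime Areas")
par(old_par)
```

Labels without a `" - "` separator are left unchanged by `sub`, so it may be worth checking `short_names` for duplicates before plotting the real data.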

0 Answers