I'm having an odd issue with FBI crime data. Some cities/towns share a name within the same state, so a county is given to tell them apart. For the years 2003-2017 there are roughly 1700 rows that have a county. However, when I try to join this dataset with another one, or even filter by a county (for instance, COUNTY == "york county"), I only get six rows when I should be getting 48. I've made the values all lowercase, tried trimming them (in case there was whitespace), and run as.character(), but I still get the same behavior. It's weird that it returns a handful of rows but not all of them. Any ideas?
If I try running
data %>% filter(COUNTY == "adams county")
it will only return two values: conewago and cumberland.
I used the following code to separate the rows that have a county from those that don't (in which case the value is NA). Then I try to make sure the whitespace is removed.
crime.06_17.slice <- crime.06_17 %>% arrange(COUNTY) %>% slice(1:1758)
crime.06_17.slice$COUNTY <- trimws(crime.06_17.slice$COUNTY, which = c("both"), whitespace = "[\t\r\n]")
Sample of the data, from dput():
structure(list(CITY = c("washington", "conewago", "conewago",
"cumberland", "conewago", "cumberland", "liberty", "conewago",
"liberty", "conewago", "cumberland", "liberty", "conewago", "cumberland",
"liberty", "conewago", "cumberland", "liberty", "conewago", "cumberland",
"conewago", "cumberland", "conewago", "cumberland", "conewago",
"cumberland", "conewago", "cumberland", "liberty", "cumberland"
), COUNTY = c(" mercer county", " adams county", " adams county",
" adams county", " adams county", " adams county", " adams county",
" adams county", " adams county", " adams county", " adams county",
" adams county", " adams county", " adams county", " adams county",
" adams county", " adams county", " adams county", " adams county",
" adams county", " adams county", " adams county", " adams county",
" adams county", " adams county", " adams county", " adams county",
" adams county", " adams county", " adams township"), CRIME_VIOLENT = c(8,
6, 4, 4, 3, 1, 0, 3, 1, 3, 2, 2, 1, 1, 1, 8, 3, 0, 6, 3, 3, 2,
4, 3, 5, 5, 5, 5, 0, 1), CRIME_PROPERTY = c(125, 64, 92, 35,
98, 47, 4, 125, 29, 113, 43, 24, 90, 55, 15, 84, 66, 20, 89,
52, 48, 49, 54, 53, 48, 38, 30, 41, 11, 23), CRIME_TOTAL = c(133,
70, 96, 39, 101, 48, 4, 128, 30, 116, 45, 26, 91, 56, 16, 92,
69, 20, 95, 55, 51, 51, 58, 56, 53, 43, 35, 46, 11, 24), year = c(2005,
2006, 2007, 2007, 2008, 2008, 2008, 2009, 2009, 2010, 2010, 2010,
2011, 2011, 2011, 2012, 2012, 2012, 2013, 2013, 2014, 2014, 2015,
2015, 2016, 2016, 2017, 2017, 2017, 2009), STATE = c("new jersey",
"pennsylvania", "pennsylvania", "pennsylvania", "pennsylvania",
"pennsylvania", "pennsylvania", "pennsylvania", "pennsylvania",
"pennsylvania", "pennsylvania", "pennsylvania", "pennsylvania",
"pennsylvania", "pennsylvania", "pennsylvania", "pennsylvania",
"pennsylvania", "pennsylvania", "pennsylvania", "pennsylvania",
"pennsylvania", "pennsylvania", "pennsylvania", "pennsylvania",
"pennsylvania", "pennsylvania", "pennsylvania", "pennsylvania",
"pennsylvania")), row.names = c(NA, -30L), class = c("tbl_df",
"tbl", "data.frame"))
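One thing that may be worth checking: the COUNTY values in the sample above start with a plain space (" adams county"), and the whitespace pattern passed to trimws() above lists tab, carriage return and newline but no space. The sketch below is only a guess at the cause, using a small made-up vector (the name county is hypothetical, not from my data) to compare that pattern with the trimws() default, which does include the space. It assumes the stray characters are ordinary spaces and not something like a non-breaking space.

```r
# Hypothetical sample mimicking the COUNTY values in the dput() output:
# two values carry a leading space, one does not.
county <- c(" adams county", "adams county", " adams township")

# Same pattern as in the trimws() call above: no space in the character
# class, so leading spaces are left in place.
kept <- trimws(county, which = "both", whitespace = "[\t\r\n]")

# Default pattern is "[ \t\r\n]", which also strips plain spaces.
trimmed <- trimws(county)

# Compare how many values match the filter string in each case.
print(kept == "adams county")
print(trimmed == "adams county")

# nchar() can reveal hidden leading/trailing characters.
print(nchar(kept))
print(nchar(trimmed))
```

If the two comparisons differ, that would explain why the filter and the join only match the rows that happened to have no leading space.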