2

Hi I have a number of similar datasets with different category types. I have decided which categories from certain datasets will be grouped together to put into 1 unified category. I am looking for a way to group these different rows into one unified row.

For example, in dataset 1, it has a crime type called 'Criminal damage'.

In my dataset 2, I would like the rows that has have 'Malicious damage to property' and 'Arson' in the column titled Category to be grouped together in a new row with information the same(Province, Station) but the Category column value is 'Criminal Damage' and the values under the years X2008 X2009 etc are added together, in order to be aligned with my other dataset. The same with: Burglary at resident premises and burglary at non-residential premises should be grouped as 'Burglary'. Also with: Driving under the influence of alcohol or drugs and illegal possession of firearms and ammunition should be grouped as 'Other notable offences'.

Here is an example of the data needed to be transformed:

Province      Station Category           X2008 X2009 X2010 X2011 X2012 X2013
Kwazulu/Natal Phoenix Malicious damage to property 778 883 767 754 804 785
Kwazulu/Natal Phoenix Arson                        16  24  13  26  12  9

North West    Phokeng Malicious damage to property 243 235 154 187 153 156
North West    Phokeng Arson                        12  13  12  9   6   6

Eastern Cape Adelaide Burglary at residential premises     103 93 89 73 75 53
Eastern Cape Adelaide Burglary at non-residential premises 20 13 22 20 19 26

North West   Mogwase  Driving under the influence of alcohol or drugs 37 36 31 38 24 18
North West   Mogwase  Illegal possession of firearms and ammunition   18 8 12 12 12 13

To be turned into something like this:

Province      Station Category           X2008 X2009 X2010 X2011 X2012 X2013
Kwazulu/Natal Phoenix Criminal damage    794   897   780   780   816   794

North West    Phokeng Criminal damage    255   248   166   196   159   162

Eastern Cape Adelaide Burglary           123   106   111   93    94    79

North West   Mogwase  Other notable offences   55  44  43  50  36  31

Any help would be so appreciated! I am so stuck on how to go about doing this. Thanks!

kath
  • 7,624
  • 17
  • 32
DylanHig
  • 29
  • 4

2 Answers2

2

We can use group_by with Category as "Criminal damage"`

library(dplyr)
df1 %>%
  group_by(Province, Station, Category = "Criminal damage") %>% 
  summarise_all(sum)
#A tibble: 2 x 9
# Groups: Province, Station [?]
#  Province      Station Category        X2008 X2009 X2010 X2011 X2012 X2013
#  <chr>         <chr>   <chr>           <int> <int> <int> <int> <int> <int>
#1 Kwazulu/Natal Phoenix Criminal damage   794   907   780   780   816   794
#2 North West    Phokeng Criminal damage   255   248   166   196   159   162
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you so much for your answer akrun! That answer was 100% correct for the question I asked. However, I should have been more clear! The same dataset contains more categories of crime that need to be grouped under other headings also. I have updated the data/question above with the relevant info. Have you any idea how to do this? Thanks! – DylanHig Mar 28 '18 at 15:33
  • @DylanHig With the updated data, it is not clear about the expected – akrun Mar 28 '18 at 18:19
0

One way (although probably not the most efficient) you could do this is by renaming each category to match your new combined categories (see this answer), then group using group_by() as akrun said.

df1$Category[df1$Category == "Burglary at residential premises"] <- "Burglary"
df1$Category[df1$Category == "Burglary at non-residential premises"] <- "Burglary"

(and so on)

Then use dplyr's summarise() to find the sum of all crimes for each group of province, station and category.

Lucy Wheeler
  • 271
  • 3
  • 17