I'm working on the Chicago crimes dataset and I created a dataframe called primary which is just the type of crime. Then I grouped by the type of crime and got its count. This is the relevant code.

primary = crimes2012[['Primary Type']].copy()
test = primary.groupby('Primary Type').size().sort_values().reset_index(name='Count')

Now I have a dataframe 'test' which has the crimes and their counts. What I want to do is merge certain crimes together, for example "Non-Criminal", "Non - Criminal" and "Non-Criminal (Subject Specified)". But because they're rows now, I don't know how to do it. I was trying to use .loc[].

I also tried using

test['Primary Type'=='NON-CRIMINAL'] = test['Primary Type'=='NON - CRIMINAL']+test['Primary Type'=='NON-CRIMINAL']+test['Primary Type'=='NON-CRIMINAL (SUBJECT SPECIFIED)']

but of course that only returned a Boolean value of False.

nekomatic
Mustafa Moiz
    Can you give a small sample data set and the expected outcome? – Maarten Fabré Apr 07 '19 at 20:16
    The answer provided by Mortz is perfect, but given that most datasets for analysis/data science are huge, also consider the NumPy approach provided by B.M. in: https://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby – Snehaa Ganesan Apr 15 '19 at 14:04

1 Answer

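You can look at map or apply here - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html

You will have to create a mapping of your inputs to desired outputs as a dictionary:

desired_output = {"NON CRIMINAL": "NON-CRIMINAL", "NC": "NON-CRIMINAL", ...}

Then apply/map it to the 'Primary Type' column. Since primary is a DataFrame, map has to be called on the column (a Series), not on the DataFrame itself:

primary['Primary Type'] = primary['Primary Type'].map(desired_output)

Note that map with a dictionary turns every value that is not one of its keys into NaN, so the dictionary has to cover every crime type you want to keep, including the ones that map to themselves.

And then groupby as you are doing now.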
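As a rough end-to-end sketch of the steps above, the snippet below normalises the labels and then counts them. The small crimes2012 frame is a made-up stand-in for the real dataset, and the dictionary only lists the three spellings mentioned in the question. It uses Series.replace rather than Series.map, which is a swapped-in alternative: replace leaves labels that are not in the dictionary untouched, so only the variants to be merged need listing.

```python
import pandas as pd

# Hypothetical stand-in for the real crimes2012 data; only this column matters here.
crimes2012 = pd.DataFrame({
    "Primary Type": [
        "THEFT", "NON-CRIMINAL", "NON - CRIMINAL",
        "NON-CRIMINAL (SUBJECT SPECIFIED)", "THEFT", "BATTERY",
    ]
})

# Map each variant spelling to the label it should be merged into.
desired_output = {
    "NON - CRIMINAL": "NON-CRIMINAL",
    "NON-CRIMINAL (SUBJECT SPECIFIED)": "NON-CRIMINAL",
}

primary = crimes2012[["Primary Type"]].copy()

# Series.replace keeps values that are not dictionary keys unchanged,
# whereas Series.map would turn them into NaN.
primary["Primary Type"] = primary["Primary Type"].replace(desired_output)

# Same grouping as in the question, now on the cleaned labels.
test = (
    primary.groupby("Primary Type")
    .size()
    .sort_values()
    .reset_index(name="Count")
)
print(test)
```

Cleaning the labels before grouping avoids having to add rows together in the already aggregated frame.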

Mortz