GitHub Link to the notebook

I'm working on a project analyzing hate crime trends in Austin, TX, and I have a problem with my data. I want to split the 'incident_number' column in two: the digits before the '-' clearly indicate the year, which I'd like to merge into the 'month' column, and the digits after the '-' should stay in the 'incident_number' column.

Anyone know how I can achieve this?

Originally I tried:

aus_final['incident_number'] = pd.to_datetime(aus_final['incident_number'], format='%d%m%Y')

which produced an error:

ValueError: time data '2017-241137' does not match format '%d%m%Y' (match)

I kinda knew that was going to happen but I had to try anyway. :P Needless to say, I'm very much a novice still with Python. Any help is greatly appreciated.

  • Welcome to Stackoverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jun 11 '20 at 07:21
  • Please note that the link to the Jupyter notebook doesn't work. In any case, it would be best to include the relevant data in the question itself. In addition, please explain what the expected output is. – Roy2012 Jun 11 '20 at 07:23
  • The issue is exactly what the error says: `ValueError: time data '2017-241137' does not match format '%d%m%Y' (match)`. Your data does not match the format you provided, which expects something like `24112017` (`%d%m%Y`). – Yati Raj Jun 11 '20 at 07:24
  • agreed, the problem is I'm unsure exactly how I could overcome this. By reformatting the data? I'm simply unsure.... – Robert Grantham Jun 11 '20 at 13:08

1 Answer

Link to the referenced notebook

It took a few tries, but I finally got it right; honestly, it was a matter of trial and error. I read several Stack Overflow questions on pandas and on how to structure and format columns, e.g. splitting columns, handling categorical data, and another aid on categorical data, to name a few. I ended up hitting the jackpot with the following code:

# Split e.g. '2017-241137' at the first '-' into the year and the remaining digits.
new = aus_final["incident_number"].str.split("-", n=1, expand=True)
aus_final["year"] = new[0]
aus_final["occurence_number"] = new[1]
aus_final.drop(columns=["incident_number"], inplace=True)
# Join month and year into a single string (both columns must hold strings).
aus_final['date'] = aus_final[['month', 'year']].agg('-'.join, axis=1)
aus_final.drop(['month', 'occurence_number', 'year'], axis=1, inplace=True)
aus_final = aus_final[['date', 'bias', 'number_of_victims_over_18', 'offense_location']]
aus_final.rename(columns={'number_of_victims_over_18': 'victims'}, inplace=True)
# Parse the combined string as a datetime and use it as the index.
aus_final['date'] = pd.to_datetime(aus_final['date'])
aus_final.set_index('date', inplace=True)
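
For anyone who wants to run the same idea end to end, here is a minimal sketch on made-up data. The sample rows, the 'bias' placeholder values, and the assumption that 'month' holds two-digit numeric strings are all hypothetical, since the real data isn't shown here; if your months are names like 'Jan' or 'January', the format string would likely need to be '%b-%Y' or '%B-%Y' instead. Unlike the code above, this version keeps the digits after the '-' in 'incident_number', which is what I originally asked for.

```python
import pandas as pd

# Hypothetical sample rows; the real aus_final has more rows and columns.
aus_final = pd.DataFrame({
    "month": ["01", "02"],                              # assumed: numeric month strings
    "incident_number": ["2017-241137", "2018-900001"],  # second value is made up
    "bias": ["bias_a", "bias_b"],                       # placeholder labels
})

# Split at the first '-' only: column 0 is the year, column 1 the remainder.
parts = aus_final["incident_number"].str.split("-", n=1, expand=True)
aus_final["year"] = parts[0]
aus_final["incident_number"] = parts[1]

# Build a 'MM-YYYY' string and parse it with an explicit format.
aus_final["date"] = pd.to_datetime(
    aus_final["month"] + "-" + aus_final["year"], format="%m-%Y"
)

# Drop the helper columns and index the frame by date.
aus_final = aus_final.drop(columns=["month", "year"]).set_index("date")
print(aus_final)
```

Passing an explicit format to pd.to_datetime avoids leaving the parser to guess the layout, which is what caused the original ValueError when the format and the data disagreed.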

I may be a slow learner, but I certainly retain everything once I try it out a few times for myself. :) Thanks for steering me in the right direction guys!