
I am exploring a dataset of accidents in the UK between 2005 and 2015. I converted the date column to datetime format, removed some columns and created new ones. I then cleaned the dataset further and kept only the columns needed here (to simplify the example):

                    date        
Accident_Index                                  
200501BS00001   2005-04-01
200501BS00002   2005-05-01  
200501BS00003   2005-06-01  
200501BS00004   2005-07-01  
200501BS00005   2005-10-01  
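One thing that may be worth checking at the conversion step: consecutive accident indexes at the start of 2005 show up as 2005-04-01, 2005-05-01, 2005-06-01, which could mean day and month were swapped if the raw strings use the day/month/year order common in UK data. A minimal sketch, assuming the raw column holds strings like "04/01/2005" (the helper name `parse_uk_dates` and the raw column name `Date` in the usage note are hypothetical), is to give `pd.to_datetime` an explicit format so the day cannot be read as the month:

```python
import pandas as pd


def parse_uk_dates(raw: pd.Series) -> pd.Series:
    """Parse day-first date strings such as '04/01/2005' (4 January 2005)."""
    # An explicit day/month/year format stops '04/01/2005' being read as April 1st.
    return pd.to_datetime(raw, format="%d/%m/%Y")
```

Usage would be something like `acc_data["date"] = parse_uk_dates(acc_data["Date"])`. If the month is misread, every per-month count and minimum built on top of it is grouped by the wrong month.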

I tried to plot a line chart of the number of accidents for every month of every year:

import matplotlib.pyplot as plt

# Count the accidents in every (year, month) pair
acc_by_year_and_month = acc_data["date"].groupby(
    [acc_data["date"].dt.year, acc_data["date"].dt.month]
).agg("count")
acc_by_year_and_month.plot(kind="line", figsize=(8, 6))

plt.ylabel("Number of accidents")
plt.xlabel("Year and Month")
plt.title("Number of accidents by year and month")
plt.show()
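A sketch of one way to get a readable monthly x-axis, assuming the `date` column is already datetime: group on `dt.to_period("M")` so each month gets a single label such as "2005-01", then plot against integer positions and set the tick labels explicitly. The names `monthly_counts`, `plot_monthly` and the `label_every` parameter are made up for this example. Months with no rows at all are simply absent rather than shown as zero.

```python
import pandas as pd
import matplotlib.pyplot as plt


def monthly_counts(dates: pd.Series) -> pd.Series:
    """Count rows per calendar month, labelled like '2005-01'."""
    # to_period("M") maps every timestamp to its year-month period,
    # so grouping on it gives exactly one row per month that has data.
    counts = dates.groupby(dates.dt.to_period("M")).count()
    counts.index = counts.index.astype(str)  # Period('2005-01') -> '2005-01'
    return counts


def plot_monthly(counts: pd.Series, label_every: int = 3):
    """Line chart with one tick per month, labelling every `label_every`-th month."""
    fig, ax = plt.subplots(figsize=(12, 6))
    positions = list(range(len(counts)))
    ax.plot(positions, counts.to_numpy(), marker="o")
    ax.set_xticks(positions)
    # Leave in-between labels blank so ~130 monthly labels do not overlap.
    ax.set_xticklabels(
        [lab if i % label_every == 0 else "" for i, lab in enumerate(counts.index)],
        rotation=90,
    )
    ax.set_xlabel("Year and Month")
    ax.set_ylabel("Number of accidents")
    ax.set_title("Number of accidents by year and month")
    fig.tight_layout()
    return ax
```

With the question's data this would be `plot_monthly(monthly_counts(acc_data["date"]))` followed by `plt.show()`; `label_every=1` labels every month at the cost of a crowded axis.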

Unfortunately, only 4 or 5 year-month combinations are labelled on the x-axis, so it is hard to see where the peaks and the minimum values for each year are.

I also tried to make the chart interactive with:

%matplotlib notebook
import matplotlib.pyplot as plt

However, when I move the mouse pointer over the chart I do get x and y values, but they are the same and the year-month combination is not shown, so this option did not help either.
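The readout most likely shows bare numbers because pandas draws a Series with a (year, month) MultiIndex at x positions 0, 1, 2, ... and uses the index only for tick labels. One possible fix, sketched under that assumption, is to replace the axes' `format_coord` function, which the interactive toolbar calls to build its coordinate text, with one that maps the cursor to the nearest month. `attach_month_readout` is a hypothetical helper name, and the y value it shows is the data value of the nearest point, not the cursor height.

```python
import pandas as pd
import matplotlib.pyplot as plt


def attach_month_readout(ax, counts: pd.Series):
    """Show 'x=<year>-<month>, y=<count>' in the interactive toolbar readout.

    Assumes the line was drawn at x = 0, 1, ..., len(counts) - 1, which is
    what pandas does when plotting a Series with a (year, month) MultiIndex.
    """
    # Turn (2005, 1) into "2005-1"; plain labels are just converted to str.
    labels = [
        "-".join(str(part) for part in lab) if isinstance(lab, tuple) else str(lab)
        for lab in counts.index
    ]
    values = counts.to_numpy()

    def format_coord(x, y):
        i = int(round(x))  # index of the nearest plotted point
        if 0 <= i < len(values):
            return f"x={labels[i]}, y={values[i]}"
        return ""

    ax.format_coord = format_coord
    return format_coord
```

For example, `ax = acc_by_year_and_month.plot(kind="line", figsize=(8, 6))` followed by `attach_month_readout(ax, acc_by_year_and_month)` before `plt.show()`.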

What I would like is either an interactive line chart that shows the x and y values when I move the mouse pointer over it (e.g. x=2005-1, y=17487), or, probably the easier option, a printout of the minimum number of accidents for each year:

2005 - 2 - 14383 (in February 2005 there were 14383 accidents, the minimum for 2005)
2006 - 2 - 13818 (in February 2006 there were 13818 accidents, the minimum for 2006)
...
and so on until 2015.

Printing acc_by_year_and_month already gives something very close to the desired output:

2005 - 1 - 17487
     - 2 - 14383
...
2006 - 1 - 16026
...

So what remains is to find the minimum value for each year and print it together with its month.
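Since `acc_by_year_and_month` is already indexed by (year, month), a minimal sketch is to group by the first index level and take `idxmin`, which returns the full (year, month) label of each year's smallest count. The helper names `yearly_minimum` and `yearly_minimum_lines` are hypothetical; if two months tie, `idxmin` keeps the first one.

```python
import pandas as pd


def yearly_minimum(counts: pd.Series) -> pd.Series:
    """For each year, return the (year, month) row with the fewest accidents.

    `counts` is assumed to have a (year, month) MultiIndex, like
    acc_by_year_and_month in the question.
    """
    # idxmin per year gives the full (year, month) label of that year's minimum.
    min_labels = counts.groupby(level=0).idxmin()
    return counts.loc[min_labels.tolist()]


def yearly_minimum_lines(counts: pd.Series) -> list[str]:
    """Format each yearly minimum as 'year - month - count'."""
    return [f"{year} - {month} - {n}" for (year, month), n in yearly_minimum(counts).items()]
```

`print("\n".join(yearly_minimum_lines(acc_by_year_and_month)))` should then print one `year - month - count` line per year, in the format sketched above.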

Nick
  • You can try making your date the index, resampling the data by month and choosing the minimum value among the resampled dates. If you post a more complete data set I can give it a try. – Gustavo Gradvohl Jul 20 '19 at 17:11
  • this is similar: https://stackoverflow.com/questions/24082784/pandas-dataframe-groupby-datetime-month – AirSquid Jul 20 '19 at 17:34
  • @JeffH Yep, I checked this and tried out some of the things there, but they did not get me to the desired solution. I am trying to get the minimum value for each year, along with the number of that month: year - month - min value, year+1 - month - min value, ... @GustavoGradvohl The columns above are the only ones needed for this operation, which is why I excluded all other columns from the example, just to make it easier to understand. – Nick Jul 20 '19 at 18:25
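Gustavo's resampling suggestion could look roughly like the following sketch (the function name `yearly_min_via_resample` is hypothetical). It puts the timestamps on a `DatetimeIndex`, resamples to calendar months with the `"MS"` (month start) alias, and picks each year's smallest month with `idxmin`. One difference from the groupby approach: resampling creates a bin for every month between the first and last date, so a month with no rows at all counts as 0, which is probably not an issue for a dataset with thousands of accidents per month.

```python
import pandas as pd


def yearly_min_via_resample(dates: pd.Series) -> pd.DataFrame:
    """Per year, the month with the fewest rows, using resample on a DatetimeIndex."""
    # One marker per accident, indexed by its timestamp, so resample can bucket by month.
    per_accident = pd.Series(1, index=pd.DatetimeIndex(dates)).sort_index()
    # "MS": calendar-month bins labelled by the first day of the month.
    # Empty months inside the covered range come out as 0.
    monthly = per_accident.resample("MS").sum()
    # Month-start timestamp of the smallest month in each year.
    min_months = pd.DatetimeIndex(monthly.groupby(monthly.index.year).idxmin())
    return pd.DataFrame(
        {
            "year": min_months.year,
            "month": min_months.month,
            "accidents": monthly.loc[min_months].to_numpy(),
        }
    )
```

With the question's data this would be `yearly_min_via_resample(acc_data["date"])`, giving one row per year with the month and its accident count.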
