I have a dataframe with records spanning multiple years:
WarName | StartDate | EndDate
---------------------------------------------
'fakewar1' 01-01-1990 02-02-1995
'examplewar' 05-01-1990 03-07-1998
(...)
'examplewar2' 05-07-1999 06-09-2002
I am trying to convert this dataframe to a summary overview of the total wars per year, e.g.:
Year | Number_of_wars
----------------------------
1989 0
1990 2
1991 2
1992 3
1994 2
Usually I would use someting like df.groupby('year').count()
to get total wars by year, but since I am currently working with ranges instead of set dates that approach wouldn't work.
I am currently writing a function that generates a list of years, and then for each year in the list checks each row in the dataframe and runs a function that checks if the year is within the date-range of that row (returning True if that is the case).
years = range(1816, 2006)
year_dict = {}
for year in years:
for index, row in df.iterrows():
range = year_in_range(year, row)
if range = True:
year_dict[year] = year_dict.get(year, 0) + 1
This works, but is also seems extremely convoluted. So I was wondering, what am I missing? What would be the canonical 'pandas-way' to solve this issue?