I have this pandas DataFrame with a datetime, an integer, and a string column:
from io import StringIO
import pandas as pd
data1 = """Year N X
2008-01-01 2 A
2008-01-01 3 B
2008-01-01 6 C
2008-01-01 2 D
2010-01-01 7 A
2010-01-01 1 B
2010-01-01 8 C
2012-01-01 9 A
2012-01-01 4 B
2016-01-01 1 A"""
df = pd.read_csv(StringIO(data1), sep=r"\s+", parse_dates=["Year"])
I can aggregate column N for count, min, and max simply as:
df1 = df.groupby("X")["N"].agg(Count="count", Min="min", Max="max").reset_index()
print(df1)
   X  Count  Min  Max
0  A      4    1    9
1  B      3    1    4
2  C      2    6    8
3  D      1    2    2
Is there a way to achieve the same for column Year, displaying only the year?
I can do this in several steps:
g = df.groupby("X")["Year"]
df2 = g.agg(Count="count").reset_index()
df2["Start_date"] = g.min().dt.year.values
df2["End_date"] = g.max().dt.year.values
print(df2)
   X  Count  Start_date  End_date
0  A      4        2008      2016
1  B      3        2008      2012
2  C      2        2008      2010
3  D      1        2008      2008
But a one-step version similar to the one above for N, such as
df2 = df.groupby("X")["Year"].agg(Count="count", Min="min().dt.year.values", Max="max().dt.year.values").reset_index()
obviously does not work. Is there a simpler way to aggregate the first and last year in a pandas groupby, apart from the obvious approach of first extracting the min/max dates as above and then converting the datetime columns into year columns?
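For reference, one direction that might work is sketched below. It relies on the assumption that the smallest and largest year in each group match the years of the earliest and latest dates, so the year can be extracted before grouping instead of after. The variable name `years` is only illustrative, and I am not sure this is the idiomatic way:

```python
from io import StringIO
import pandas as pd

data1 = """Year N X
2008-01-01 2 A
2008-01-01 3 B
2008-01-01 6 C
2008-01-01 2 D
2010-01-01 7 A
2010-01-01 1 B
2010-01-01 8 C
2012-01-01 9 A
2012-01-01 4 B
2016-01-01 1 A"""
df = pd.read_csv(StringIO(data1), sep=r"\s+", parse_dates=["Year"])

# Extract the year as an integer Series first (hypothetical helper name).
years = df["Year"].dt.year

# Group the integer years by column X and use plain named aggregation.
df2 = (
    years.groupby(df["X"])
    .agg(Count="count", Start_date="min", End_date="max")
    .reset_index()
)
print(df2)
```

This keeps the aggregation in a single agg call with ordinary string names, at the cost of one extra line to build the year Series beforehand.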