I have the following pandas data frame with a date
column. How can I add a column specifying which half year the date belongs to?
Asked
Active
Viewed 1,660 times
2

dInGd0nG
- 4,162
- 1
- 24
- 37
-
Please never post screenshots of data, always post the actual data \[[How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)\]. Because here **we need to know the dtype of your 'date' column: string, object, Python datetime, pandas pd.Timestamp...** – smci Jan 17 '22 at 09:13
4 Answers
2
Convert dates to datetimes and then use numpy.where
with compare for less or equal:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['half year'] = np.where(df['date'].dt.month.le(6), 'H1', 'H2')
print (df)
date half year
0 1993-09-09 H2
1 1993-09-11 H2
2 1994-01-23 H1
3 1993-03-18 H1
Solution without numpy
with change mask for greater like 6
, add 1
and convert to strings:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['half year'] = 'H' + df['date'].dt.month.gt(6).add(1).astype(str)
print (df)
date half year
0 1993-09-09 H2
1 1993-09-11 H2
2 1994-01-23 H1
3 1993-03-18 H1

jezrael
- 822,522
- 95
- 1,334
- 1,252
1
Try:
df['half year'] = 'H' + pd.to_datetime(df['date']).dt.month.floordiv(6).add(1).astype(str)
print(df)
# Output
date half year
0 09-09-1993 H2
1 18-03-1993 H1

Corralien
- 109,409
- 8
- 28
- 52
1
This solution uses .apply
method.
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'date': ['09-09-1993', '11-09-1993', '23-01-1994', '18-03-1993']})
>>>
>>> df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
>>> df
date
0 1993-09-09
1 1993-09-11
2 1994-01-23
3 1993-03-18
>>>
>>> df['half year'] = df.date.dt.month.apply(lambda x: "H1" if x in range(0, 7) else "H2")
>>> df
date half year
0 1993-09-09 H2
1 1993-09-11 H2
2 1994-01-23 H1
3 1993-03-18 H1

Abdul Niyas P M
- 18,035
- 2
- 25
- 46
1
Without numpy
, a rather general solution that could easily be modified to get semesters as 'first'
or 'second'
:
- build a
pandas.DataFrame
(not in the provided code)
(Code starts from here)*
- convert the
'date'
column to datetime - extract month as a
pandas.Series
fromdf['date']
usingpandas.Series.dt.month
- integer divide by 6 to get the semesters as a
pandas.Series
ofint
(counting from 0) - get the semester strings as a
pandas.Series
usingpandas.Series.map
with the semesters ints as indices on a semester names list - bundling (steps 2. to 4.) with
pandas.Categorical
to get categorical column (lower space, faster processing), this step is optional
>>> df['date'] = pd.to_datetime(df['date'])
>>> df['half year'] = pd.Categorical((df['date'].dt.month // 6).map(lambda h:['H1','H2'][h]))
>>> df
date half year
0 1993-09-09 H2
1 1993-09-11 H2
2 1994-01-23 H1
3 1993-03-18 H1
>>> df.dtypes
date datetime64[ns]
half year category
dtype: object

hpchavaz
- 1,368
- 10
- 16