2

I have the following pandas data frame with a date column. How can I add a column specifying which half year the date belongs to?

enter image description here

dInGd0nG
  • 4,162
  • 1
  • 24
  • 37
  • Please never post screenshots of data, always post the actual data \[[How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)\]. Because here **we need to know the dtype of your 'date' column: string, object, Python datetime, pandas pd.Timestamp...** – smci Jan 17 '22 at 09:13

4 Answers4

2

Convert dates to datetimes and then use numpy.where with compare for less or equal:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['half year'] = np.where(df['date'].dt.month.le(6), 'H1', 'H2')
print (df)
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1

Solution without numpy with change mask for greater like 6, add 1 and convert to strings:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['half year'] = 'H' + df['date'].dt.month.gt(6).add(1).astype(str)
print (df)
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

Try:

df['half year'] = 'H' + pd.to_datetime(df['date']).dt.month.floordiv(6).add(1).astype(str)
print(df)

# Output
         date half year
0  09-09-1993        H2
1  18-03-1993        H1
Corralien
  • 109,409
  • 8
  • 28
  • 52
1

This solution uses .apply method.

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'date': ['09-09-1993', '11-09-1993', '23-01-1994', '18-03-1993']})
>>>
>>> df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
>>> df
        date
0 1993-09-09
1 1993-09-11
2 1994-01-23
3 1993-03-18
>>>
>>> df['half year'] = df.date.dt.month.apply(lambda x: "H1" if x in range(0, 7) else "H2")
>>> df
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1
Abdul Niyas P M
  • 18,035
  • 2
  • 25
  • 46
1

Without numpy, a rather general solution that could easily be modified to get semesters as 'first' or 'second':

  1. build a pandas.DataFrame (not in the provided code)

(Code starts from here)*

  1. convert the 'date' column to datetime
  2. extract month as a pandas.Series from df['date'] using pandas.Series.dt.month
  3. integer divide by 6 to get the semesters as a pandas.Series of int (counting from 0)
  4. get the semester strings as a pandas.Series using pandas.Series.map with the semesters ints as indices on a semester names list
  5. bundling (steps 2. to 4.) with pandas.Categorical to get categorical column (lower space, faster processing), this step is optional
>>> df['date'] = pd.to_datetime(df['date'])
>>> df['half year'] = pd.Categorical((df['date'].dt.month // 6).map(lambda h:['H1','H2'][h]))
>>> df
        date half year
0 1993-09-09        H2
1 1993-09-11        H2
2 1994-01-23        H1
3 1993-03-18        H1
>>> df.dtypes
date         datetime64[ns]
half year          category
dtype: object


hpchavaz
  • 1,368
  • 10
  • 16