5

I have a dataframe which consists of a date column but the date column is in string. How can I check if the date lies in first half of the month or the second half and add add another column with the billing date

For e.g.

if the date is 08-10-2020(format is dd-mm-yyyy) then the billing date column will consist of 16th of the same month and if the date lies in between 17-31 it then the billing date will consist of 1st day of next month

Data:

print(df['dispatch_date'].head())

0    01-10-2020
1    07-10-2020
2    17-10-2020
3    16-10-2020
4    09-10-2020
Name: dispatch_date, dtype: object

example output:

                 billing date
0    01-10-2020  16-10-2020
1    07-10-2020  16-10-2020
2    17-10-2020  01-11-2020
3    16-10-2020  01-11-2020
4    09-10-2020  16-10-2020
Michael Szczesny
  • 4,911
  • 5
  • 15
  • 32
Rahul Sharma
  • 2,187
  • 6
  • 31
  • 76
  • 1
    Can you fix index 3, first its 15-10-2020, than 16-10-2020 and shouldnt that lie in the first half of the month? – Erfan Oct 10 '20 at 11:03
  • `dt.ceil('SM')` would be nice (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html), but it doesn't seem to work. https://github.com/pandas-dev/pandas/issues/15303 – Eric Duminil Oct 10 '20 at 11:26

4 Answers4

2

You could do it using apply as follows-

import pandas as pd
import datetime as dt

dates = ['01-10-2020', '07-10-2020', '17-10-2020', '15-12-2020', '19-12-2020']
df = pd.DataFrame(data=dates, columns=['dates'])

# if the billing data can still be string going ahead
print(df.dates.apply(lambda x: '16'+x[2:] if int(x[:2]) < 16 else '01-'+str(int(x[3:5])+1)+x[5:] if int(x[3:5]) != 12 else '01-'+'01-'+str(int(x[6:])+1)))
df['billing_date'] = df.dates.apply(lambda x: '16'+x[2:] if int(x[:2]) < 16 else '01-'+str(int(x[3:5])+1)+x[5:] if int(x[3:5]) != 12 else '01-'+'01-'+str(int(x[6:])+1))

# if billing date series is needed as a datetime object
print(df.dates.apply(lambda x: dt.date(int(x[-4:]), int(x[3:5]), 16) if int(x[:2]) < 16 else dt.date(int(x[-4:]), int(x[3:5])+1, 1) if int(x[3:5]) != 12 else dt.date(int(x[-4:])+1, 1, 1)))
df['billing_date'] = df.dates.apply(lambda x: dt.date(int(x[-4:]), int(x[3:5]), 16) if int(x[:2]) < 16 else dt.date(int(x[-4:]), int(x[3:5])+1, 1) if int(x[3:5]) != 12 else dt.date(int(x[-4:])+1, 1, 1))
Output
0    16-10-2020
1    16-10-2020
2    01-11-2020
3    16-12-2020
4    01-01-2021
Name: dates, dtype: object

0    2020-10-16
1    2020-10-16
2    2020-11-01
3    2020-12-16
4    2021-01-01
Name: dates, dtype: object

Edit: code handles the edge case scenario possible in december

sai
  • 1,734
  • 1
  • 7
  • 13
1

Pure pandas, no string or apply modifications needed:

  • locate all dates with days before the 16th and set them to the 1st of monthst of that month plus semi month end (wich is the 15th)
  • locate all other dates dats on or after 16th and move it to the begin of next month

We need two different df.loc's because

df['billday'] = df['date'].dt.floor('d') + pd.offsets.SemiMonthEnd()

would move the dates to 15th and the (28th to 31th depending on month and leap year) of that month unfortunately.


import pandas as pd
from datetime import date 

# create demo data    
dt = [date(2020,10,i) for i in range(1,32,3)]    
df = pd.DataFrame({"date": dt})
df["date"] = pd.to_datetime(df["date"])
print(df)


# create billday colums     
df.loc[df.date.dt.day < 16, "billday"] = df['date'].dt.floor('d') +\
                                             pd.offsets.SemiMonthEnd() # to 15th
df.loc[df.date.dt.day > 15, "billday"] = df['date'].dt.floor('d') +\
                                             pd.offsets.MonthBegin(1)
 
print(df) 

Output:

         date    billday
0  2020-10-01 2020-10-15
1  2020-10-04 2020-10-15
2  2020-10-07 2020-10-15
3  2020-10-10 2020-10-15
4  2020-10-13 2020-10-15
5  2020-10-16 2020-11-01
6  2020-10-19 2020-11-01
7  2020-10-22 2020-11-01
8  2020-10-25 2020-11-01
9  2020-10-28 2020-11-01
10 2020-10-31 2020-11-01

See:

Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
0

You could do the following:

import pandas as pd

rng = pd.date_range('2015-02-11', periods=10, freq='D')
df = pd.DataFrame({'Date': rng})

df['Billing'] = df["Date"].map(lambda ts: pd.to_datetime('{:}-{:}-{:}'.format(ts.year, ts.month if ts.day < 16 else ts.month + 1, 16 if ts.day < 16 else 1)))

print(df)

So the end result would be:

        Date    Billing
0 2015-02-11 2015-02-16
1 2015-02-12 2015-02-16
2 2015-02-13 2015-02-16
3 2015-02-14 2015-02-16
4 2015-02-15 2015-02-16
5 2015-02-16 2015-03-01
6 2015-02-17 2015-03-01
7 2015-02-18 2015-03-01
8 2015-02-19 2015-03-01
9 2015-02-20 2015-03-01
David
  • 8,113
  • 2
  • 17
  • 36
0

Try this:

def bill(x):
    res=''
    d, m, y=x[:2], x[3:5], x[6:]
    if m!='12':
        resd, resm, resy =('16', m, y) if int(d) in range(1,16) else ('01', str(int(m)+1), y)
    else:
        resd, resm, resy =('16', m, y) if int(d) in range(1,16) else ('01', '01', str(int(y)+1))
    if len(m)==1:
        resm='0'+ resm
    return resd + '-' + resm + '-' + resy

df['billing_date']=df['dispatch_date'].apply(bill)

Result:

  dispatch_date billing_date
0    01-10-2020  16-10-2020
1    07-10-2020  16-10-2020
2    17-10-2020  01-11-2020
3    16-10-2020  01-11-2020
4    09-12-2020  01-01-2021
IoaTzimas
  • 10,538
  • 2
  • 13
  • 30