Python_Pandas: If datetime values fall under certain date duration, create a column with specific value

Question

Given:

Starting from the df below:

import pandas as pd

df = pd.DataFrame(
            {"date":['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29', '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1'],
             "score":[9, 8, 8, 10, 6, 7, 7, 7, 6]
            })

Perform the following task:

For dates that meet the criteria below, set a specific value in a new column called 'staffNumber':

IF 'date' falls within 6/1/2016~9/22/2016, THEN set 'staffNumber' to 1.

IF 'date' falls within 9/23/2016~10/28/2016, THEN set 'staffNumber' to 2.

IF 'date' falls within 10/29/2016~11/4/2016, THEN set 'staffNumber' to 3.

The end result should look like this:

df2 = pd.DataFrame(
            {"date":['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29', '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1'],
             "score":[9, 8, 8, 10, 6, 7, 7, 7, 6],
             "staffNumber":[1,1,2,3,1,2,1,1,3]
            })

What I've tried:

I usually try something before I ask any question. However, for this one I couldn't think of any approach.

I looked at using np.where & .isin from the following links:
1. Python numpy where function with datetime
2. Using 'isin' on a date in a pandas column
3. Pandas conditional creation of a series/dataframe column

Any help will be appreciated!

jezrael · Accepted Answer · 2017-11-13T06:32:50.950

Use cut, with the range boundaries as bin edges:

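import pandas as pd

#convert to datetimes if necessary
df['date'] = pd.to_datetime(df['date'])
#bins are right-closed: (6/1, 9/22], (9/22, 10/28], (10/28, 11/4]
#include_lowest=True also puts 2016-06-01 itself into the first bin
b = pd.to_datetime(['2016-06-01','2016-09-22','2016-10-28','2016-11-04'])
l = range(1,4)  #one label per bin
df['new'] = pd.cut(df['date'], bins=b, labels=l, include_lowest=True)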
print (df)
        date  score new
0 2016-06-01      9   1
1 2016-09-22      8   1
2 2016-10-28      8   2
3 2016-11-04     10   3
4 2016-06-29      6   1
5 2016-10-01      7   2
6 2016-06-15      7   1
7 2016-07-29      7   1
8 2016-11-01      6   3
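If several ranges share the same value, cut with the default ordered=True raises a ValueError for the repeated labels. In pandas 1.1 and later, passing ordered=False allows duplicate labels. A minimal sketch assuming that pandas version, using hypothetical labels [1, -2, 1] (one repeated value, one negative) on the question's dates:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(
    ['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29',
     '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1']))

# Same right-closed bins as above
edges = pd.to_datetime(['2016-06-01', '2016-09-22', '2016-10-28', '2016-11-04'])

# Hypothetical labels: one per bin, with a repeat and a negative value
labels = [1, -2, 1]

# ordered=False permits duplicate labels; the result is an unordered categorical
new = pd.cut(dates, bins=edges, labels=labels, include_lowest=True, ordered=False)

# Convert the categorical back to plain integers
staff = new.astype(int)
```

The result has categories -2 and 1 only, since categories are the unique labels.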

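Or searchsorted (here DatetimeIndex.searchsorted, which works like numpy.searchsorted):

import numpy as np

#change first date to 2016-05-31: with the default side='left', a date equal to an edge
#gets that edge's position, so 2016-06-01 would get 0 and index -1 would wrap to the last label
b = pd.to_datetime(['2016-05-31','2016-09-22','2016-10-28','2016-11-04'])
l = range(1,4)

#in-range dates get positions 1..3; subtract 1 to index the labels
df['new'] = np.array(l)[b.searchsorted(df['date'].values) - 1]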
print (df)
        date  score  new
0 2016-06-01      9    1
1 2016-09-22      8    1
2 2016-10-28      8    2
3 2016-11-04     10    3
4 2016-06-29      6    1
5 2016-10-01      7    2
6 2016-06-15      7    1
7 2016-07-29      7    1
8 2016-11-01      6    3

It worked for the sample df. But the problem I actually have to solve has 16 different date ranges, from 6/1/2016 to now, and those ranges map to different values (1,2,1,0,1,0,1,0,-1,-2,-3,-4,-4,-3,-2,-3). The first approach, cut, failed with ValueError: Categorical categories must be unique. I'm trying the second method. Where do you apply range(1,4) in the second method, though? Another concern is that my problem has negative numbers as well. — Johnny, Nov 13 '17 at 03:41
I edited the answer: just index `l` with the positions returned by `searchsorted`. Negative numbers work fine too. — jezrael, Nov 13 '17 at 06:59
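A sketch that generalizes the searchsorted idea to any number of ranges, assuming each range starts the day after the previous one ends (as in the question), so every range can be described by its start date plus one final end date. `label_date_ranges` is a hypothetical helper, not part of pandas. Using side='right' avoids shifting the first edge by a day, and because labels are only used for indexing, they may repeat or be negative.

```python
import numpy as np
import pandas as pd


def label_date_ranges(dates, starts, last_end, labels):
    """Map each date to the label of the range that contains it.

    Range i covers [starts[i], starts[i + 1]); the final range ends at
    last_end (inclusive). Dates outside every range become NaN.
    """
    dates = pd.to_datetime(pd.Series(dates))
    starts = pd.to_datetime(starts)
    labels = np.asarray(labels)
    # side='right': a date equal to starts[i] gets position i + 1, so -1 yields range i
    pos = starts.searchsorted(dates.to_numpy(), side='right') - 1
    # Dates before the first start get pos == -1; dates after last_end are also out of range
    in_range = (pos >= 0) & (dates <= pd.Timestamp(last_end)).to_numpy()
    # Clip so pos == -1 still indexes safely; those rows are masked to NaN by where()
    picked = pd.Series(labels[np.clip(pos, 0, None)], index=dates.index)
    return picked.where(in_range)


# Usage with the question's three ranges
dates = ['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29',
         '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1']
result = label_date_ranges(dates,
                           starts=['2016-06-01', '2016-09-23', '2016-10-29'],
                           last_end='2016-11-04',
                           labels=[1, 2, 3])
```

For 16 ranges, the starts and labels lists simply grow to 16 entries each.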

score 0 · Answer 2 · answered Nov 10 '17 at 08:24

In general, to accomplish this you first need to create the column, regardless of the date value:

df['employee'] = ...some_value_here...

Then you need to assign the value when the date falls inside one of the ranges you specified. You can do it with a lambda:

df['employee'] = df['date'].apply( lambda x : __something__ )

Now replace the __something__ inside the lambda with the logic that maps the date ranges to the values you need. Keep in mind that the dates are strings, so parse them (for example with pd.to_datetime) before comparing.

If that __something__ is long, the lambda won't be readable: define a function beforehand and apply it, e.g. apply(lambda x: justdefinedfunction(x)), or simply apply(justdefinedfunction).
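A minimal sketch of this approach, assuming the question's three ranges. `RANGES` and `staff_number_for` are hypothetical names; a lookup table is used instead of a long if/elif chain so that adding a range means adding one row. It also shows why parsing matters: compared as strings, '2016-10-1' sorts before '2016-9-22'.

```python
import pandas as pd

# Hypothetical lookup table: (first day, last day, staffNumber), both ends inclusive
RANGES = [
    (pd.Timestamp('2016-06-01'), pd.Timestamp('2016-09-22'), 1),
    (pd.Timestamp('2016-09-23'), pd.Timestamp('2016-10-28'), 2),
    (pd.Timestamp('2016-10-29'), pd.Timestamp('2016-11-04'), 3),
]


def staff_number_for(date):
    """Return the value of the first range containing date, else None."""
    for start, end, value in RANGES:
        if start <= date <= end:
            return value
    return None


df = pd.DataFrame(
    {"date": ['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29',
              '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1'],
     "score": [9, 8, 8, 10, 6, 7, 7, 7, 6]})

# Parse first: string comparison is character by character, so '2016-10-1' < '2016-9-22'
df['date'] = pd.to_datetime(df['date'])
df['staffNumber'] = df['date'].apply(staff_number_for)
```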

score 0 · Answer 3 · answered Jan 08 '20 at 21:12

This question seems to be a bit old, but I had a similar need recently and here's how I made it work:

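import datetime
import pandas as pd

def staffNumber(date):
    if datetime.date(2016, 6, 1) <= date <= datetime.date(2016, 9, 22):
        return 1
    elif datetime.date(2016, 9, 23) <= date <= datetime.date(2016, 10, 28):
        return 2
    elif datetime.date(2016, 10, 29) <= date <= datetime.date(2016, 11, 4):
        return 3

    # (include all the other IFs and date ranges here)

    else:
        return 'input date out of range'

# 'date' holds strings: parse them and take the datetime.date part before comparing
df['staffNumber'] = pd.to_datetime(df['date']).dt.date.apply(staffNumber)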
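Because apply calls the Python function once per row, a vectorized variant may be preferable on large frames. A sketch using np.select, a swapped-in technique that extends the question's np.where idea to several conditions, assuming the question's three ranges and a hypothetical default of 0 for dates outside every range:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"date": ['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29',
              '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1'],
     "score": [9, 8, 8, 10, 6, 7, 7, 7, 6]})
df['date'] = pd.to_datetime(df['date'])

# One boolean mask per range; Series.between includes both ends by default
conditions = [
    df['date'].between(pd.Timestamp('2016-06-01'), pd.Timestamp('2016-09-22')),
    df['date'].between(pd.Timestamp('2016-09-23'), pd.Timestamp('2016-10-28')),
    df['date'].between(pd.Timestamp('2016-10-29'), pd.Timestamp('2016-11-04')),
]
choices = [1, 2, 3]

# np.select takes the choice of the first true condition; rows matching none get default
df['staffNumber'] = np.select(conditions, choices, default=0)
```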
