Given the DataFrame below:

df = pd.DataFrame(
            {"date":['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29', '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1'],
             "score":[9, 8, 8, 10, 6, 7, 7, 7, 6]
            })

Perform the following task:

For dates that meet the criteria below, add the corresponding value to a new column called 'staffNumber':

IF 'date' falls within 6/1/2016~9/22/2016 THEN set 'staffNumber' to 1.

IF 'date' falls within 9/23/2016~10/28/2016 THEN set 'staffNumber' to 2.

IF 'date' falls within 10/29/2016~11/4/2016 THEN set 'staffNumber' to 3.

The end result should look like this:

df2 = pd.DataFrame(
            {"date":['2016-6-1', '2016-9-22', '2016-10-28', '2016-11-4', '2016-6-29', '2016-10-1', '2016-6-15', '2016-7-29', '2016-11-1'],
             "score":[9, 8, 8, 10, 6, 7, 7, 7, 6],
             "staffNumber":[1,1,2,3,1,2,1,1,3]
            })

What I've tried:

I usually try something before I ask any question. However, for this one I couldn't think of any approach.

I looked at using np.where and .isin, based on the following links:
1. Python numpy where function with datetime
2. Using 'isin' on a date in a pandas column
3. Pandas conditional creation of a series/dataframe column

Any help will be appreciated!

Johnny

3 Answers

Use cut:

#convert to datetimes if necessary
df['date'] = pd.to_datetime(df['date'])
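#bin edges; intervals are closed on the right, and include_lowest=True also keeps the first edge
b = pd.to_datetime(['2016-06-01','2016-09-22','2016-10-28','2016-11-04'])
#one label per interval: 1, 2, 3
l = range(1,4)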
df['new'] = pd.cut(df['date'], bins=b, labels=l, include_lowest=True)
print (df)
        date  score new
0 2016-06-01      9   1
1 2016-09-22      8   1
2 2016-10-28      8   2
3 2016-11-04     10   3
4 2016-06-29      6   1
5 2016-10-01      7   2
6 2016-06-15      7   1
7 2016-07-29      7   1
8 2016-11-01      6   3

Or numpy.searchsorted:

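#change first date to 2016-05-31 so that 2016-06-01 falls after the first edge
b = pd.to_datetime(['2016-05-31','2016-09-22','2016-10-28','2016-11-04'])
l = range(1,4)

#searchsorted returns the position of the first edge >= each date; subtract 1 to index into l
df['new'] = np.array(l)[b.searchsorted(df['date'].values) - 1]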
print (df)
        date  score  new
0 2016-06-01      9    1
1 2016-09-22      8    1
2 2016-10-28      8    2
3 2016-11-04     10    3
4 2016-06-29      6    1
5 2016-10-01      7    2
6 2016-06-15      7    1
7 2016-07-29      7    1
8 2016-11-01      6    3
jezrael
  • It worked for the sample df. But in the problem I have to solve, there are 16 different date ranges, from 6/1/2016 to now, and the ranges have different values (1,2,1,0,1,0,1,0,-1,-2,-3,-4,-4,-3,-2,-3). The first approach, cut, failed with ValueError: Categorical categories must be unique. I'm trying the second method. Where did you apply the range(1,4) in the second method, though? Another concern is that my problem has negative numbers as well. – Johnny Nov 13 '17 at 03:41
  • I edited the answer: simply index `l` by the indices returned from the `searchsorted` function. Negative numbers work fine too. – jezrael Nov 13 '17 at 06:59
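
To illustrate the point raised in the comments, here is a minimal sketch of mapping date ranges to values that repeat or are negative. The edges and values below are made up for illustration (they are not the commenter's actual 16 ranges), and the column names `by_searchsorted` and `by_cut` are hypothetical. It shows two options: indexing a value array by the result of `searchsorted`, and asking `pd.cut` for integer bin codes with `labels=False` so that no unique categorical labels are needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(['2016-06-01', '2016-09-22', '2016-10-28',
                            '2016-11-04', '2016-11-20']),
})

# Hypothetical edges: the first edge is one day before the first range starts.
edges = pd.to_datetime(['2016-05-31', '2016-09-22', '2016-10-28',
                        '2016-11-04', '2016-11-30'])
# One value per interval; note the repeated 1 and the negative value.
values = np.array([1, 2, 1, -1])

# Option 1: position of the first edge >= each date, minus 1, is the interval index.
idx = edges.searchsorted(df['date'].values) - 1
df['by_searchsorted'] = values[idx]

# Option 2: labels=False makes cut return integer bin codes instead of categories.
# This assumes every date falls inside the edges; otherwise the codes contain NaN.
codes = pd.cut(df['date'], bins=edges, labels=False)
df['by_cut'] = values[codes.to_numpy()]

print(df)
```

Both options treat each interval as open on the left and closed on the right, so a date equal to an edge belongs to the interval that ends there.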

In general, to accomplish this you need to create the column regardless of the value of the date.

df['employee'] = ...some_value_here...

Then you need to assign the value when the date is inside the ranges you specify. You can do it with a lambda:

df['employee'] = df['date'].apply( lambda x : __something__ )

Now you have to replace the __something__ inside the lambda with the logic that maps the date ranges (which are strings!) to the values you need.

If that __something__ inside the lambda is long, it won't be readable: define a function beforehand and pass it to apply, e.g. apply(lambda x: justdefinedfunction(x))
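
As a rough sketch of the approach described above, the function below could stand in for __something__. The names `RANGES` and `assign_staff` are hypothetical, the ranges are the inclusive ones listed in the question, and the sketch assumes the string dates are first converted with `pd.to_datetime` so they can be compared.

```python
import pandas as pd

# Inclusive (start, end, value) ranges taken from the question.
RANGES = [
    (pd.Timestamp('2016-06-01'), pd.Timestamp('2016-09-22'), 1),
    (pd.Timestamp('2016-09-23'), pd.Timestamp('2016-10-28'), 2),
    (pd.Timestamp('2016-10-29'), pd.Timestamp('2016-11-04'), 3),
]

def assign_staff(date):
    """Return the value of the first range containing `date`, else None."""
    for start, end, value in RANGES:
        if start <= date <= end:
            return value
    return None

df = pd.DataFrame({
    "date": ['2016-06-01', '2016-09-22', '2016-10-28', '2016-11-04'],
    "score": [9, 8, 8, 10],
})
df['date'] = pd.to_datetime(df['date'])  # the dates start out as strings
df['employee'] = df['date'].apply(assign_staff)
print(df)
```

Keeping the ranges in a list means that adding another range only requires a new tuple, not another branch in the function.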

Iñigo González

This question seems to be a bit old, but I had a similar need recently and here's how I made it work:

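import pandas as pd

def staffNumber(date):
    # Compare against pd.Timestamp so this works on a datetime64 column
    if pd.Timestamp(2016, 6, 1) <= date <= pd.Timestamp(2016, 9, 22):
        return 1
    elif pd.Timestamp(2016, 9, 23) <= date <= pd.Timestamp(2016, 10, 28):
        return 2
    elif pd.Timestamp(2016, 10, 29) <= date <= pd.Timestamp(2016, 11, 4):
        return 3
    # (include any other date ranges here as further elif branches)
    else:
        return 'input date out of range'

# Convert the string dates first, then apply the function row by row
df['date'] = pd.to_datetime(df['date'])
df['staffNumber'] = df['date'].apply(staffNumber)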
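
For larger frames, a vectorized alternative to the row-by-row apply may be worth considering. The following is a sketch using `numpy.select` with `Series.between`, which is a different technique from the function above. It assumes the inclusive ranges from the question, and the fallback value `0` for out-of-range dates is an arbitrary choice made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(['2016-06-01', '2016-10-01', '2016-11-01', '2017-01-01']),
})

# One boolean mask per range; between() includes both endpoints by default.
conditions = [
    df['date'].between('2016-06-01', '2016-09-22'),
    df['date'].between('2016-09-23', '2016-10-28'),
    df['date'].between('2016-10-29', '2016-11-04'),
]
choices = [1, 2, 3]

# The first matching condition wins; rows matching none get the default.
df['staffNumber'] = np.select(conditions, choices, default=0)
print(df)
```

Using an integer default keeps the column numeric, whereas returning a string for out-of-range dates turns it into an object column.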