I've been struggling with trying to find the best way of solving this problem.
To generalize this problem and help others who may find themselves needing to perform similar tasks, I am trying to find the best way to add columns to one data set from third, that is based on matching in an intermediary data set, AND belonging in a date range of a third dataset. The end result would be returning matching values from the third data set into the first.
Here are the heads of sample data frames to add a bit of clarity:
> head(SalesData, 10)
sale_id sale_amt int_rate sale_date sale_status
1 1 7000 10.71 2008-05-01 Fully Paid
2 2 10800 13.57 2009-11-01 Fully Paid
3 3 7500 10.08 2008-04-01 Fully Paid
4 4 3000 14.26 2009-09-01 Fully Paid
5 5 5600 14.96 2010-02-01 Charged Off
6 6 2800 11.49 2010-08-01 Fully Paid
7 7 10000 8.59 2009-10-01 Fully Paid
8 8 18000 10.39 2008-03-01 Fully Paid
9 9 5000 15.13 2008-04-01 Fully Paid
10 10 9600 12.29 2008-03-01 Fully Paid
> head(EmployeeSales, 10)
sale_id empl_name empl_num
1 1 Dakota 4
2 2 Dakota 4
3 3 Kami 9
4 4 Adel 1
5 5 Adel 1
6 6 Farah 6
7 7 Kami 9
8 8 Kami 9
9 9 Ida 7
10 10 Kami 9
> head(EmployeeMap, 10)
empl_num empl_name skill_lvl team start_date end_date
1 1 Adel Beg Red 2007-06-01 2008-05-31
2 1 Adel Int Red 2008-06-01 2010-10-31
3 1 Adel Adv Red 2010-11-01 2999-12-12
4 2 Bailey Beg Blue 2010-08-01 2011-04-30
5 2 Bailey Beg Red 2011-05-01 2999-12-12
6 3 Casey Beg Blue 2010-08-01 2010-12-31
7 3 Casey Int Blue 2011-01-01 2999-12-12
8 4 Dakota Beg Red 2007-06-01 2009-08-30
9 4 Dakota Int Red 2009-09-01 2010-08-30
10 4 Dakota Adv Red 2010-09-01 2011-08-30
The desired output would add the empl_num, sales_team and skill_level from EmployeeMap to the SalesData for every sale_id.
In trying to conceptualize the steps, this is what I am thinking, but perhaps there is a better way: Take the sale_id from SalesData, match it to the sale_id in Employee Sales and get the empl_num. Take the empl_num and match it to empl_num in Employee Map. Now we need to take the sale_date from SalesData and find which range of "start_date, end_date" it falls into. Then we would take the team and skill level that matches, and add that to the SalesData.
See table below:
> head(df2,10)
sale_id sale_amt int_rate sale_date sale_status empl_num team skill_lvl
1 1 7000 10.71 2008-05-01 Fully Paid 4 Red Beg
2 2 10800 13.57 2009-11-01 Fully Paid 4 Red Int
3 3 7500 10.08 2008-04-01 Fully Paid 9 Blue Beg
4 4 3000 14.26 2009-09-01 Fully Paid 1 Red Int
5 5 5600 14.96 2010-02-01 Charged Off 1 Red Int
6 6 2800 11.49 2010-08-01 Fully Paid 6 Red Beg
7 7 10000 8.59 2009-10-01 Fully Paid 9 Blue Int
8 8 18000 10.39 2008-03-01 Fully Paid 9 Blue Beg
9 9 5000 15.13 2008-04-01 Fully Paid 7 Blue Beg
10 10 9600 12.29 2008-03-01 Fully Paid 9 Blue Int
What is complicating this for me is that in the EmployeeMap, the start_date and end_date tell us the date that each employee started and ended belonging to a particular skill level and team. But each employee has changed skill level and/or team so every employee has multiple rows.
For example, in EmployeeMap for empl_id 1, we can see 3 rows telling us their start_date and end_date while they had a skill_level Beg, Int, Adv all on Red Team. But some, like empl_id 2 change team while staying in the same skill level. And others change skill level and team.
I would appreciate any insight you may have into the best way to solve this problem.