I'm in a bit of a pickle. I've been working on a problem all day without seeing any real results. I'm working in Python and using Pandas for handling data.
What I'm trying to achieve is based on the customers previous interactions to sum each type of interaction. The timestamp of the interaction should be less than the timestamp of the survey. Ideally, I would like to sum the interactions for the customer during some period - like less than e.g. 5 years.
The first dataframe contains a customer ID, segmentation of that customer during in that survey e.g. 1 being "happy", 2 being "sad" and a timestamp for the time of the recorded segment or time of that survey.
import pandas as pd
#Generic example
customers = pd.DataFrame({"customerID":[1,1,1,2,2,3,4,4],"customerSeg":[1,2,2,1,2,3,3,3],"timestamp":['1999-01-01','2000-01-01','2000-06-01','2001-01-01','2003-01-01','1999-01-01','2005-01-01','2008-01-01']})
customers
Which yields something like:
customerID | customerSeg | timestamp |
---|---|---|
1 | 1 | 1999-01-01 |
1 | 1 | 2000-01-01 |
1 | 1 | 2000-06-01 |
2 | 2 | 2001-01-01 |
2 | 2 | 2003-01-01 |
3 | 3 | 1999-01-01 |
4 | 4 | 2005-01-01 |
4 | 4 | 2008-01-01 |
The other dataframe contains interactions with that customer eg. at service and a phonecall.
interactions = pd.DataFrame({"customerID":[1,1,1,1,2,2,2,2,4,4,4],"timestamp":['1999-07-01','1999-11-01','2000-03-01','2001-04-01','2000-12-01','2002-01-01','2004-03-01','2004-05-01','2000-01-01','2004-01-01','2009-01-01'],"service":[1,0,1,0,1,0,1,1,0,1,1],"phonecall":[0,1,1,1,1,1,0,1,1,0,1]})
interactions
Output:
customerID | timestamp | service | phonecall |
---|---|---|---|
1 | 1999-07-01 | 1 | 0 |
1 | 1999-11-01 | 0 | 1 |
1 | 2000-03-01 | 1 | 1 |
1 | 2001-04-01 | 0 | 1 |
2 | 2000-12-01 | 1 | 1 |
2 | 2002-01-01 | 0 | 1 |
2 | 2004-03-01 | 1 | 0 |
2 | 2004-05-01 | 1 | 1 |
4 | 2000-01-01 | 0 | 1 |
4 | 2004-01-01 | 1 | 0 |
4 | 2009-01-01 | 1 | 1 |
Result for all previous interactions (ideally, I would like only the last 5 years):
customerID | customerSeg | timestamp | service | phonecall |
---|---|---|---|---|
1 | 1 | 1999-01-01 | 0 | 0 |
1 | 1 | 2000-01-01 | 1 | 1 |
1 | 1 | 2000-06-01 | 2 | 2 |
2 | 2 | 2001-01-01 | 1 | 1 |
2 | 2 | 2003-01-01 | 1 | 2 |
3 | 3 | 1999-01-01 | 0 | 0 |
4 | 4 | 2005-01-01 | 1 | 1 |
4 | 4 | 2008-01-01 | 1 | 1 |
I've tried almost everything, I could come up with. So, I would really appreciate some inputs. I'm pretty much confined to using Pandas and Python, since it's the language, I'm most familiar with, but also because I need to read a csv file of the customer segmentation.