I have another question in the same project scope as "pandas dataframe groupby datetime month"; however, I fear the data structure there might be too complicated, so I am trying an alternative approach that I hope achieves the same result.
Ideally, I want to build a matrix with phone numbers as rows and start/end date periods as columns, and identify the periods in which a telephone call was made.
This would be achieved by expanding a dataset of dates and phone numbers against a complete list of dates, pairing each start date with its end date, and then checking whether the date of each telephone call falls within that period.
The original data looks like:
Date = as.Date(c("2019-03-01", "2019-03-15","2019-03-29", "2019-04-10","2019-03-05","2019-03-20"))
Phone = c("070000001","070000001","070000001","070000001","070000002","070000002")
df<-data.frame(Date,Phone)
df
## Date Phone
## 1 2019-03-01 070000001
## 2 2019-03-15 070000001
## 3 2019-03-29 070000001
## 4 2019-04-10 070000001
## 5 2019-03-05 070000002
## 6 2019-03-20 070000002
Ideally I would want it to look like this:
## Date Phone INT_1 INT_2 INT_3 INT_4 INT_5
## 1 2019-03-01 070000001 X X X X X
## 2 2019-03-15 070000002 X X X
Where INT is a series of dates + 30 and X indicates that the telephone number appeared in that rolling period.
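To make the X concrete, here is a minimal sketch of the membership test for a single window. It assumes the window is closed at both ends (start <= call date <= start + 30), which the description above does not state explicitly, and the helper name `in_window` is made up for illustration.

```r
# Hypothetical helper: TRUE when a call date falls inside the window that
# begins at `start` and runs for `width` days (both ends inclusive; assumption).
in_window <- function(call_date, start, width = 30) {
  call_date >= start & call_date <= start + width
}

# Three of the calls for 070000001, tested against the window starting 2019-03-01
calls <- as.Date(c("2019-03-01", "2019-03-15", "2019-04-10"))
in_window(calls, as.Date("2019-03-01"))
## [1]  TRUE  TRUE FALSE
```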
To do this I assume you need two datasets: the one above, of telephone numbers by date called, and a second which is the complete list of days and their + 30 day counterparts.
# as.data.frame() ignores responseName for a Date vector, so name the column directly
dates <- data.frame(start = seq(as.Date("2016-07-01"), as.Date("2019-07-01"), by = "days"))
dates$end <- dates$start + 30
## INT start end
## 1 2016-07-01 2016-07-31
## 2 2016-07-02 2016-08-01
## 3 2016-07-03 2016-08-02
## 4 2016-07-04 2016-08-03
But how do I get the two to evaluate together? I am assuming some kind of merge that expands the telephone data into the date list, followed by spreading the dates by the row index (INT)?
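One way the merge-and-spread idea could be sketched in base R is shown here. It is only a sketch under a few assumptions: the windows are inclusive at both ends, the `INT_n` labels are generated from the row index, and only five windows (starting 2019-03-18 to 2019-03-22) are used so that the result stays readable. The names `pairs`, `hits` and `result` are illustrative.

```r
# Calls data from the question
df <- data.frame(
  Date  = as.Date(c("2019-03-01", "2019-03-15", "2019-03-29",
                    "2019-04-10", "2019-03-05", "2019-03-20")),
  Phone = c("070000001", "070000001", "070000001",
            "070000001", "070000002", "070000002"),
  stringsAsFactors = FALSE
)

# A short run of 30-day windows (this date range is an assumption)
dates <- data.frame(start = seq(as.Date("2019-03-18"), as.Date("2019-03-22"), by = "days"))
dates$end <- dates$start + 30
dates$INT <- paste0("INT_", seq_len(nrow(dates)))

# Cross join: merge() with by = NULL pairs every call with every window
pairs <- merge(df, dates, by = NULL)

# Keep only the pairs where the call date lies inside the window
hits <- pairs[pairs$Date >= pairs$start & pairs$Date <= pairs$end, ]

# Spread to a Phone x INT matrix and mark occupied cells with "X"
counts <- unclass(table(hits$Phone, hits$INT))
result <- ifelse(counts > 0, "X", "")
result
```

With these five windows, 070000001 is marked in every column and 070000002 only in INT_1 to INT_3, because its call on 2019-03-20 precedes the windows starting on 2019-03-21 and 2019-03-22. The same code should run against the full three-year list of windows, though the result would then have over a thousand columns.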