How to generate cumulative distinct count with data condition in R
I have a problem in R. The dataset contains 9 categorical variables so I have to manipulate numeric variables to convert the categorical variable to numbers. The new variables I want to build look like this, for instance, the distinct number homephone of the same SSN in that record in last 7 days. Parts of the code I have written is as follows. It took me 2 hours to do the calculation for only 30% records in the 100k dataset. How could I optimize this code to make it more efficient? Thank you.
Original variables: record date ssn firstname lastname address zip dob homephone
Data sample: Here is a link to an image of my one of my data samples
### Data sample[enter image description here][1]
|record | date | ssn |homephone ||date1 |dscnt|
|52 | 20150101 |737610282 |9105580920 |20150101 | 1 |
|91 | 20150101 |737610282 |3301371772 |20150101 | 2 |
|144 | 20150101 |737610282 |4139873682 |20150101 | 3 |
|236 | 20150101 |737610282 |4345558469 |20150101 | 4 |
|272 | 20150101 |737610282 |3979419673 |20150101 | 5 |
|282 | 20150102 |737610282 |8261221737 |20150102 | 6 |
|330 | 20150102 |737610282 |9105580920 |20150102 | 6 |
|412 | 20150102 |737610282 |7115050432 |20150102 | 7 |
|493 | 20150102 |737610282 |6872924382 |20150102 | 8 |
|571 | 20150103 |737610282 |2528003837 |20150103 | 9 |
|621 | 20150103 |737610282 |9105580920 |20150103 | 9 |
|763 | 20150103 |737610282 |581105006 |20150103 | 10 |
|854 | 20150104 |737610282 |9105580920 |20150104 | 10 |
|883 | 20150104 |737610282 |979778244 |20150104 | 11 |
|966 | 20150104 |737610282 |9952278232 |20150104 | 12 |
|974 | 20150104 |737610282 |3664924933 |20150104 | 13 |
|1057 | 20150104 |737610282 |9105580920 |20150104 | 13 |
|1091 | 20150104 |737610282 |32384275 |20150104 | 14 |
|1166 | 20150105 |737610282 |942052630 |20150105 | 15 |
|1304 | 20150105 |737610282 |9105580920 |20150105 | 15 |
|1416 | 20150105 |737610282 |1384461737 |20150105 | 16 |
These are the codes:
Name of the same SSN, Address, Phone for the last 7 days i= 1
e1 = NULL
for (i in 1:365)
{
i = 360
a1 <- df0 %>% filter(date <= DT[365-i,])
b1 <-
a1 %>%
select(record,date,ssn,homephone) %>%
mutate(date1 = date + days(as.integer(today() -max(a$date)) )) %>%
filter(date1 >= today() - days(7)) %>%
group_by(ssn) %>%
mutate(dscnt = n_distinct(homephone))
f1 = b1 %>% filter(date == DT[365-i,])
e1 = rbind(e1,f1)
}