The details of t_id
and date
are not described in the question so in (1) we have assumed that there are up to 3 t_id
values per email
and that they appear in the order corresponding to date1
, date2
and date3
respectively with all other date
values being NA. For example, if there are 2 t_id
values for a particular email then the first one will have date1
as a date and date2
and date3
be NA. The second will have date2
as a date and date1
and date3
will be NA. In (2) we assume the same except we generalize to k
instead of 3.
No packages are used.
1) Use by
to split on email
and then manually construct the row for each. Finally rbind
the rows together.
do.call("rbind",
by(DF, DF$email, function(x) {
t_id <- c(x$t_id, NA, NA, NA)[1:3]
date <- c(na.omit(c(x$date1, x$date2, x$date3)), NA, NA, NA)[1:3]
data.frame(email = x$email[1],
t_id1 = t_id[1], date1 = date[1],
t_id2 = t_id[2], date2 = date[2],
t_id3 = t_id[3], date3 = date[3]
)
}
))
giving:
email t_id1 date1 t_id2 date2
10@line.nl 10@line.nl 100689822 2016-09-18 NA <NA>
1000988@boerman.nl 1000988@boerman.nl 100732587 2016-11-01 NA <NA>
1111tk68@net.nl 1111tk68@net.nl 100640340 2016-08-01 100641415 2016-08-02
natas@gmail.com natas@gmail.com 100678318 2016-09-05 100678319 2016-10-05
t_id3 date3
10@line.nl NA <NA>
1000988@boerman.nl NA <NA>
1111tk68@net.nl NA <NA>
natas@gmail.com NA <NA>
2) If desired we could generalize this to up to k
dates and t_id
values. In this case rbind
/by
produces a new data frame long
which has k
rows for each email
. The first row for each email
in long
corresponds to the first tid
and date
and so on up to the kth. long
is subsequently reshaped to wide.
is.date <- grepl("date", names(DF))
k <- sum(is.date)
long <- do.call("rbind",
by(DF, DF$email, function(x)
data.frame(email = x$email[1],
time = 1:k,
t_id = c(x$t_id, rep(NA, k))[1:k],
date = c(na.omit(do.call("c", x[is.date])), rep(NA, k))[1:k]
)
)
)
reshape(long, dir = "wide", idvar = "email")
giving:
email t_id.1 date.1 t_id.2 date.2 t_id.3 date.3
10@line.nl.1 10@line.nl 100689822 2016-09-18 NA <NA> NA <NA>
1000988@boerman.nl.1 1000988@boerman.nl 100732587 2016-11-01 NA <NA> NA <NA>
1111tk68@net.nl.date11 1111tk68@net.nl 100640340 2016-08-01 100641415 2016-08-02 NA <NA>
natas@gmail.com.date11 natas@gmail.com 100678318 2016-09-05 100678319 2016-10-05 NA <NA>
Note: The input DF
in reproducible form was assumed to be:
Lines <- "t_id date1 date2 date3 email
100678318 2016-09-05 <NA> <NA> natas@gmail.com
100678319 <NA> 2016-10-05 <NA> natas@gmail.com
100732587 2016-11-01 <NA> <NA> 1000988@boerman.nl
100689822 2016-09-18 <NA> <NA> 10@line.nl
100640340 2016-08-01 <NA> <NA> 1111tk68@net.nl
100641415 <NA> 2016-08-02 <NA> 1111tk68@net.nl"
DF <- transform(read.table(text = Lines, header = TRUE, na.strings = "<NA>"),
date1 = as.Date(date1),
date2 = as.Date(date2),
date3 = as.Date(date3))