1

I have the following data.frame “A” in R:

ID Interaction.Type Target.ID Student Created.At
1 email_sent 368932 Isabel Gauss 10/20/2021 0:02
1 email_sent 370153 Isabel Gauss 11/2/2021 0:04
1 Open_sent 375012 Isabel Gauss 12/6/2021 22:15
1 email_sent 382353 Isabel Gauss 2/3/2022 0:06
50 email_sent 368932 Jen Gonzalez 10/20/2021 0:02
50 email_sent 370153 Jen Gonzalez 11/2/2021 0:04
50 email_sent 375012 Jen Gonzalez 12/6/2021 22:15
10 email_sent 354609 Isabel Goodwin 9/23/2021 21:09
10 email_sent 368089 Isabel Goodwin 10/20/2021 0:02
10 email_sent 375017 Isabel Goodwin 12/5/2021 21:05
10 Open_sent 383095 Isabel Goodwin 2/8/2022 1:05
200 email_sent 343546 Jason Bin 10/9/2022 21:05
200 email_sent 343543 Jason Bin 2/8/2021 1:05
550 email_sent 546354 Brad pit 05/29/2023 0:02
550 email_sent 344546 Brad pit 10/20/2021 0:02
550 email_sent 343434 Brad pit 08/15/2022 0:02

And the following integer table “B” in R

ID
1
125
200
550
870

I need to group data.frame "A" first to come up with the below:

ID Interaction.Type Student Min_Created.At
1 email_sent Isabel Gauss 10/20/2021 0:02
50 email_sent Jen Gonzalez 10/20/2021 0:02
11 email_sent Isabel Goodwin 9/23/2021 21:09
200 email_sent Jason Bin 2/8/2021 1:05
550 email_sent Brad Pit 10/20/2021 0:02

And then filter the above output using the above ID table "B" to come up with the below:

ID Interaction.Type Student Min_Created.At
1 email_sent Isabel Gauss 10/20/2021 0:02
125 Null Null Null
200 email_sent Jason Bin 2/8/2021 1:05
550 email_sent Brad Pit 10/20/2021 0:02
870 Null Null Null

The following is not working:

Output_1 <- data.frame(A$ID, A$Interaction.Type == "email_sent", A$Student, Min(A$Created.At), group_by(ID))
r2evans
  • 141,215
  • 6
  • 77
  • 149
jsaab
  • 15
  • 5
  • 1
    Welcome to SO, jsaab! Please share your data in a format that is much easier for us to consume (the spaces in your timestamp columns make it a little more challenging. The best is `dput(x)` where `x` is your frame or a subset of the rows and columns (your frame size here is fine, larger data may benefit from subsetting). Inf your desired output, you have the string literal `"Null"`, do you intend for a real-looking string, or would it be better to use R's `NA` to denote no-data-available (among other things)? – r2evans May 23 '23 at 13:03

1 Answers1

0

First, we need the Created.At column to be a "real" timestamp; in R, this means a POSIXt object. While it's not always true that we cannot work with strings, in this case since "month" is first, sorting and grabbing the minimum of the dates would take more work trying to deal with parsing the string, it's much easier to convert to a timestamp.

dat$Created.At <- as.POSIXct(dat$Created.At, format = "%m/%d/%Y %H:%M")
head(dat)
#   ID Interaction.Type Target.ID      Student          Created.At
# 1  1       email_sent    368932 Isabel Gauss 2021-10-20 00:02:00
# 2  1       email_sent    370153 Isabel Gauss 2021-11-02 00:04:00
# 3  1        Open_sent    375012 Isabel Gauss 2021-12-06 22:15:00
# 4  1       email_sent    382353 Isabel Gauss 2022-02-03 00:06:00
# 5 50       email_sent    368932 Jen Gonzalez 2021-10-20 00:02:00
# 6 50       email_sent    370153 Jen Gonzalez 2021-11-02 00:04:00

From here, the operations are broken down into two things:

dplyr

library(dplyr)
dat %>%
  slice_min(Created.At, by = Student) %>%
  left_join(B, ., by = "ID")
#    ID Interaction.Type Target.ID      Student          Created.At
# 1   1       email_sent    368932 Isabel Gauss 2021-10-20 00:02:00
# 2 125             <NA>        NA         <NA>                <NA>
# 3 200       email_sent    343543    Jason Bin 2021-02-08 01:05:00
# 4 550       email_sent    344546     Brad pit 2021-10-20 00:02:00
# 5 870             <NA>        NA         <NA>                <NA>

base R

tmp <- dat[with(dat, ave(as.numeric(Created.At), Student, FUN = function(z) z == min(z))) > 0,]
tmp
#     ID Interaction.Type Target.ID        Student          Created.At
# 1    1       email_sent    368932   Isabel Gauss 2021-10-20 00:02:00
# 5   50       email_sent    368932   Jen Gonzalez 2021-10-20 00:02:00
# 8   10       email_sent    354609 Isabel Goodwin 2021-09-23 21:09:00
# 13 200       email_sent    343543      Jason Bin 2021-02-08 01:05:00
# 15 550       email_sent    344546       Brad pit 2021-10-20 00:02:00
merge(B, tmp, by = "ID", all.x = TRUE)
#    ID Interaction.Type Target.ID      Student          Created.At
# 1   1       email_sent    368932 Isabel Gauss 2021-10-20 00:02:00
# 2 125             <NA>        NA         <NA>                <NA>
# 3 200       email_sent    343543    Jason Bin 2021-02-08 01:05:00
# 4 550       email_sent    344546     Brad pit 2021-10-20 00:02:00
# 5 870             <NA>        NA         <NA>                <NA>

Data

dat <- structure(list(ID = c(1L, 1L, 1L, 1L, 50L, 50L, 50L, 10L, 10L, 10L, 10L, 200L, 200L, 550L, 550L, 550L), Interaction.Type = c("email_sent", "email_sent", "Open_sent", "email_sent", "email_sent", "email_sent", "email_sent", "email_sent", "email_sent", "email_sent", "Open_sent", "email_sent", "email_sent", "email_sent", "email_sent", "email_sent"), Target.ID = c(368932L, 370153L, 375012L, 382353L, 368932L, 370153L, 375012L, 354609L, 368089L, 375017L, 383095L, 343546L, 343543L, 546354L, 344546L, 343434L ), Student = c("Isabel Gauss", "Isabel Gauss", "Isabel Gauss", "Isabel Gauss", "Jen Gonzalez", "Jen Gonzalez", "Jen Gonzalez", "Isabel Goodwin", "Isabel Goodwin", "Isabel Goodwin", "Isabel Goodwin", "Jason Bin", "Jason Bin", "Brad pit", "Brad pit", "Brad pit"), Created.At = c("10/20/2021 0:02", "11/2/2021 0:04", "12/6/2021 22:15", "2/3/2022 0:06", "10/20/2021 0:02", "11/2/2021 0:04", "12/6/2021 22:15", "9/23/2021 21:09", "10/20/2021 0:02", "12/5/2021 21:05", "2/8/2022 1:05", "10/9/2022 21:05", "2/8/2021 1:05",  "05/29/2023 0:02", "10/20/2021 0:02", "08/15/2022 0:02")), class = "data.frame", row.names = c(NA, -16L))
B <- structure(list(ID = c(1L, 125L, 200L, 550L, 870L)), class = "data.frame", row.names = c(NA, -5L))
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • thanks a lot , in fact my date in R is already formated as below "2021-10-20 00:02:08" so I don' tneed I suppose the "POSIXct" function; only when I copy/pasted in word the created.At date I got the following format 10/20/2021 0:02 – jsaab May 23 '23 at 13:53
  • can you please elaborate more the base R code since I got the follwoing error "Warning message: In FUN(x) : NAs introduced by coercion" ; maybe because I don't have the exact fileds or number of fields since I have first "Interaction.ID", then "Interaction.Type" then "Target.ID", ID, Student, description and then "Created at" – jsaab May 23 '23 at 13:56
  • I don't know, it works with the data you provided. That kind of problem is common when the "class" of a column is unclear. The best way to get past that ambiguity (when it's a concern) is to use `dput`, as it gives us as close to a binary-duplicate of your real data as possible. – r2evans May 23 '23 at 15:10