1

I have an interesting join that I need to accomplish in R. So here are my two tables.

Table Number 1:

Date                           Name
2016-01-02 10:18:00            CARDOSO, RAMON
2016-01-02 15:02:00            HARRISON, KATHYANNE M
2016-01-02 15:02:00            PALEO, SHERI
2016-01-03 02:09:00            PHANOR, RENALDY
2016-01-03 09:42:00            GUAMAN, ANGEL
2016-01-03 18:47:00            AIME, MADELINE
2016-01-03 18:47:00            CADET, GARDY
2016-01-03 19:31:00            REID, ARTHUR D
2016-01-03 22:11:00            HERNANDEZ-JONES, FREDRICK JOSHUA
2016-01-04 12:32:00            AGUERO, RAUL

Table number 2:

Date                     ID                  Name
2016-01-02 10:18:00      16-22-AR            CARDOSO, RAMON
2016-01-02 15:02:00      16-24-AR            HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-02 15:02:00      16-25-AR            HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-03 02:09:00      16-31-AR            PHANOR, RENALDY
2016-01-03 09:42:00      16-32-AR            GUAMAN, ANGEL
2016-01-03 18:47:00      16-39-AR            AIME, MADELINE", " CADET, GARDY"
2016-01-03 18:47:00      16-40-AR            AIME, MADELINE", " CADET, GARDY"
2016-01-03 19:31:00      16-42-AR            REID, ARTHUR D
2016-01-03 22:11:00      16-44-AR            HERNANDEZ-JONES, FREDRICK JOSHUA
2016-01-04 12:32:00      16-49-AR            AGUERO, RAUL

My goal is to have ID in table 1 as its own column, but in order to do so I would need to do a join on date and somehow do a match on name where we lookup the names from table 1 in table 2.

UPDATE:

The original dataset looks like this

2016-01-02 10:18:00  16-22-AR               CARDOSO, RAMON
2016-01-02 15:02:00  16-24-AR, 16-25-AR     HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-03 02:09:00  16-31-AR               PHANOR, RENALDY
2016-01-03 09:42:00  16-32-AR               GUAMAN, ANGEL
2016-01-03 18:47:00  16-39-AR, 16-40-AR     AIME, MADELINE", " CADET, GARDY"
2016-01-03 19:31:00  16-42-AR               REID, ARTHUR D
2016-01-03 22:11:00  16-44-AR               HERNANDEZ-JONES, FREDRICK JOSHUA
2016-01-04 12:32:00  16-49-AR               AGUERO, RAUL

The goal is to have each name on its own row with its respective ID. The IDs follow the same order as the names, first ID goes with the fist name.

Hope this clarification helps.

Jomisilfe
  • 55
  • 1
  • 9

1 Answers1

2

I assume when you have something like

2016-01-02 15:02:00      16-24-AR            HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-02 15:02:00      16-25-AR            HARRISON, KATHYANNE M", " PALEO, SHERI"

The first ID corresponds to the first name, and the second ID corresponds to the second name. Then one approach is to create a new column that has the correct name.

d$order <- unlist(sapply(rle(paste0(d$Date, d$Name))$lengths, seq_len))

split_names <- function(name, order = 1) {
  names <- strsplit(name, '\\", \\"')[[1]] # Split
  names <- gsub('^\\s|\\"', "", names) # Clean up leading space and trailing "
  names[order]
}

d$Newname <- mapply(split_names, d$Name, d$order)
d[, c("Date", "ID", "Newname")]
#                   Date       ID                          Newname
# 1  2016-01-02 10:18:00 16-22-AR                   CARDOSO, RAMON
# 2  2016-01-02 15:02:00 16-24-AR            HARRISON, KATHYANNE M
# 3  2016-01-02 15:02:00 16-25-AR                     PALEO, SHERI
# 4  2016-01-03 02:09:00 16-31-AR                  PHANOR, RENALDY
# 5  2016-01-03 09:42:00 16-32-AR                    GUAMAN, ANGEL
# 6  2016-01-03 18:47:00 16-39-AR                   AIME, MADELINE
# 7  2016-01-03 18:47:00 16-40-AR                     CADET, GARDY
# 8  2016-01-03 19:31:00 16-42-AR                   REID, ARTHUR D
# 9  2016-01-03 22:11:00 16-44-AR HERNANDEZ-JONES, FREDRICK JOSHUA
# 10 2016-01-04 12:32:00 16-49-AR                     AGUERO, RAUL

Data:

structure(list(Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00", 
"2016-01-02 15:02:00", "2016-01-03 02:09:00", "2016-01-03 09:42:00", 
"2016-01-03 18:47:00", "2016-01-03 18:47:00", "2016-01-03 19:31:00", 
"2016-01-03 22:11:00", "2016-01-04 12:32:00"), ID = c("16-22-AR", 
"16-24-AR", "16-25-AR", "16-31-AR", "16-32-AR", "16-39-AR", "16-40-AR", 
"16-42-AR", "16-44-AR", "16-49-AR"), Name = c("CARDOSO, RAMON", 
"HARRISON, KATHYANNE M\", \" PALEO, SHERI\"", "HARRISON, KATHYANNE M\", \" PALEO, SHERI\"", 
"PHANOR, RENALDY", "GUAMAN, ANGEL", "AIME, MADELINE\", \" CADET, GARDY\"", 
"AIME, MADELINE\", \" CADET, GARDY\"", "REID, ARTHUR D", "HERNANDEZ-JONES, FREDRICK JOSHUA", 
"AGUERO, RAUL")), .Names = c("Date", "ID", "Name"), row.names = c(NA, 
-10L), class = "data.frame")
Weihuang Wong
  • 12,868
  • 2
  • 27
  • 48
  • I'm having some difficulty understanding how the ordering works and implement this solution to my code. – Jomisilfe Dec 20 '16 at 05:56