0

I have two dataframes:

data1:

ID               DateTimeUTC
 A               12/4/2019 11:30:30 PM
 A               12/4/2019 11:30:31 PM
 B               12/5/2019 11:31:00 PM
 B               12/5/2019 11:31:01 PM
 C               12/5/2019 11:31:02 PM

and data2:

 Message         DateTimeUTC
 A               12/4/2019 11:30:30 PM
 A               12/4/2019 11:30:31 PM
 B               12/5/2019 11:31:00 PM
 B               12/5/2019 11:31:01 PM

I would like to have

ID              DateTimeUTC               Message              DateTimeUTC
A               12/4/2019 11:30:30 PM      A           12/4/2019 11:30:30 PM
A               12/4/2019 11:30:31 PM      A           12/4/2019 11:30:31 PM
B               12/5/2019 11:31:00 PM      B           12/4/2019 11:31:00 PM
B               12/5/2019 11:31:01 PM      B           12/4/2019 11:31:01 PM

I wish to only show matching IDs and Messages. I have performed an inner join, but it is giving me duplicates, and it erases one of my column names.

 library('dplyr')
 inner_join(data1,  data2, by = c("ID" = "Message"))  

Goal: Can someone show me how to do an rbind to get the above outcome?

##pseudo_code:
 rbind(data1,data2, order_by ID & Message)
Lynn
  • 4,292
  • 5
  • 21
  • 44
  • 4
    You should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- Jan 07 '20 at 22:55
  • 1
    Moreover, Please consider formatting your question nicely. It just took me 5 minutes now to edit your question (I could put this time into helping you with this problem instead) – M-- Jan 07 '20 at 23:00
  • 1
    for each data, for example data1, how eill you distinguish between the first row and the second row? eg. in the result, which row were combined? – Onyambu Jan 07 '20 at 23:01
  • 1
    @Onyambu I have edited this to show distinction. I will work on showing the reproducible result as well. – Lynn Jan 07 '20 at 23:08
  • You get duplicates because every ID A gets matched with every Message A. There's no logic here to say that the first ID A should be matched with only the first Message A, and the second, and so on. Do you want to match by timestamps as well? Or what's the pattern? – camille Jan 07 '20 at 23:16
  • By reproducible example, @M-- meant something which we can copy/paste into our R session. Something with `dput` or `data1 <- data.frame(.....` – Ronak Shah Jan 08 '20 at 00:18
  • Ok let me do this. I am away from my pc, I will do this soon – Lynn Jan 08 '20 at 00:47

2 Answers2

1

Actually the idea of inner_join is correct, the problem is that instead of joining only on "ID" = "Message" you should also consider DateTimeUTC. So it is joining on 2 conditions as follows;

library(dplyr)

df1 <-
  data.frame(
    ID = c("A", "A", "B", "B", "C"),
    DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM", "12/5/2019 11:31:00 PM", 
                    "12/5/2019 11:31:01 PM", "12/5/2019 11:31:02 PM"),
    stringsAsFactors = FALSE
  )

df2 <-
  data.frame(
    Message = c("A", "A", "B", "B"),
    DateTimeUTC = c("12/4/2019 11:30:30 PM", "12/4/2019 11:30:31 PM", 
                    "12/5/2019 11:31:00 PM", "12/5/2019 11:31:01 PM"),
    stringsAsFactors = FALSE
  )

df1 %>%
  inner_join(df2, by = c("ID" = "Message", "DateTimeUTC" = "DateTimeUTC"))

# ID           DateTimeUTC
# A 12/4/2019 11:30:30 PM
# A 12/4/2019 11:30:31 PM
# B 12/5/2019 11:31:00 PM
# B 12/5/2019 11:31:01 PM
Nareman Darwish
  • 1,251
  • 7
  • 14
0

If you want to bind these two data sets together I would use cbind.

A caution though is is that this assumes that the data in each dataset is ordered correctly (i.e. the first 'A' in data1 actually goes with the first 'A' in data2).

library(tidyverse)

data1 <- tibble(ID = c("A", "A", "B", "B", "C"),
                    DateTimeUTC = c("12/4/2019 11:30:30 PM",
                                    "12/4/2019 11:30:31 PM",
                                    "12/5/2019 11:31:00 PM",
                                    "12/5/2019 11:31:01 PM",
                                    "12/5/2019 11:31:02 PM"))

data2 <- tibble(Message = c("A", "A", "B", "B"),
                    DateTimeUTC = c("12/4/2019 11:30:30 PM",
                                    "12/4/2019 11:30:31 PM",
                                    "12/5/2019 11:31:00 PM",
                                    "12/5/2019 11:31:01 PM"))
# remove 'C' from the list
data1 <- data1 %>% filter(ID != "C")

# bind datasets togethers as per example above.
data_bind <- cbind(data1, data2)
Colin C
  • 76
  • 8
  • 1
    see @Nareman's answer -- it is a method that can be applied more generally. Only use my answer if it specifically fits your data. – Colin C Jan 07 '20 at 23:59