-3

There are two files and the data looks like this:
file1

Col1   Col2  
**A**      10  
**B**      20   
**C**      30   
**D**      40  

file2:

Col1   col2   
**A**   XYZ  
**B**   PQR  
**B**   QES  
**C**   UIY  
**D**   UUI  
**D**   HUI  
**E**   BIG   

I want to join these two files on col1 1 ,but I want only those entries where we have one to one mapping only. The output shoul look like this:
file A (where we have match)
A 10 XYZ
C 30 UIY

file B (Where we don't found a one to one map)

col1 col2
B 20
D 40

AS B and D have multiple entries in file 2, so I don't want these records in my join result.

  • Related merge post: [How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871) – zx8754 Jul 25 '16 at 08:15

1 Answers1

2

We can use a inner_join after subsetting the second dataset

library(dplyr)
df2 %>% 
   group_by(Col1) %>% 
   filter(n()==1) %>% 
   inner_join(., df1, by = "Col1")
#     Col1  col2  Col2
#   <chr> <chr> <int>
#1     A   XYZ    10
#2     C   UIY    30

Or to avoid duplicate elements, as @zx8754 suggested

res <- df2 %>%
        inner_join(., df1, by = "Col1") %>%
        group_by(Col1) %>%
        filter(n()==1)

To get the rows that are not in the 'res', do a anti_join

res %>%
    select(Col1, Col2) %>% 
    anti_join(df1, .)
#  Col1 Col2
#1    B   20
#2    D   40

Or using data.table

library(data.table)
na.omit(setDT(df2)[df1,  Col2 := Col2[.N==1],on = "Col1", by = .EACHI])
#   Col1 col2 Col2
#1:    A  XYZ   10
#2:    C  UIY   30
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    I would add filter after join, in case df1 has duplicated values. – zx8754 Jul 25 '16 at 08:14
  • I get the one which i have found a match now, I want to segregate those which don't have a match in df2 so there will be two file one: where there is a match and second: where we don't have a match. I will test your approach on my test dataset Thanks a lot!!! – Himanshu Chawla Jul 25 '16 at 09:38
  • @HimanshuChawla this is based on your example and it is working for that. – akrun Jul 25 '16 at 09:41
  • @HimanshuChawla If it is a new problem, please consider to post as a new question. – akrun Jul 25 '16 at 09:51
  • Its not a new question its just an extension to the previous one – Himanshu Chawla Jul 25 '16 at 09:53
  • Just a quick one I have 198 columns I cant right each name in select, I tried select(df1,1:198) but its not working.... – Himanshu Chawla Jul 25 '16 at 10:17