I am new to R and need a bit of guidance. I have two data frames; I have already performed a series of operations on both, and I need to perform the operation described below as the final step.

df1 & df2

df1 <- data.frame(name = c("A","B","C","D","E","F","F","G","s","x")) 
#(1)

# (2) names extracted from another column; kept as a standalone vector
# because it has 7 values while df1 has 10 rows
newname <- c("A","V","C","D","c","v","x")

df2 <- data.frame(Id_name = c("A","B","C","s","s", "x","G", "g"))
#(3)

Step1 = I need to match 2 with 3 first and extract common names, let's name it 4

Step2 = find names in 4 that have duplicate value = 1

Step3 = delete those values from 1 and 3

I tried using anti_join and semi_join, but I guess they work for numeric values only. Is there a specific library available for this, and how can I solve it?

rajeswa
  • I have also tried using lib(compare), and I can use df$tm to get a few names, but I can't resolve the problem I am facing. – rajeswa Dec 28 '19 at 08:54
  • Can you make your example reproducible? Add some mock data and what is the result you're expecting. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more information on how to do just that. – Roman Luštrik Dec 28 '19 at 09:28

1 Answer

The strategy followed below relies on intersect/extraction:

  1. Get the common names with intersect.
  2. Remove the rows of df1 whose name can be found in common.
  3. Do the same as in point 2, this time with df2$Id_name.

It is fully vectorized, no need for joins.
Note also the argument drop = FALSE. The examples posted in the question have only one column, and with the default drop = TRUE the results would lose their data frame structure and become plain vectors.
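To make the effect of drop concrete, here is a minimal sketch on a toy one-column data frame. The data and the names d, keep, as_vector and as_frame are hypothetical and only serve the illustration.

```r
# Toy one-column data frame (hypothetical data, for illustration only)
d <- data.frame(name = c("A", "B", "C"))
keep <- d$name != "B"

# Without drop = FALSE, row subsetting of a one-column data frame
# is simplified to a plain vector
as_vector <- d[keep, ]

# With drop = FALSE the result is still a data frame
as_frame <- d[keep, , drop = FALSE]

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
```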

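# names present in both newname (2) and df2$Id_name (3)
common <- intersect(newname, df2$Id_name)
# drop the rows whose name is in common, keeping the data frame structure
df1 <- df1[!df1$name %in% common, , drop = FALSE]
df2 <- df2[!df2$Id_name %in% common, , drop = FALSE]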
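As a worked example, the three lines above can be run on the data from the question. This sketch assumes that newname is a standalone vector (it has 7 values, so it cannot be a column of the 10-row df1), and it writes the results to df1_out and df2_out, names introduced here so the inputs stay untouched. stringsAsFactors = FALSE is set so the columns are character vectors on R versions before 4.0 as well.

```r
# Data from the question
df1 <- data.frame(name = c("A","B","C","D","E","F","F","G","s","x"),
                  stringsAsFactors = FALSE)
newname <- c("A","V","C","D","c","v","x")  # assumed to be a standalone vector
df2 <- data.frame(Id_name = c("A","B","C","s","s","x","G","g"),
                  stringsAsFactors = FALSE)

# Names present in both newname and df2$Id_name (exact, case-sensitive match)
common <- intersect(newname, df2$Id_name)
common
# "A" "C" "x"

# Keep only the rows whose name is not in common
df1_out <- df1[!df1$name %in% common, , drop = FALSE]
df2_out <- df2[!df2$Id_name %in% common, , drop = FALSE]

df1_out$name     # "B" "D" "E" "F" "F" "G" "s"
df2_out$Id_name  # "B" "s" "s" "G" "g"
```

Because the match is exact, "s" stays in both data frames: it is not in newname, so it never enters common.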
Rui Barradas
  • Please check, I have just edited my post. @Rui, thanks a ton, I guess this logic should work. – rajeswa Dec 28 '19 at 10:50
  • @rajeshdhingra Done, see how it is now. – Rui Barradas Dec 28 '19 at 11:19
  • Yes, thanks again, I got the result. Just one point: is "intersect" doing an exact (case-sensitive) match here? IMO it is. Is there any option for partial matching, or any other option that ignores lower/upper case? (Just asking to gain insight and for future use.) – rajeswa Dec 28 '19 at 11:57
  • @rajeshdhingra `intersect(tolower(newname), tolower(df2$Id_name))`. But then you will have to do the same in the next instructions `tolower(df1$name)` and `tolower(df2$Id_name)`, or, simpler, do it all before and work with the data sets in lower case. – Rui Barradas Dec 28 '19 at 12:45
  • Ok sure will do that and check the results, @Rui Barradas Thank You very much. – rajeswa Dec 28 '19 at 13:03
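
The case-insensitive variant suggested in the comments can be sketched as follows. The data here is hypothetical, chosen so that letter case makes a difference, and the names common_exact, common_ci and df2_ci are introduced only for this illustration.

```r
# Hypothetical data where the two sides differ only in letter case
newname <- c("a", "V", "g")
df2 <- data.frame(Id_name = c("A", "B", "G"), stringsAsFactors = FALSE)

# Exact match: nothing in common, because "a" != "A" and "g" != "G"
common_exact <- intersect(newname, df2$Id_name)
common_exact
# character(0)

# Case-insensitive match: compare lower-cased copies on both sides
common_ci <- intersect(tolower(newname), tolower(df2$Id_name))
common_ci
# "a" "g"

# The same lower-casing has to be applied when filtering
df2_ci <- df2[!tolower(df2$Id_name) %in% common_ci, , drop = FALSE]
df2_ci$Id_name
# "B"
```

Lower-casing only inside the comparison leaves the original spelling of the remaining rows intact; converting the columns once up front, as the comment suggests, is simpler when the original case does not need to be preserved.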