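Here is my `df`. I want to cross-match the sender and receiver columns, so that a pair that appears in both directions (for example 4/75 and 75/4) is treated as a duplicate.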

df <- structure(list(id_sender = c(4L, 69L, 217L, 217L, 149L, 71L, 221L, 217L, 258L, 75L), id_receiver = c(75L, 150L, 72L, 127L, 69L, 218L, 127L, 215L, 89L, 4L), gender_sender = c("Female", "Female", "Female", "Female", "Female", "Female", "Female", "Female", "Male", "Male"), gender_receiver = c("Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Female", "Female")), .Names = c("id_sender", "id_receiver", "gender_sender", "gender_receiver"), row.names = c(NA, -10L), class = "data.frame")

I tried the solution below, but is there a better way to achieve this result?

df$sum <- (df$id_sender + df$id_receiver)/(df$id_sender * df$id_receiver)
df <- df[!duplicated(df$sum), ]
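
One caveat about this key, shown as a minimal sketch: `(a + b)/(a * b)` equals `1/a + 1/b`, so it is symmetric in sender and receiver, but two different pairs can produce the same value. The ids below are made up for illustration and do not come from `df`.

```r
# Hypothetical ids, chosen only to show a collision of the sum/product key
id_sender   <- c(3L, 4L)
id_receiver <- c(6L, 4L)

key <- (id_sender + id_receiver) / (id_sender * id_receiver)
key              # both pairs give 0.5
duplicated(key)  # the second pair is flagged although it is a different pair
```
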
r2evans
Janjua

2 Answers

I am not sure whether this is what you are looking for; please let me know. (Edited thanks to input from r2evans.)

# Make sure the gender columns are character, not factor (e.g. stringsAsFactors = FALSE when creating the data)
df <- structure(list(id_sender = c(4L, 69L, 217L, 217L, 149L, 71L, 221L, 217L, 258L, 75L), 
                     id_receiver = c(75L, 150L, 72L, 127L, 69L, 218L, 127L, 215L, 89L, 4L), 
                     gender_sender = c("Female", "Female", "Female", "Female", "Female", "Female", 
                                       "Female", "Female", "Male", "Male"), 
                     gender_receiver = c("Male", "Male", "Male", "Male", "Male", "Male", "Male", 
                                         "Male", "Female", "Female")), 
                .Names = c("id_sender", "id_receiver", "gender_sender", "gender_receiver"), 
                row.names = c(NA, -10L), class = "data.frame")

# logical row-by-row match on id
df$id_sender == df$id_receiver

# logical row-by-row match on gender
df$gender_sender == df$gender_receiver


# row by row match id
df$match_id <- df$id_sender==df$id_receiver
any(df$match_id)

# overall match: ids that occur in both columns
intersect(df$id_sender, df$id_receiver)

# row by row match gender
df$match_gender <- df$gender_sender==df$gender_receiver
any(df$match_gender)

# overall match: genders that occur in both columns
intersect(df$gender_sender, df$gender_receiver)

TarJae
  • Why `ifelse(.., T, F)`? `ifelse` as a function [has baggage](https://stackoverflow.com/q/6668963/3358272), and that expression can be reduced perfectly to `(...)` without the risks. – r2evans Feb 28 '21 at 16:19
  • `structure(..., stringsAsFactors=)` is not a thing. Try `str(structure(list(Species = structure(1L, .Label = c("setosa", "versicolor", "virginica"), class = "factor")), row.names = 1L, class = "data.frame", stringsAsFactors=FALSE))`. – r2evans Feb 28 '21 at 16:21
  • @r2evans Thank you very much. In my case, when I use `df$match_id <- ifelse(df$id_sender==df$id_receiver)` I get the error: `Error in ifelse(df$id_sender == df$id_receiver) : argument "no" is missing, with no default` – TarJae Feb 28 '21 at 16:23
  • I think you missed the point. Change `ifelse(..., T, F)` to `(...)` (no ifelse). There's no need to call the function, the boolean nature is built-in to the comparison. – r2evans Feb 28 '21 at 16:27
  • @r2evans. Thank you very much again for your input. I was not even thinking about this issue. You broadened my horizon! – TarJae Feb 28 '21 at 16:37
  • Now I'm curious. Would you kindly explain why `structure(..., stringsAsFactors=FALSE)` is not a thing? Do you mean it is not necessary in this case, or that it is generally wrong to use it in this position? Thank you in advance. – TarJae Feb 28 '21 at 16:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/229326/discussion-between-tarjae-and-r2evans). – TarJae Feb 28 '21 at 16:51
  • I can't chat now, but if you run my suggested `str(structure(...,stringsAsFactors=FALSE))`, you'll see that ... `Species` is still a factor (at least it is in R-4.0.3). `structure(.)` doesn't do anything with that option, it is added as an attribute that is not referenced anywhere. If you want to ensure there are no factors in the `structure(.)` object, it needs to be addressed before `dput` or afterward with `as.character` on the particular component(s). – r2evans Feb 28 '21 at 17:21
  • Your answer is very helpful as well, @TarJae, but it was not quite what I needed. And yes, I did not `dput` the data properly, but it was to the best of my knowledge and skills. @r2evans, I really appreciate your feedback. Thanks to all the devoted members and to Stack Overflow, which always helps in learning something new. – Janjua Feb 28 '21 at 20:22
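
The two points raised in the comments can be checked with a short sketch. The vectors and the one-row data frame `x` are made up for illustration, and `factor()` is used here in place of the `.Label` form from the comment.

```r
# 1) A comparison already returns a logical vector, so ifelse(..., TRUE, FALSE) adds nothing
id_sender   <- c(4L, 69L, 75L)
id_receiver <- c(75L, 69L, 4L)
cmp <- id_sender == id_receiver
same_result <- identical(ifelse(cmp, TRUE, FALSE), cmp)
same_result

# 2) structure() stores stringsAsFactors as a plain attribute; it does not convert factors
x <- structure(
  list(Species = factor("setosa", levels = c("setosa", "versicolor", "virginica"))),
  row.names = 1L, class = "data.frame", stringsAsFactors = FALSE
)
still_factor <- is.factor(x$Species)
still_factor

# Converting explicitly is what actually removes the factor
x$Species <- as.character(x$Species)
is.character(x$Species)
```
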

Here is a {dplyr} solution:

library(dplyr)


df <- structure(list(id_sender = c(4L, 69L, 217L, 217L, 149L, 71L, 221L, 217L, 258L, 75L), id_receiver = c(75L, 150L, 72L, 127L, 69L, 218L, 127L, 215L, 89L, 4L), gender_sender = c("Female", "Female", "Female", "Female", "Female", "Female", "Female", "Female", "Male", "Male"), gender_receiver = c("Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", "Female", "Female")), .Names = c("id_sender", "id_receiver", "gender_sender", "gender_receiver"), row.names = c(NA, -10L), class = "data.frame")

df %>%
  rowwise %>%
  mutate(key = paste(sort(c(id_sender, id_receiver)), collapse = "_")) %>% 
  distinct(key, .keep_all = TRUE) 

#> # A tibble: 9 x 5
#> # Rowwise: 
#>   id_sender id_receiver gender_sender gender_receiver key    
#>       <int>       <int> <chr>         <chr>           <chr>  
#> 1         4          75 Female        Male            4_75   
#> 2        69         150 Female        Male            69_150 
#> 3       217          72 Female        Male            72_217 
#> 4       217         127 Female        Male            127_217
#> 5       149          69 Female        Male            69_149 
#> 6        71         218 Female        Male            71_218 
#> 7       221         127 Female        Male            127_221
#> 8       217         215 Female        Male            215_217
#> 9       258          89 Male          Female          89_258

Created on 2021-02-28 by the reprex package (v0.3.0)
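
For comparison, the same idea can be sketched in base R without building a string key: order each pair with `pmin()`/`pmax()` and drop repeated pairs with `duplicated()`. This is an alternative sketch, not part of the answer above; `lo`, `hi` and `dedup` are names chosen here, and only the id columns of `df` are reproduced.

```r
# Only the id columns from the question's data
df <- data.frame(
  id_sender   = c(4L, 69L, 217L, 217L, 149L, 71L, 221L, 217L, 258L, 75L),
  id_receiver = c(75L, 150L, 72L, 127L, 69L, 218L, 127L, 215L, 89L, 4L)
)

# Order each pair so that (75, 4) and (4, 75) become the same (lo, hi) pair
lo <- pmin(df$id_sender, df$id_receiver)
hi <- pmax(df$id_sender, df$id_receiver)

# Keep the first occurrence of each unordered pair
dedup <- df[!duplicated(data.frame(lo, hi)), ]
nrow(dedup)
```

Because the pair is compared as two integer columns rather than as one derived number, distinct pairs cannot collide.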

TimTeaFan