1

I have 2 data frames,

R1 <- c("15515","5156","65656","1566", "2857")
RC <- c("A","X","B","Z","A")
R2 <- c("515","5156.11-","415-","1455-","886")
df <- data.frame(R1,RC,R2)

R1 <- c("15515","8888","65656","1566")
R2 <- c("111","222","333","444")
R3 <- c("999","888","777","666")
df2 <- data.frame(R1,R2,R3)

I need to first left join df on R1,

Conditions to be achieved,

1. if df$R1 != df2$R1
  df$R2 <- df$R2  (As it is in df)
  
2. if df$R1 == df2$R1 & (df$RC == A | df$RC == B) then,
  df$R2 <- df2$R2
  
3. if df$R1 == df2$R1 & (df$RC != A | df$RC != B) then,
  df$R2 <- df2$R3

The output may look like

 R1        RC        R2
15515      A        111
5156       X        5156.11-
65656      B        333
1566       Z        666
2857       A        886

I tried executing below code,

df$R1[which(df$R1 == df2$R1) & which((df$RC == 'A' | df$RC == 'B'))] <- df2$R2[which(df$R1 == df2$R1) & which((df$RC == 'A' | df$RC == 'B'))]
df$R1[which(df$R1 == df2$R1) & which((df$RC != 'A' | df$RC != 'B'))] <- df2$R3[which(df$R1 == df2$R1) & which((df$RC != 'A' | df$RC != 'B'))]

But I lost condition 1 from this.

Your kind help is very much appreciated!

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Vin
  • 89
  • 5
  • If you are using a version <4.0.0 you should use the data.frame call with StringsAsFactors=FALSE. The RC column is otherwise handled as a factor and that might cause unwanted side effects later on. – SebSta Jun 29 '20 at 09:03
  • I use Version 1.1.383 – © 2009-2017 RStudio, Inc. I believe the data is in Factors only since beginning. Ain't it will be fine? – Vin Jun 29 '20 at 09:13
  • `df` and `df2` do not have same number of rows. How do you compare them `df$R1 == df2$R1` ? – Ronak Shah Jun 29 '20 at 09:46
  • Hi @RonakShah I am not fully sure but, we can first left join (df) and then go for solving conditions one by one. will n't it help? – Vin Jun 29 '20 at 10:11

2 Answers2

1

You can join df with df2 on R1 and apply conditions with case_when :

library(dplyr)

df %>%
  left_join(df2, by = 'R1') %>%
  mutate(R2 = case_when(is.na(R2.y) ~ R2.x, 
                        RC %in% c('A', 'B') ~ R2.y, 
                        TRUE ~ R3)) %>%
  select(names(df))

#     R1 RC       R2
#1 15515  A      111
#2  5156  X 5156.11-
#3 65656  B      333
#4  1566  Z      666
#5  2857  A      886
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks for this but I am getting NA in some R2 column, R1 RC R2 1 15515 A 2 5156 X 5156.11- 3 65656 B 4 1566 Z 5 2857 A 886 – Vin Jun 29 '20 at 10:37
  • 1
    @Vin Please convert factor columns to characters first. The simple way in your example is `df <- data.frame(R1,RC,R2, stringsAsFactors = FALSE)` do the same for `df2`. Or if the dataframe is already created you can also use `df[] <- lapply(df, as.character)` to convert all columns to characters. – Ronak Shah Jun 29 '20 at 10:39
  • Unfortunately it is not working on actual data. I am not sure why. At the end I am getting same values in R2 as it was in df (actual data is different), any other reason? – Vin Jun 29 '20 at 13:18
  • 1
    @Vin Did you assign the result back to new object? `df1 <- df %>% left_join(df2, by = 'R1')....` ? – Ronak Shah Jun 29 '20 at 13:38
  • Oops, Now it worked... Thanks a lot! but I had 485504 before left join, now I have 523692 records, though I don't have any NA/NULL/duplicates in join's linked column. – Vin Jun 29 '20 at 13:54
  • 1
    I think some `R1` values might be duplicates in either `df1` or `df2` which increases number of rows. – Ronak Shah Jun 29 '20 at 14:14
  • Thanks a lot @RonakShah, you are a true gem! – Vin Jun 29 '20 at 14:18
0

Its maybe easier to do this with ifelse. Note that df and df2 have different nrows so there will be a warning:

df$R2 <- ifelse(df$R1 != df2$R1, df$R2,
  ifelse(df$R1 == df2$R1 & (df$RC == "A" | df$RC == "B"), df2$R2,
    ifelse(df$R1 == df2$R1 & (df$RC != "A" | df$RC != "B"), df2$R3, "X")
  )
)

Maybe you can leave out the last ifelse, which prints and "X" if none of your criteria are fullfilled.

Also im running R Version 3.6.1 and i get an error message if i dont use stringsAsFactors=FALSE.

SebSta
  • 476
  • 2
  • 12
  • I am getting error on this code, Error in Ops.factor(df$R1, df2$R1) : level sets of factors are different In addition: Warning message: In is.na(e1) | is.na(e2) : longer object length is not a multiple of shorter object length – Vin Jun 29 '20 at 09:33