Suppose I have the following data.frames:

library(dplyr)
set.seed(13)
df <- data_frame(A = sample(letters[1:2], 6, replace = TRUE), B = sample(1:3, 6, replace = TRUE))
new_df <- data_frame(A = "a", B = 4)

Suppose I want to update all the rows of df where A == "a" so that B is 4 (this is just an example; in general new_df has more than one row). I can do this the following way:

df %>% left_join(new_df %>% rename(b = B), by = "A") %>% mutate(B = ifelse(is.na(b), B, b))

This works, but it does not look elegant. Is there a better way to do this?

I came across this issue while cleaning up data. I calculate a certain column from another column, which should be a unique id, but due to data collection issues it is not. I have another table with the correct ids, and I want to update them. Usually the number of incorrect ids is low compared to the number of correct ids, so doing a join seems like overkill.

mpiktas

2 Answers


Well, if you're looking for elegant (and fast), here's how you can replace those values in-place:

library(data.table)

dt = as.data.table(df) # alternatively call setDT to convert in-place
setkey(dt, A)

dt[new_df, B := i.B]
dt
#   A B
#1: a 4
#2: a 4
#3: a 4
#4: a 4
#5: b 2
#6: b 2

Two notes. First, you will get warnings, as data.table is very careful about types and the types of your two tables don't match (B is integer in df but numeric in new_df). Second, the i. prefix ensures that you use the B column of the i-expression, i.e. the first argument of [.data.table; it is used to resolve name conflicts such as the one here.
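
As a rough sketch of the same update join without setting a key, assuming a data.table version that supports the `on` argument: the tables below are small illustrative stand-ins (not the question's random sample), and `B` is deliberately integer in both so that no type coercion is involved.

```r
library(data.table)

# Illustrative data (an assumption, not the question's sample); B is integer in both tables
dt <- data.table(A = c("a", "b", "a", "b"), B = c(1L, 2L, 3L, 2L))
new_dt <- data.table(A = "a", B = 4L)

# Update join: for rows of dt that match new_dt on A, overwrite B by reference.
# i.B refers to the B column of new_dt (the i-expression).
dt[new_dt, on = "A", B := i.B]
dt
```

Because no key is set here, the row order of `dt` is left as it was; only the matching rows are modified.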

eddi
  • Are you referring to factors vs characters with the type conversion? Just in case you do: OP uses `dplyr::data_frame` which never converts strings to factors. – talat Feb 03 '15 at 17:26
  • @docendodiscimus no, the `B` columns are of different type - one is numeric, the other one is integer and the assignment is from numeric to integer – eddi Feb 03 '15 at 17:26
  • Btw, by the same logic as your downvote I could argue to downvote yours because you obviously didn't read the question right. They are explicitly asking for a dplyr solution.. but I'm not going to downvote for such reasons – talat Feb 03 '15 at 17:37
  • @docendodiscimus they're not explicitly asking for a `dplyr` solution, they're asking for a better way to do it – eddi Feb 03 '15 at 17:40
  • That's incorrect. Read the question title. (I'll leave this discussion now, doesn't make much sense IMO) – talat Feb 03 '15 at 17:41
  • ok, fair enough, if you find that to be a good reason to downvote - you should – eddi Feb 03 '15 at 17:42
  • Sorry for the confusion caused. Non-dplyr solutions are ok too. I've edited the question for more clarity. – mpiktas Feb 04 '15 at 10:50

It doesn't require dplyr but how about:

df$B <- ifelse(df$A == "a", 4, df$B)
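
When `new_df` holds several replacement values, the same base R idea might be generalized with `match()`. This is only a sketch: the tables below are illustrative stand-ins for the question's `df` and `new_df`, and it assumes each `A` appears at most once in `new_df`.

```r
# Illustrative data (an assumption); stand-ins for the question's tables
df <- data.frame(A = c("a", "b", "c", "a"), B = c(1L, 2L, 3L, 1L),
                 stringsAsFactors = FALSE)
new_df <- data.frame(A = c("a", "c"), B = c(4L, 5L),
                     stringsAsFactors = FALSE)

# For each row of df, the position of its A in new_df (NA when there is no replacement)
idx <- match(df$A, new_df$A)

# Overwrite B only in the rows that have a replacement
hit <- !is.na(idx)
df$B[hit] <- new_df$B[idx[hit]]
df
```

Only the matched rows are touched, so no join over the full table is needed.
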
Sam Firke
  • I answered the trivial case in the example but not the broader problem. In an instance with a more complex `df` and `new_df`, and using dplyr, your current solution is a good one. If the `left_join` only affects a few cases, as you note, you could minimize the size of that join with an initial call to `anti_join`, setting aside cases where IDs do not need to be replaced, then perform the variable assignment only on values in your `new_df` table: `rbind(anti_join(df, new_df, by = "A"), left_join(new_df, df, by = "A")[, 1:2] %>% rename(B = B.x))` – Sam Firke Feb 03 '15 at 18:56
  • Sorry for not being clear in the question; I need to update several values. Your solution with anti_join looks nice. – mpiktas Feb 04 '15 at 10:47
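
For the dplyr route discussed in this thread, the join-then-overwrite pattern from the question could also be written with `coalesce()`, which avoids the explicit `ifelse(is.na(...))` and drops the helper column afterwards. This is a minimal sketch under some assumptions: the data is illustrative, `tibble()` is used in place of the older `data_frame()`, `B` is integer in both tables so the types agree, and each `A` appears at most once in `new_df` (otherwise the join would duplicate rows).

```r
library(dplyr)

# Illustrative data (an assumption); B is integer in both tables
df <- tibble(A = c("a", "b", "c", "a"), B = c(1L, 2L, 3L, 1L))
new_df <- tibble(A = c("a", "c"), B = c(4L, 5L))

updated <- df %>%
  # Bring in the replacement values as B_new; df's own B keeps its name
  left_join(new_df, by = "A", suffix = c("", "_new")) %>%
  # Take the replacement where there is one, otherwise keep the old value
  mutate(B = coalesce(B_new, B)) %>%
  # Drop the helper column
  select(-B_new)

updated
```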