Replace/add values to column in dataframe x looking at values in data frame y in R

tempFile, or X, is a very big data frame

     1     idname    3    unit
      aa    jhn      cc   NA
      dd    m234     ff   NA
      gg    cind     ii   NA
      nn    ....
      pp.....

mapFile, or Y, is a small data frame

name    id            contact     address
john    jhn           J123        J
Mary    Mry           M234        M

My condition is

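for (i in seq_len(nrow(X))) {
  # rows of Y where any alternative-name column equals the value in X$`2`
  j <- which(Y$alt_name1 == X$`2`[i] | Y$alt_name2 == X$`2`[i] | Y$alt_name3 == X$`2`[i])
  if (length(j) > 0) X$name[i] <- Y$name[j[1]]
}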

That is, if the value in X$2 matches a value in any column of Y other than Y$name, the corresponding Y$name should be written to the same row of X$name.

Is there an efficient way to carry out this operation? X has some millions of rows and Y has, say, 4 rows.

Any help is very much appreciated.

What I have now is

for (i in seq_along(tempFile$unit)) {
    for (j in seq_along(mapFile$name)) {
        # copy the name when idname equals any of the three alternative columns
        if (tempFile$idname[i] == mapFile$id[j] ||
            tempFile$idname[i] == mapFile$contact[j] ||
            tempFile$idname[i] == mapFile$address[j]) {
            tempFile$unit[i] <- mapFile$name[j]
        }
    }
}
carl whyte
  • Can you edit in a [reproducible example](http://stackoverflow.com/a/5963610/1188479) including the actual dataset structure (or a decimated version) using something like `dput`? That will make it much easier to answer. Offhand I think you're looking for an answer which includes `merge`, but I can't provide a solid answer for your problem without a reproducible example. – Adam Hyland Jun 21 '13 at 18:44
  • Thanks Adam I was trying to figure out how to edit and make it in proper format – carl whyte Jun 21 '13 at 18:48
  • Let's say x has some millions of rows and y has, say, 4 to 5 rows – carl whyte Jun 21 '13 at 18:49
  • In that case I would just pop in a data frame with 10 rows for one and 3-4 for another. Once we get the structure down we can just scale it up to see if the problem needs a specific solution for scale. But you want to provide some code which anyone can copy/paste into R and start working on to help you. – Adam Hyland Jun 21 '13 at 19:04
  • Hi adam, would that be enough, thanks – carl whyte Jun 21 '13 at 19:17
  • I actually just used `read.table(text = "stuff")` to get your table in. What you'll want to do in the future is something like `dput(big.df)` and paste that output into the window if it's reasonable. – Adam Hyland Jun 21 '13 at 19:23

1 Answer

big.df <- read.table(text = "1     2     3    name
aa    jhn   cc   NA
dd    m234  ff   NA
gg    cind  ii   NA",
                     header = TRUE, check.names=FALSE, as.is = TRUE)

small.df <- read.table(text = "name    alt_name1     alt_id   alt_name3
john    jhn           J123        J
Mary    Mry           M234        M", 
                       header = TRUE, check.names=FALSE, as.is = TRUE)


alt.names <- big.df[, 1:3]

alt.key <- small.df[, 2]

ifelse(alt.names[, 1] %in% alt.key |
       alt.names[, 2] %in% alt.key |
       alt.names[, 3] %in% alt.key, alt.key, NA)

Something like this should work. Obviously you'll want to DRY it out a bit, but ifelse is vectorized, so you can simply assign the result to the name column of big.df. You can also do it without ifelse using just match or %in% (which is just match as a binary operator), and it will be much faster than a loop.
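
The match variant mentioned above could look roughly like this. It is only a sketch under a few assumptions: every column of small.df other than name holds alternative keys, only column 2 of big.df is compared (as in the question's condition), and the first match wins. The names key.cols and lookup are made up for illustration.

```r
# Sample data, same shape as the tables above.
big.df <- read.table(text = "1     2     3    name
aa    jhn   cc   NA
dd    m234  ff   NA
gg    cind  ii   NA",
                     header = TRUE, check.names = FALSE, as.is = TRUE)

small.df <- read.table(text = "name    alt_name1     alt_id   alt_name3
john    jhn           J123        J
Mary    Mry           M234        M",
                       header = TRUE, check.names = FALSE, as.is = TRUE)

# Assumption: every column except `name` is an alternative key.
key.cols <- setdiff(names(small.df), "name")

# Stack the key columns into one long lookup table. unlist() goes column
# by column, so the names are repeated once per key column to stay aligned.
lookup <- data.frame(
  key  = unlist(small.df[key.cols], use.names = FALSE),
  name = rep(small.df$name, times = length(key.cols)),
  stringsAsFactors = FALSE
)

# match() gives, for each value in column 2, the position of its first
# occurrence in lookup$key (NA if absent); indexing with NA yields NA.
big.df$name <- lookup$name[match(big.df[["2"]], lookup$key)]

print(big.df)
```

With this sample only the first row gets a name ("jhn" maps to "john"). "m234" stays NA because match is case-sensitive and the key is "M234"; if case should be ignored, wrapping both sides in toupper() is one option. Since match does a single hashed lookup over the whole column, there is no explicit loop over the rows of big.df.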

Adam Hyland