
I'm trying to use apply (and avoid a for loop) to run the pair of operations below on a data frame, using values from two different vectors. The vectors and the data frame I'm working with look like this:

x <- c("A", "B", "C")
y <- c(1, 3, 4)

df:
var1  var2
ddAd  NA
dBdd  NA
ddCd  NA
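For anyone who wants to reproduce this, the vectors and data frame above can be built as follows. This is a minimal sketch; `stringsAsFactors = FALSE` is an assumption added so that var1 is a plain character column on any R version.

```r
# Vectors from the question
x <- c("A", "B", "C")
y <- c(1, 3, 4)

# Data frame from the question; var2 starts out empty (NA)
df <- data.frame(var1 = c("ddAd", "dBdd", "ddCd"),
                 var2 = NA,
                 stringsAsFactors = FALSE)
str(df)
```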

I'm trying to run the following two lines of code for each pair of values (x[i], y[i]) in the two vectors, using apply.

z <- grep(x[i], df$var1, value = FALSE)
df[z, "var2"] <- y[i]
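Written out as the explicit for loop the question is trying to avoid, the two lines above amount to something like this. It is a minimal sketch for reference; `fixed = TRUE` is an assumption that each x value should be matched literally rather than as a regular expression.

```r
x <- c("A", "B", "C")
y <- c(1, 3, 4)
df <- data.frame(var1 = c("ddAd", "dBdd", "ddCd"), var2 = NA,
                 stringsAsFactors = FALSE)

# For each pair (x[i], y[i]): find the rows whose var1 contains x[i]
# and write y[i] into var2 for exactly those rows
for (i in seq_along(x)) {
  z <- grep(x[i], df$var1, fixed = TRUE)
  df$var2[z] <- y[i]
}
df
```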

The end result I'm going for is this:

var1  var2
ddAd  1
dBdd  3
ddCd  4

My attempts to use apply so far seem to work fine with the first line of code, but I run into trouble with the second line. I think I need to run an apply command within an apply command in this situation, but I haven't been able to get that to work. Can anyone show me how to use "apply" in this situation? Thanks!
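One way to express this with the apply family, without nesting one apply inside another, is to compute the whole var2 column as a return value and assign it once. The sketch below uses `vapply`; the helper name `lookup_var2` is hypothetical, and it assumes that when several x values occur in the same string the first one in x should win.

```r
x <- c("A", "B", "C")
y <- c(1, 3, 4)
df <- data.frame(var1 = c("ddAd", "dBdd", "ddCd"), var2 = NA,
                 stringsAsFactors = FALSE)

# Hypothetical helper: return the y value whose x pattern occurs in v,
# or NA if none of the x values occur in v (first match in x wins)
lookup_var2 <- function(v) {
  hit <- which(vapply(x, grepl, logical(1), v, fixed = TRUE))
  if (length(hit) > 0) y[hit[1]] else NA_real_
}

# Apply the helper to every element of var1; the result is a plain
# numeric vector, so the assignment happens once, outside the apply call
df$var2 <- vapply(df$var1, lookup_var2, numeric(1), USE.NAMES = FALSE)
df
```

Because the function returns a value instead of assigning into df, the "second line" problem goes away: the assignment is done by the caller, not inside the apply call.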

  • Apply doesn't necessarily mean a performance gain. See [this](http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar) and [this](http://stackoverflow.com/questions/7638095/r-confusion-with-apply-vs-for-loop). It's vectorization that really speeds things up, and you need to avoid some common mistakes that can happen in for loops. – dracodoc Sep 21 '16 at 03:01

1 Answer


We can paste the 'x' vector into a single string and use it as the pattern in grep

z <- grep(paste(x, collapse="|"), df$var1, value = FALSE)

and use the index to reorder 'y' and assign it to 'var2'

df$var2 <- y[z]
df
#  var1 var2
#1 ddAd    1
#2 dBdd    3
#3 ddCd    4
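Note that z holds row numbers returned by grep, so `y[z]` lines up here because the matching rows happen to appear in the same order as x. A possible extension of the same pasted-pattern idea, sketched below under the assumption that the x values contain no regex metacharacters, extracts the matched value from each row and maps it back to x with `match`, so row order and unmatched rows (NA) are handled. The shuffled rows and the extra "ddDd" row are illustrative data, not from the question.

```r
x <- c("A", "B", "C")
y <- c(1, 3, 4)
df <- data.frame(var1 = c("dBdd", "ddDd", "ddAd", "ddCd"), var2 = NA,
                 stringsAsFactors = FALSE)

pat <- paste(x, collapse = "|")
pos <- regexpr(pat, df$var1)          # -1 where a row has no match

# Matched text per row, NA where nothing matched
key <- rep(NA_character_, nrow(df))
key[pos > 0] <- regmatches(df$var1, pos)

# Map each matched value back to its position in x, then to y
df$var2 <- y[match(key, x)]
df
```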
akrun
  • Thank you very much for your reply @akrun! It's much more elegant than the approach I was trying. I have just one more question about your approach: is there a way to make this work if vector 'x' doesn't have an exhaustive list of the names in the df's var1 column? – R. Buchanan Sep 22 '16 at 01:59
  • Here's an example of what I mean: The 'x' and 'y' vectors stay the same, but the df has one more row: "ddDd NA". Is there a way to run your suggested operation so that the var2 values that are returned are "1, 3, 4, NA"? – R. Buchanan Sep 22 '16 at 02:15
  • @R.Buchanan Perhaps `df$var2[z] <- y[z]` – akrun Sep 22 '16 at 03:08
  • As @akrun points out in a separate thread, `df$var2[z] <- y` is the correct operation. `df$var2[z] <- y[z]` doesn't work as intended. – R. Buchanan Sep 22 '16 at 03:43
  • @R.Buchanan Thanks for the clarification. – akrun Sep 22 '16 at 03:43
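The follow-up case from the comments (one extra row "ddDd" that matches none of x) can be checked with the `df$var2[z] <- y` form as follows. This sketch assumes, as in the example, that each x value matches exactly one row and that those rows appear in the same order as x; if that does not hold, values can end up on the wrong rows.

```r
x <- c("A", "B", "C")
y <- c(1, 3, 4)
# Follow-up example: one extra row whose var1 matches none of x
df <- data.frame(var1 = c("ddAd", "dBdd", "ddCd", "ddDd"), var2 = NA,
                 stringsAsFactors = FALSE)

z <- grep(paste(x, collapse = "|"), df$var1)  # rows 1, 2, 3 match
df$var2[z] <- y                               # row 4 keeps its NA
df
```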