How can I address a partial match in a data frame? Let's say this is my data frame df:

   V1  V2  V3 V4
1 ABC 1.2 4.3  A
2 CFS 2.3 1.7  A
3 dgf 1.3 4.4  A

and I want to add a column V5 containing the number 111 only if the value in V1 contains an "f", and the number 222 only if the value in V1 contains "gf". Will I run into problems since several values contain an "f", or will the order in which I enter the commands take care of it?

I tried something like:

df$V5<- ifelse(df$V1 = c("*f","*gf"),c=(111,222) )

but it does not work.

The main problem is: how can I tell R to look for a "partial match"?

Thanks a million for your help!

Arun
RNewbi
  • `ifelse` isn't written with quite that much "insight". The "=" sign in R is for assignment, not for tests, and `ifelse` doesn't support an "inner" level of branching logic. – IRTFM May 03 '13 at 17:13
  • Just to give you an idea: You can use `ifelse` in this manner: `ifelse(grepl("gf", df$V1), 222, ifelse(grepl("f", df$V1), 111, NA))`. [But I suspect it might be a tad slower](http://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow). – Arun May 03 '13 at 17:33
  • That `ifelse` construction would have the advantage that it could be simply assigned without needing to pre-specify the value of V5 to be NA. – IRTFM May 03 '13 at 17:56
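
The nested `ifelse` idea from the comments can be sketched on the data frame from the question. This is only a minimal illustration: `fixed = TRUE` is an added assumption so that the patterns are treated as literal substrings, and `stringsAsFactors = FALSE` is set so that V1 stays a character column.

```r
# Data frame from the question; V1 is kept as character, not factor
df <- data.frame(V1 = c("ABC", "CFS", "dgf"),
                 V2 = c(1.2, 2.3, 1.3),
                 V3 = c(4.3, 1.7, 4.4),
                 V4 = c("A", "A", "A"),
                 stringsAsFactors = FALSE)

# Test the more specific pattern ("gf") first, then fall back to "f".
# fixed = TRUE matches the pattern as a literal substring, not a regex.
df$V5 <- ifelse(grepl("gf", df$V1, fixed = TRUE), 222,
         ifelse(grepl("f",  df$V1, fixed = TRUE), 111, NA))

print(df)
```

Matching is case-sensitive, so "CFS" (uppercase "F") gets NA here. If case should not matter, `grepl` accepts `ignore.case = TRUE`, but only when `fixed = TRUE` is dropped.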

2 Answers

Besides the solution of setting the values in sequence for "f", "gf", ..., it's worth having a look at the regular expression capability for zero-width lookahead / lookbehind.

If you want to grep all elements which contain an "f" that is not preceded by a "g", you can use a negative lookbehind:

v1 <- c("abc", "f", "gf" )
grep( "(?<![g])f" , v1, perl= TRUE )
[1] 2

and if you want to grep only those which contain an "f" that is not followed by a "g", you can use a negative lookahead:

v2 <- c("abc", "f", "fg")
grep( "f(?![g])" , v2, perl= TRUE )
[1] 2

And of course you can mix that:

v3 <- c("abc", "f", "fg", "gf")
grep( "(?<![g])f(?![g])" , v3, perl= TRUE )
[1] 2

So for your case you can do

df[ grep( "(?<![g])f" , df$V1, perl= TRUE ), "V5" ] <- 111
df[ grep( "gf" , df$V1, perl= TRUE ), "V5" ] <- 222
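
As a rough check of why the lookbehind helps, the sketch below assigns the two values in both orders and compares the results. The vector `v1` holds hypothetical sample values (not the data from the question), picked so that each case occurs once.

```r
# Hypothetical sample values: no match, plain "f", and "gf"
v1 <- c("ABC", "xfy", "dgf")

is_gf     <- grepl("gf", v1, fixed = TRUE)       # contains the literal "gf"
is_f_only <- grepl("(?<!g)f", v1, perl = TRUE)   # an "f" not preceded by "g"

# Order A: assign 111 first, then 222
a <- rep(NA_real_, length(v1))
a[is_f_only] <- 111
a[is_gf]     <- 222

# Order B: assign 222 first, then 111
b <- rep(NA_real_, length(v1))
b[is_gf]     <- 222
b[is_f_only] <- 111

print(a)
print(identical(a, b))
```

For these values the two orders agree because no element matches both patterns. A string such as "gfxf" would still match both, since its second "f" is not preceded by a "g", so the order matters again in that case.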
Beasterfield

 df$V5 <- NA
 df$V5[grep("f", df$V1)] <- 111
 df$V5[grep("gf", df$V1)] <- 222  # obviously some of the "f" values could be overwritten.

There is a `switch` function, which I am too dense to understand, that always seemed to me like it should behave like the Pascal `case` statement. I could do it with some weird Boolean-to-numeric indexing maneuvers, but that is not likely to be helpful.
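
One possible reading of the "Boolean to numeric indexing" idea is sketched below; it is a guess at the technique, not necessarily what was meant. It relies on the fact that every string containing "gf" also contains "f", so adding the two logical vectors gives 0, 1, or 2. The value "xfy" is a hypothetical addition to the V1 values from the question.

```r
# V1 values from the question plus one hypothetical value ("xfy")
v1 <- c("ABC", "CFS", "dgf", "xfy")

# TRUE/FALSE are coerced to 1/0 when added:
#   0 = no "f", 1 = "f" but no "gf", 2 = "gf" (which implies "f")
hits <- grepl("f", v1, fixed = TRUE) + grepl("gf", v1, fixed = TRUE)

# Use the count (shifted by 1) as an index into a lookup vector
v5 <- c(NA, 111, 222)[hits + 1]

print(v5)
```

This avoids both the pre-filled NA column and the dependence on assignment order, at the cost of being harder to read than the two-step `grep` version.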

IRTFM