
I have a question about how to write a loop in R that checks whether a certain expression occurs in a string. I want to check whether the expression "i-sty" occurs in my variable for each i in 1:200 and, if it does, return the corresponding i.

For example, if we have "4-sty", the loop should give me 4, and if there is no "i-sty" in the variable, it should give me . for that observation.

I used

for (i in 1:200){
  datafram$height <- ifelse(grepl("i-sty", dataframe$Description), i, ".")
}

But it did not work: I only get dots. I have attached a picture of the string variable.

jogo
web_1920
  • "i-sty" is just a string with the letter `i` in it. To use a regex pattern with your variable `i`, you need to paste together a string, e.g., `grepl(paste0(i, "-sty"), ...)`. I'd also recommend using `NA` rather than `"."` for the "else" result - that way the resulting `height` variable can be numeric. – Gregor Thomas Jun 01 '20 at 14:05
  • Welcome to [SO]! Please make your example reproducible, read [ask] and https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – jogo Jun 01 '20 at 14:19
  • `x <- c("6-sty xxx", "4-sty yyyy", NA, "sty zzz", "32-sty xyz"); as.numeric(sub("^([0-9]+)-sty.*", "\\1", x))` – jogo Jun 01 '20 at 17:40

1 Answer

"i-sty" is just a string with the letter i in it. To use a regex pattern with your variable i, you need to paste together a string, e.g., grepl(paste0(i, "-sty"), ...). I'd also recommend using NA rather than "." for the "else" result - that way the resulting height variable can be numeric.

for (i in 1:200){
  dataframe$height <- ifelse(grepl("i-sty", dataframe$Description), i, ".")
}

The above works syntactically, but not logically: the pattern never contains the value of i. You also have the problem that you overwrite height on every pass through the loop: when i is 2, you erase the results from when i is 1; when i is 3, you erase the results from when i is 2; and so on. I think a better approach is to extract the match, which is easy with stringr (and also possible in base R). As a benefit, with the right pattern we can skip the loop entirely:

library(stringr)

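# The parentheses form a capture group; column 2 of the str_match() result holds it
# (column 1 is the full match, e.g. "4-sty"). Rows without a match give NA.
dataframe$height = str_match(string = dataframe$Description, pattern = "([0-9]+)-sty")[, 2]
# might want to wrap in `as.numeric`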
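
For the base R route, a minimal sketch might look like the following. The sample vector x is made up for illustration (it borrows from jogo's comment), and has_match and height are just local names chosen here. It assumes each description contains at most one "number-sty" piece that matters; if there are several, it picks the first.

```r
# Invented sample data; in practice this would be dataframe$Description.
x <- c("6-sty xxx", "4-sty yyyy", NA, "sty zzz", "tower, 32-sty xyz")

# Which entries contain digits followed by "-sty"? grepl() returns FALSE for NA.
has_match <- grepl("[0-9]+-sty", x)

# Start with NA everywhere so the result is numeric.
height <- rep(NA_real_, length(x))

# For matching entries only, replace the whole string by the captured digits.
# The lazy ".*?" (needs perl = TRUE) keeps all digits of the number in the group.
height[has_match] <- as.numeric(
  sub("^.*?([0-9]+)-sty.*$", "\\1", x[has_match], perl = TRUE)
)

height
# 6 4 NA NA 32
```

Restricting sub() to the matching entries avoids the coercion warning that as.numeric() would otherwise raise for strings such as "sty zzz".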

You use both datafram and dataframe. I've assumed dataframe is correct.
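
If you would rather keep the loop, here is a rough sketch of how both problems could be handled. The data frame contents are invented for illustration, and the "(^|[^0-9])" prefix is my own addition: without it, the pattern "4-sty" would also match inside "14-sty".

```r
# Invented sample data; the column names follow the question.
dataframe <- data.frame(
  Description = c("6-sty xxx", "14-sty yyyy", NA, "sty zzz", "32-sty xyz"),
  stringsAsFactors = FALSE
)

# Initialise once, outside the loop, with NA so the column is numeric.
dataframe$height <- NA_real_

for (i in 1:200) {
  # Build the pattern from the value of i; the prefix requires that the
  # number is not preceded by another digit.
  hit <- grepl(paste0("(^|[^0-9])", i, "-sty"), dataframe$Description)
  # Fill only the rows that match this i, so earlier results are kept.
  dataframe$height[hit] <- i
}

dataframe$height
# 6 14 NA NA 32
```

This runs 200 pattern searches over the column, so the loop-free extraction is usually the simpler choice.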

Gregor Thomas