I'm trying to rewrite a loop as an lapply statement, but I am getting stuck because I cannot figure out how to incorporate the index within the function. I recently asked a similar question on SO and received an elegant response, but the response doesn't generalize to this problem.

I'm working with a set of records structured in a long format. Each set of records begins with the same delimiter string (`"_____"`), and the row I want to fix always occurs exactly two rows after that delimiter.

Here is the reproducible data:

text <- c("_____", "A: aaa", "bbb", "C: cccc", "D: dddd",
    "_____", "A: aaa:aaa", "bbb", "C: ccc", "D: dddd", "E: eeee",
    "_____", "A: aaa", "bbb:bbb", "C: ccc", "D: dddd")

And here is the loop that does what I need it to do. It works just fine on a very small data set, but I have to apply this logic in a few different ways to a few hundred thousand rows of data -- a more efficient method is definitely needed!

for (i in 3:length(text)) {
    # Prefix row i with "B: " when the row two above it is the "_____" delimiter
    if (grepl("_{5}", text[i - 2])) {
        text[i] <- paste0("B: ", text[i])
    }
}

Of course, feel free to redirect if there are existing problems on SO that I did not identify. Thanks in advance.

Brian P
  • I didn't look more into the problem. But you could get the expected result by `indx <- which(grepl('_{5}', text)); text[indx+2] <- paste0('B: ', text[indx+2])` – akrun Feb 17 '15 at 15:05
  • Ahhhh ... that makes sense! I certainly could do that. I'm still curious as to incorporating an index in `lapply` for learning purposes. – Brian P Feb 17 '15 at 15:07
  • If you want to do the same thing in `sapply` using the `indx`, perhaps `text[indx+2] <- sapply(indx, function(x) paste0('B: ', text[x+2]))` (which is kind of unnecessary) – akrun Feb 17 '15 at 15:10
  • @akrun Your first comment was perfect. Your second comment clarifies what I was trying to understand to better understand the apply functions. – Brian P Feb 17 '15 at 15:12
  • @Joshua Ulrich For somebody expert in R, the redirect to the duplicate may seem immediately generalizable. Not exactly sure how the post solves the question I asked. – Brian P Feb 17 '15 at 15:27
  • It shows you several different ways you can access the list index in the `lapply` call. Make the body of your for loop a function and call it from `lapply`. If you just want your code to be faster, put your for loop in a function, compile it (via the compiler package), and it will likely be faster than `lapply`. It will be faster yet if you change the entire for loop body to only `if(grepl("_____",text[i-2],fixed=TRUE)[1L]) text[i] <- paste0("B: ", text[i])`. – Joshua Ulrich Feb 17 '15 at 15:43
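
For reference, the suggestions in the comments above can be put side by side as a runnable sketch. It applies the vectorized indexing from akrun's first comment and the byte-compiled loop suggested by Joshua Ulrich to the sample data from the question. The names `delim`, `target`, `fix_loop`, and `fix_loop_c` are illustrative, and the bounds check on `target` is an added assumption (it guards against a delimiter in the last two rows), not part of the original comment.

```r
text <- c("_____", "A: aaa", "bbb", "C: cccc", "D: dddd",
          "_____", "A: aaa:aaa", "bbb", "C: ccc", "D: dddd", "E: eeee",
          "_____", "A: aaa", "bbb:bbb", "C: ccc", "D: dddd")

# Vectorized version: find the delimiter rows, then edit the rows two below them.
delim  <- grep("_____", text, fixed = TRUE)
target <- delim + 2
target <- target[target <= length(text)]  # added guard: skip positions past the end
vectorized <- text
vectorized[target] <- paste0("B: ", vectorized[target])

# Loop version wrapped in a function so it can be byte-compiled.
fix_loop <- function(x) {
  # seq_along(x)[-(1:2)] is 3, 4, ..., length(x), and empty when length(x) < 3
  for (i in seq_along(x)[-(1:2)]) {
    if (grepl("_____", x[i - 2], fixed = TRUE)[1L]) x[i] <- paste0("B: ", x[i])
  }
  x
}
fix_loop_c <- compiler::cmpfun(fix_loop)
looped <- fix_loop_c(text)

identical(vectorized, looped)  # both approaches should agree on this data
```

Both versions prefix rows 3, 8, and 14 of the sample data. As far as I know, recent R versions byte-compile functions automatically, so the explicit `cmpfun` call may change little there; timing each version on the real data is the only reliable guide.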

1 Answer

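To incorporate the index into `lapply`, iterate over the indices rather than over the elements themselves (`doStuff` stands for whatever function you want to apply):

lapply(seq_along(text), function(i) doStuff(text[i]))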
ahmohamed
  • This just feels wrong. It might run ok but I don't think it's good practice to access objects in the global environment while inside a function. Here, the `doStuff` function is accessing the `text` variable directly from the workspace. So many nasty things can happen that way. – Faustin Gashakamba Sep 19 '22 at 18:14
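
One way to address the concern in the comment above is to pass the vector to the function explicitly instead of reading it from the workspace. The following is a minimal sketch applied to the sample data from the question; `prefix_if_flagged` and its `vec` argument are hypothetical names chosen for illustration, not something from the original answer.

```r
text <- c("_____", "A: aaa", "bbb", "C: cccc", "D: dddd",
          "_____", "A: aaa:aaa", "bbb", "C: ccc", "D: dddd", "E: eeee",
          "_____", "A: aaa", "bbb:bbb", "C: ccc", "D: dddd")

# Hypothetical helper: returns element i of vec, prefixed with "B: " when the
# element two positions earlier is the "_____" delimiter.
prefix_if_flagged <- function(i, vec) {
  if (i > 2 && grepl("_____", vec[i - 2], fixed = TRUE)) {
    paste0("B: ", vec[i])
  } else {
    vec[i]
  }
}

# lapply iterates over the indices; extra arguments after FUN are forwarded to
# it, so the vector arrives as `vec` rather than being looked up globally.
fixed <- unlist(lapply(seq_along(text), prefix_if_flagged, vec = text))

# vapply does the same but checks that each result is a single string.
fixed2 <- vapply(seq_along(text), prefix_if_flagged, character(1), vec = text)
```

Because the helper receives everything it needs as arguments, it can be tested on its own and reused on other vectors. For this particular task the vectorized indexing from the question's comments is likely to be simpler and faster; the sketch is mainly useful for seeing how an index is threaded through `lapply`.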