
I have a vector

a <- c("there and", "walk and", "and see", "go there", "was i", "and see", 
"i walk", "to go", "to was")

and a data frame bg where

bg <- data.frame(term=c("there and", "walk and", "and see", "go there", "was i", "and see",
"i walk", "to go", "to was"), freq=c(1,1,2,1,1,2,1,1,1))

I need to create a vectorized version of the following code, using sapply, tapply, vapply, apply, etc.:

 library(dplyr)                        # filter() below is dplyr::filter
 d <- NULL
 for(i in 1:length(a)){
     temp <- filter(bg, term == a[i])  # rows of bg whose term matches a[i]
     d <- rbind(d, temp)               # grows d on every iteration
 }

The goal is to search bg for rows where term == a[i], for each element of a, and collect the matching rows into a data frame d.

I need a vectorized version, as the for loop above is excruciatingly slow in R.

Here is the sample data

> bg
       term freq
1 there and    1
2  walk and    1
3   and see    2
4  go there    1
5     was i    1
6   and see    2
7    i walk    1
8     to go    1
9    to was    1

and

> d
       term freq
1 there and    1
2  walk and    1
3   and see    2
4   and see    2
5  go there    1
6     was i    1
7   and see    2
8   and see    2
9    i walk    1
10    to go    1
11   to was    1

Thanks

  • That for loop is excruciatingly slow because you are building the structure inside the loop instead of allocating the memory for the vector beforehand and then binding the vectors after the loop has ended. Please show what you want the desired result to look like – Rich Scriven Aug 25 '15 at 04:17
  • Your initial statement about `for` loops is not totally true: http://stackoverflow.com/a/7142982/3710546 –  Aug 25 '15 at 04:19
  • @RichardScriven - `dplyr::filter` i imagine. – thelatemail Aug 25 '15 at 04:22
  • @RichardScriven yes I am using dplyr filter as seen above. dplyr::filter is fast but the for loop is murder. My data frame has 300K rows and the computation is taking 'for'ever. – Tinniam V. Ganesh Aug 25 '15 at 04:44
  • @Pascal I managed to vectoriize other versions and the performance improvement is almost logarithmic, I think. – Tinniam V. Ganesh Aug 25 '15 at 04:48
  • `merge(data.frame(table(term=a)), bg, by="term")` – thelatemail Aug 25 '15 at 04:53
  • @latemail - Looks good. May need to massage the output. Let me check. Will get back to you later today. – Tinniam V. Ganesh Aug 25 '15 at 04:58
  • @TinniamV.Ganesh - maybe just `merge(data.frame(term=a), bg, by="term", sort=FALSE)` going by your updated data. – thelatemail Aug 25 '15 at 05:07
  • Or using the devel version of `data.table`: `data.table(term=a)[bg, on='term']` – akrun Aug 25 '15 at 05:09
  • @akrun - how new does data.table have to be to use that code? No go over here on 1.9.4 – thelatemail Aug 25 '15 at 05:19
  • @thelatemail I meant the `1.9.5`. For `1.9.4`, we have to set the key, instead of the `on` – akrun Aug 25 '15 at 05:20
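
As an aside on Rich Scriven's first comment: the loop is slow mainly because d is grown with rbind() on every iteration. A minimal sketch of the usual workaround (my own illustration, not from the thread) keeps the dplyr filter but collects the pieces in a list and binds them once:

library(dplyr)
pieces <- lapply(a, function(x) filter(bg, term == x))  # one small data frame per element of a
d <- do.call(rbind, pieces)                             # single bind at the end (or dplyr::bind_rows(pieces))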

1 Answer


This essentially becomes a merge operation, with a little twist to make sure that the row order follows the order in a:

out <- merge(bg, list(term=a, sortid=seq_along(a)), by="term")
out[order(out$sortid),]

#        term freq sortid
#7  there and    1      1
#10  walk and    1      2
#1    and see    2      3
#3    and see    2      3
#5   go there    1      4
#11     was i    1      5
#2    and see    2      6
#4    and see    2      6
#6     i walk    1      7
#8      to go    1      8
#9     to was    1      9
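
If the helper column isn't wanted in the final result, it can be dropped once the rows are ordered (a small addition of mine, not part of the original answer):

d <- out[order(out$sortid), c("term", "freq")]  # reorder by sortid, keep only the original columns
rownames(d) <- NULL                             # optional: reset row names to 1..n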

Or in data.table 1.9.5, with a nod to @akrun:

library(data.table)
out <- data.table(term=a, sortid=seq_along(a))[setDT(bg), on='term']
out[order(out$sortid)]
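
If you are still on data.table 1.9.4, which predates the on= argument (as akrun notes in the comments above), roughly the same join should work with keys instead; this is my own sketch and untested on that version:

library(data.table)
lookup <- data.table(term = a, sortid = seq_along(a))
setkey(lookup, term)        # a keyed join stands in for on='term' in 1.9.4
setkey(setDT(bg), term)     # note: setkey() reorders bg by term
out <- lookup[bg]           # join bg against the keyed lookup table
out[order(sortid)]          # restore the original order of a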

Or in dplyr:

left_join(data.frame(term=a), bg)
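
A slightly more explicit variant of the same dplyr call (my addition): spelling out the join column silences the "Joining by" message, and left_join() preserves the row order of its left-hand table, i.e. the order of a:

library(dplyr)
d <- left_join(data.frame(term = a, stringsAsFactors = FALSE), bg, by = "term")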