
I am using a for loop to find and replace text values in a data frame. My find/replace table has 1431 patterns and my master data frame contains 3.5 lakh (350,000) records. The loop currently takes 33 minutes, so I am looking for a faster approach that reduces its runtime.

dfy <- data.frame(fuzzyname = c("AU HOUSING", "BAJAJ AUTO", "INDOSTAR CAPITAL",
                                "FULLLERTON INDIA", "LIC HOUSING FINANCE", "CAPITALFIRST"),
                  stringsAsFactors = FALSE)
df_pat <- data.frame(find = c("AUTO", "CAPITAL", "LIC"), stringsAsFactors = FALSE)
df_rep <- data.frame(replace = c("AUTOMOBILES", "CAP", "LIFE CORPORATION OF INDIA"),
                     stringsAsFactors = FALSE)

# Apply each find/replace pair in turn to the whole column
for (i in seq_len(nrow(df_pat))) {
  dfy$fuzzyname <- gsub(df_pat$find[i], df_rep$replace[i], dfy$fuzzyname, perl = TRUE)
  print(paste(i, df_pat$find[i], sep = "----"))  # progress log
}

Please help me out.

1 Answer


Use `stringr::str_replace_all`, which is vectorized and doesn't require a `for` loop. Pass it a named vector whose names are the patterns and whose values are the replacements:

dfy$fuzzyname <- stringr::str_replace_all(dfy$fuzzyname, 
                     setNames(df_rep$replace, df_pat$find))
dfy

#                                  fuzzyname
#1                                AU HOUSING
#2                         BAJAJ AUTOMOBILES
#3                              INDOSTAR CAP
#4                          FULLLERTON INDIA
#5 LIFE CORPORATION OF INDIA HOUSING FINANCE
#6                                  CAPFIRST
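The named-vector form applies the pairs one after another, in the order given, and each name is treated as a regular expression. So a later pattern can match text that an earlier replacement inserted. A minimal sketch with a made-up input string and a hypothetical extra pattern `MOBILE` (neither comes from the question's data) shows that order matters:

```r
library(stringr)

# Illustrative input; "MOBILE" is a hypothetical pattern, not from the question.
x <- "CAPITAL AUTO"

# Names are regex patterns, values are replacements, applied left to right.
a <- str_replace_all(x, c("AUTO" = "AUTOMOBILES", "MOBILE" = "MOTOR"))
b <- str_replace_all(x, c("MOBILE" = "MOTOR", "AUTO" = "AUTOMOBILES"))

a  # "CAPITAL AUTOMOTORS": MOBILE matched inside the new "AUTOMOBILES"
b  # "CAPITAL AUTOMOBILES": MOBILE was tried before AUTO was expanded
```

If a real find table contains overlapping patterns like these, it is worth sorting it deliberately before building the named vector.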
Ronak Shah
  • The `stringr` package seems very useful. Maybe I should learn it +1. – Tim Biegeleisen Feb 18 '21 at 05:59
  • I used the same, but some of my 'find' values are regex patterns with L delimiters and R delimiters. Which 'apply' function could I use to get the solution, and how? – Piyush Sharma Feb 18 '21 at 06:02
  • Sir, could I get an alternative approach using the 'apply' family? My R script has a number of loops for updating data, and they take too much time. Please help me out. – Piyush Sharma Feb 18 '21 at 07:11
  • I have already answered the question that you asked, and the answer requires neither a `for` loop nor any apply-family function. It works as expected for the data that you have shared. If you have additional constraints, we cannot know about them unless you mention them in your question when posting. – Ronak Shah Feb 18 '21 at 07:17
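On the 'apply' request in the comments: a base-R sketch (not taken from the answer) folds the same find/replace pairs over the column with `Reduce()`, the functional counterpart of the `for` loop. The helper name `replace_all_pairs` is hypothetical. It still makes one `gsub()` pass per pattern, just as the loop does, so it should not be expected to run much faster by itself; its benefit is mainly tidier code.

```r
# Hypothetical helper: fold every find/replace pair over a character vector.
# Reduce() calls f(acc, i) for i = 1, 2, ..., starting from acc = x, so pair i
# sees the result of pairs 1..i-1, exactly like the original for loop.
replace_all_pairs <- function(x, patterns, replacements, perl = TRUE) {
  stopifnot(length(patterns) == length(replacements))
  Reduce(
    function(acc, i) gsub(patterns[i], replacements[i], acc, perl = perl),
    seq_along(patterns),
    init = x
  )
}

# The question's sample data, with plain character columns
dfy <- data.frame(
  fuzzyname = c("AU HOUSING", "BAJAJ AUTO", "INDOSTAR CAPITAL",
                "FULLLERTON INDIA", "LIC HOUSING FINANCE", "CAPITALFIRST"),
  stringsAsFactors = FALSE
)
df_pat <- data.frame(find = c("AUTO", "CAPITAL", "LIC"), stringsAsFactors = FALSE)
df_rep <- data.frame(replace = c("AUTOMOBILES", "CAP", "LIFE CORPORATION OF INDIA"),
                     stringsAsFactors = FALSE)

dfy$fuzzyname <- replace_all_pairs(dfy$fuzzyname, df_pat$find, df_rep$replace)
dfy$fuzzyname
```

For this data the result matches the `str_replace_all` output in the answer, since both apply the pairs sequentially.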