I am trying to standardize a given string using a defined set of rules. These rules have been formalized as several "gsub" operations, which are stored as plain text in a data frame (and accessed as an atomic vector using $).
I have 4 separate data frames populated with the strings I want to standardize. I have implemented a for loop which works. However, it involves rewriting the gsub operations for each data frame and is quite time-consuming to run.
While I am aware that apply doesn't provide any real speedup over a for loop unless a compiled function is called, I need a generic way to run this standardization over several data frames (as there will be more in the future).
To achieve this generalization, I tried writing a nested apply structure. The inner apply is meant to evaluate each gsub operation using "eval(parse(text = x))". The outer apply is meant to iterate over the elements of the data frame that holds the strings to be standardized.
I am expecting the inner apply to run all operations sequentially on a string, while the outer apply loops over the string data frame itself. However, this is clearly not working. It throws the following error:
library(data.table)
library(stringi)
repdf <- data.table(Names = c("Palmolive Co. Pvt. Ltd.","Hellenic P. Co.","Freeman's Consortium pvt. ltd."),Address =c("15, Parkway Broadsite, Mumbai","Greco-Roman Architecture Street, Pune","1-B,Black Mesa Compound, Crowbar Street, Delhi."))
gsubop_df <- data.table(Commands = c('"stri_replace_all_regex(x, "Co\\b\\.?","Company")"','"stri_replace_all_regex(x, "\\(P\\.\\)$","Private Limited")"','"stri_replace_all_regex(x, "Corpn\\b\\.?","Corporation")"'))
repdf$Names <- apply(repdf[,1],2,function(x) apply(gsubop_df,2,eval(parse(text = as.character(x)))))
#> Error in parse(text = as.character(x)): <text>:1:11: unexpected symbol
#> 1: Palmolive Co.
#>
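As far as I can tell, the text being parsed here is the company name itself rather than one of the stored commands. A minimal check, assuming that reading is correct, gets the same kind of failure from a bare parse call on one of the names:

```r
# Try to parse a company name as if it were R code and capture the error message.
msg <- tryCatch(
  {
    parse(text = "Palmolive Co. Pvt. Ltd.")
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the x that reaches eval(parse(...)) seems to be the Names column, not a row of gsubop_df.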
As I mentioned before, I wrote a for loop which works:
name_rule_length <- length(name_clean_rules_apply$Commands)
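# Run each stored rule in turn; seq_len() also handles an empty rule table safely.
for (i in seq_len(nrow(mh_rules_nme))) {
  # Every rule is a complete call that refers to MG$Name directly.
  MG$Name <- eval(parse(text = mh_rules_nme[i, ]))
}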
An example of the gsub operation in mh_rules_nme:
stri_replace_all_regex(MG$Name,"M(?:\\|\\/)s","")
This, however, requires me to rewrite the gsub operation for every data frame, whereas I am looking to achieve the same result using a generic "x" from within apply.
When I run a single eval(parse(...)) call on its own, it works fine. The error only appears inside the nested apply.
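To make the goal concrete, here is a minimal sketch of the kind of generic helper I have in mind. The name apply_rules and the rules vector are hypothetical, and the sketch assumes each stored command is plain R code that refers to its input as x (without the extra surrounding quotes used in my example above):

```r
library(stringi)

# Hypothetical helper: run every stored command, in order, on a character vector.
# Assumes each command is valid R code that refers to its input as `x`.
apply_rules <- function(x, commands) {
  for (cmd in commands) {
    # Evaluate the command with `x` bound to the current state of the strings.
    x <- eval(parse(text = cmd), envir = list(x = x))
  }
  x
}

# Rules stored as text. Backslashes are doubled twice here because the
# regex sits inside a string that is itself inside an R string literal.
rules <- c(
  'stri_replace_all_regex(x, "Co\\\\b\\\\.?", "Company")',
  'stri_replace_all_regex(x, "\\\\(P\\\\.\\\\)$", "Private Limited")',
  'stri_replace_all_regex(x, "Corpn\\\\b\\\\.?", "Corporation")'
)

company_names <- c(
  "Palmolive Co. Pvt. Ltd.",
  "Hellenic P. Co.",
  "Freeman's Consortium pvt. ltd."
)

cleaned <- apply_rules(company_names, rules)
print(cleaned)
```

With something like this, each data frame would only need one call such as repdf$Names <- apply_rules(repdf$Names, rules), and the rules themselves would never mention a specific data frame.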
Any help in resolving this is much appreciated.