I would like to find the named functions I use most frequently in my R scripts (ignoring operators such as "+", "$", and "["). Writing an elegant and reliable regex that matches function names has stumped me. Here is a small example and my clumsy code so far; I welcome cleaner, more reliable, and more comprehensive code.
test1 <- "colnames(x) <- subset(df, max(y))"
test2 <- "sat <- as.factor(gsub('International', 'Int'l', sat))"
test3 <- "score <- ifelse(str_detect(as.character(sat), 'Eval'), 'Importance', 'Rating')"
test <- c(test1, test2, test3)
The test object includes eight functions (colnames, subset, max, as.factor, gsub, ifelse, str_detect, as.character), and the first two twice. Iteration one to match them is:
(result <- unlist(strsplit(x = test, split = "\\(")))
[1] "colnames" "x) <- subset"
[3] "df, max" "y)"
[5] "sat <- as.factor" "gsub"
[7] "'International', 'Int'l', sat)))" "score <- ifelse"
[9] "str_detect" "as.character"
[11] "sat), 'Eval'), 'Importance', 'Rating')"
Then, a series of hand-crafted gsubs cleans the result from this particular test set, but such manual steps will undoubtedly fall short on other, less contrived strings (I offer one below).
(result <- gsub(" <- ", " ", gsub(".*\\)", "", gsub(".*,", "", perl = TRUE, result))))
[1] "colnames" " subset" " max" "" "sat as.factor" "gsub" ""
[8] "score ifelse" "str_detect" "as.character"
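For comparison, here is a minimal sketch of a single-pass alternative to the chained gsubs. It assumes that "function name" can be approximated as a syntactic R name immediately followed by an opening parenthesis; the names `pattern` and `found` are only illustrative.

```r
test1 <- "colnames(x) <- subset(df, max(y))"
test2 <- "sat <- as.factor(gsub('International', 'Int'l', sat))"
test3 <- "score <- ifelse(str_detect(as.character(sat), 'Eval'), 'Importance', 'Rating')"
test <- c(test1, test2, test3)

# A name starts with a letter or a dot, followed by letters, digits, dots,
# or underscores. The lookahead (?=\\s*\\() requires an opening parenthesis
# after the name without including it in the match.
pattern <- "[A-Za-z.][A-Za-z0-9._]*(?=\\s*\\()"

# gregexpr() locates every match per string; regmatches() extracts the text.
found <- unlist(regmatches(test, gregexpr(pattern, test, perl = TRUE)))
found
```

This is still text matching rather than parsing, so it would also pick up keywords such as `if` or `for` when they are followed by "(", as well as any name followed by "(" inside a quoted string or a comment.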
The object test4 below includes the functions lapply, function, setdiff, unlist, sapply, and union. It also spans several indented lines, so it contains internal whitespace and line breaks. I have included it so that readers can try a harder case.
test4 <- "contig2 <- lapply(states, function(state) {
setdiff(unlist(sapply(contig[[state]],
function(x) { contig[[x]]})), union(contig[[state]], state))"
(result <- unlist(strsplit(x = test4, split = "\\(")))
(result <- gsub(" <- ", " ", gsub(".*\\)", "", gsub(".*,", "", perl = TRUE, result))))
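Since the goal is to see which functions are used most often, the extraction can be followed by a tally. The sketch below is one possible way to do that on test4; `count_calls` is a hypothetical helper, not an existing function, and it rests on the same assumption that a function name is a syntactic name followed by "(".

```r
test4 <- "contig2 <- lapply(states, function(state) {
setdiff(unlist(sapply(contig[[state]],
function(x) { contig[[x]]})), union(contig[[state]], state))"

# Hypothetical helper: extract names that precede "(" and count them,
# most frequent first. \\s* in the lookahead also tolerates line breaks.
count_calls <- function(code) {
  pattern <- "[A-Za-z.][A-Za-z0-9._]*(?=\\s*\\()"
  hits <- unlist(regmatches(code, gregexpr(pattern, code, perl = TRUE)))
  sort(table(hits), decreasing = TRUE)
}

counts <- count_calls(test4)
counts
```

Because this works on raw text, it does not require the string to be complete, parseable R code.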
BTW, a related SO question, "A better way to extract functions from an R script?", deals with extracting entire functions to create a package.
EDIT after first answer
test.R <- c(test1, test2, test3) # I assume this was your first step, to create test.R
save(test.R,file = "test.R") # saved so that getParseData() could read it
library(dplyr)
tmp <- getParseData(parse("test.R", keep.source=TRUE))
tmp %>% filter(token=="SYMBOL") # token variable had only "SYMBOL" and "expr" so I shortened "SYMBOL_FUNCTION_CALL"
  line1 col1 line2 col2 id parent  token terminal text
1     1    1     1    4  1      3 SYMBOL     TRUE RDX2
2     2    1     2    1  6      8 SYMBOL     TRUE    X
The text column no longer contains any of my code. What should I have done instead?
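One possible explanation, offered as a sketch rather than a confirmed diagnosis: save() writes R's binary data format, not plain text, so parse() may be reading that file's header instead of the code. The version below writes the strings as plain text with writeLines() to a temporary file and parses that. It leaves out test2 on the assumption that its embedded `'Int'l'` would not parse as R code; the names `script`, `pd`, and `calls` are illustrative.

```r
test1 <- "colnames(x) <- subset(df, max(y))"
test3 <- "score <- ifelse(str_detect(as.character(sat), 'Eval'), 'Importance', 'Rating')"

# Write the code as plain text lines; save() would store a binary object.
script <- tempfile(fileext = ".R")
writeLines(c(test1, test3), script)

# keep.source = TRUE retains the source references that getParseData() needs.
pd <- utils::getParseData(parse(script, keep.source = TRUE))

# The parser labels a name that is being called as SYMBOL_FUNCTION_CALL.
calls <- pd$text[pd$token == "SYMBOL_FUNCTION_CALL"]
sort(table(calls), decreasing = TRUE)
```

Filtering on "SYMBOL" alone would return variables such as x and df rather than the function names, which is why the sketch keeps the full token name.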