I need to find every LaTeX \index command that contains a comma, across a whole bunch of knitr documents (.Rnw). The commands may span multiple lines, e.g.
\index{prior distribution,choosing beta prior for
$\pi$,vague prior knowledge}
I'm reasonably happy with my R code to find things:
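# Read the whole file into one string so that matches can span lines
line = paste(readLines(input), collapse = "\n")
r = gregexpr(pattern, line)
# gregexpr() returns a list of length 1 here, so test the matches themselves
m = regmatches(line, r)[[1]]
if(length(m) > 0){
  # Print the first 50 characters of each match, one per line
  writeLines(substr(m, 1, 50))
}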
However, I can't seem to get the regular expression right. I've tried
pattern = "(\\s)\\\\index\\{.*[,][^}]*\\}"
which finds some of the entries but not all of them, and
pattern = "\\\\index\\{[A-Za-z \\s][^}]*\\}"
which finds more, but also a lot that I don't want. For example, it matches
\index{posterior variance!beta distribution}
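In case it helps to reproduce this, here is a minimal sketch of a test input built in memory rather than read from a file. It reuses the two \index entries quoted above; the surrounding filler text and the names sample_lines and sample_text are made up for illustration. Only the first entry contains a comma, so that is the only one I want reported.

```r
# Hypothetical sample input: the filler words around the \index entries are
# invented; the entries themselves are the two quoted in this question.
sample_lines = c(
  "Some text \\index{prior distribution,choosing beta prior for",
  "$\\pi$,vague prior knowledge} and more text",
  "Other text \\index{posterior variance!beta distribution} here"
)

# Collapse to a single string, as in the code above, so that a match
# can run across the line break inside the first entry.
sample_text = paste(sample_lines, collapse = "\n")

# Show the sample exactly as the regular expression will see it
cat(sample_text, "\n")
```

A pattern can then be tried against sample_text with gregexpr() and regmatches() in the same way as against the real files.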
Any help appreciated.