Context first: I'm trying to highlight song titles in a Wikipedia page. First I extract the quoted portions of the page, then I check whether they exist in a database of song titles, and finally I highlight the ones I find. The database lookup is surprisingly fast, and so is extracting the candidate titles (they are quoted on the page).
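Here is a minimal sketch of the extraction and lookup steps just described. It assumes titles sit between plain ASCII double quotes in the HTML. It also uses a hypothetical in-memory hash, `%known_titles`, as a stand-in for the real song-title database:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the song-title database; keys are lowercased.
my %known_titles = map { lc($_) => 1 } ('Yesterday', 'Let It Be', 'Help!');

# Return the distinct quoted strings in $html that exist in %known_titles.
sub find_known_titles {
    my ($html) = @_;
    my %seen;
    # Capture text between double quotes without crossing a tag boundary.
    my @candidates = grep { !$seen{$_}++ } $html =~ /"([^"<>]+)"/g;
    return grep { $known_titles{ lc $_ } } @candidates;
}

my $html  = q{<p>They recorded "Yesterday" and "Something" in 1965.</p>};
my @words = find_known_titles($html);
print join(', ', @words), "\n";
```

With this sample input it prints only `Yesterday`, because `Something` is not in the stand-in table.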
So (I think) I need to replace a set of words (the titles) in the HTML, wrapping each one in a span like this:
s/word/<span class="something">word<\/span>/gi
The text is about 100k characters long and the list has about 300 words (neither is known in advance), so an iterative process that replaces one word at a time is too slow (I need to keep this under 1 second if possible).
So I did this:
my $re = join '|', map { quotemeta($_) } @words;
$dom =~ s/($re)/<span class="something">$1<\/span>/gi;
which seems to work and is fast (0.64 s on my benchmark case).
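One detail worth checking in the combined pattern: Perl tries the alternatives left to right and takes the first one that matches at a given position. So if one title is a prefix of another (say `Help` and `Help!`), the shorter one can win. The sketch below sorts the alternatives longest-first before joining them. The titles, sample text and class name are placeholders:

```perl
use strict;
use warnings;

# Placeholder titles; 'Help' is a prefix of 'Help!'.
my @words = ('Help', 'Help!', 'Yesterday');

# Sort longest-first so 'Help!' is tried before 'Help' at each position.
my $re = join '|', map { quotemeta } sort { length($b) <=> length($a) } @words;

# Compile once; the /i flag stays attached to the qr// object.
my $title_re = qr/($re)/i;

my $dom = 'Songs: Help!, yesterday.';
(my $out = $dom) =~ s/$title_re/<span class="something">$1<\/span>/g;
print "$out\n";
```

Without the sort, `Help!` in the text would be highlighted as `<span class="something">Help</span>!`.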
Now I want to replace "$word" (including the double quotes) instead of just $word, so I tried this:
my $re = join '|', map { quotemeta(join '', '"', $_, '"') } @words;
and the speed dropped by a factor of 10. When I profile both versions with NYTProf, all of the difference seems to be inside CORE:substcont.
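You can reproduce the comparison outside NYTProf with the core Benchmark module. The sketch below times the quoted alternation against a variant that keeps the quotes outside the alternation, `"(...)"`, on synthetic data. The generated titles, the text size and the output form `<span class="something">"word"</span>` (quotes kept inside the span) are all assumptions. The sketch does not predict which variant wins on real pages. It only checks that both produce the same markup and reports their relative speed:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data (assumption): 300 fake titles, roughly 90k characters of HTML-like text.
my @words = map { "Song Title $_" } 1 .. 300;
my $dom   = join ' ',
    map { qq{<p class="x">Text with "Song Title $_" and more words.</p>} } 1 .. 1500;

# Variant 1: quotes embedded in every alternative, as in the question.
my $re_inside = join '|', map { quotemeta(qq{"$_"}) } @words;
my $qr_inside = qr/($re_inside)/i;

# Variant 2: quotes factored out of the alternation; $1 captures the bare title.
my $re_bare    = join '|', map { quotemeta } @words;
my $qr_outside = qr/"($re_bare)"/i;

sub highlight_inside {
    (my $out = $dom) =~ s/$qr_inside/<span class="something">$1<\/span>/g;
    return $out;
}

sub highlight_outside {
    (my $out = $dom) =~ s/$qr_outside/<span class="something">"$1"<\/span>/g;
    return $out;
}

# Sanity check: both variants must produce identical markup.
die "outputs differ\n" unless highlight_inside() eq highlight_outside();

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    quotes_inside  => \&highlight_inside,
    quotes_outside => \&highlight_outside,
});
```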
Why is that?
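A different approach avoids the 300-way alternation altogether. Since the titles are known to be quoted, you can match every quoted string with one simple pattern and decide in the replacement (via `/e`) whether to wrap it, using a hash of lowercased titles. This sketch assumes plain ASCII double quotes and a placeholder class name, and I have not measured it against the alternation on real pages. Note also that `lc` matches `/i` only for ASCII titles:

```perl
use strict;
use warnings;

my @words = ('Help!', 'Yesterday');

# Lowercased lookup table replaces the big alternation.
my %is_title = map { lc($_) => 1 } @words;

my $dom = q{<p>"Help!" and "Something" and "YESTERDAY".</p>};

# One pass over all quoted strings; /e evaluates the replacement as Perl code,
# so non-titles are written back unchanged.
$dom =~ s{"([^"<>]+)"}{
    $is_title{ lc $1 } ? qq{<span class="something">"$1"</span>} : qq{"$1"}
}ge;

print "$dom\n";
```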
(Extra thanks for any suggestions on how to avoid replacing text inside tags, such as id="word_to_be_replaced".)
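For the tag question, one quick approach is to split the HTML into tags and text with a capturing `split`, then substitute only in the text pieces. This is a naive sketch. It assumes no `>` inside attribute values and ignores comments and `<script>`/`<style>` contents. A real HTML parser (for example HTML::Parser from CPAN) would be more robust:

```perl
use strict;
use warnings;

my @words = ('word_to_be_replaced');
my $re    = join '|', map { quotemeta } sort { length($b) <=> length($a) } @words;

my $dom = q{<div id="word_to_be_replaced">A "word_to_be_replaced" here</div>};

# The capturing group makes split keep the tags as separate list elements.
my @parts = split /(<[^>]*>)/, $dom;

for my $part (@parts) {          # $part aliases each element of @parts
    next if $part =~ /^</;       # leave tags untouched
    $part =~ s/"($re)"/<span class="something">"$1"<\/span>/gi;
}

$dom = join '', @parts;
print "$dom\n";
```

The `id="word_to_be_replaced"` attribute survives unchanged, and only the quoted occurrence in the text is wrapped.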