I'm retrieving tweets from Twitter with the Twitter API and displaying them in my own client.
However, I'm having some difficulty properly highlighting the right search terms. I want an effect like the following:
The way I'm trying to do this in JS is with a function called highlightSearchTerms(), which takes the text of the tweet and an array of keywords to bold as arguments. It returns the text of the fixed tweet. I'm bolding keywords by wrapping them in a <span> that has the class .search-term.
I'm having a lot of problems, which include:
- Running a simple replace doesn't preserve case
- The keyword also appears inside href attributes, and replacing it there breaks the links
- If I try to do a for loop with a replace, I don't know how to only modify search terms that aren't in an href, and that I haven't already wrapped with the span above
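For the first problem on its own, a case-preserving replace can probably be sketched as below. This is only a minimal sketch: the helper names escapeRegExp and wrapPreservingCase are made up for illustration, and it assumes the text contains no markup yet, so it does nothing about the href and double-wrapping problems.

```javascript
// Hypothetical helper: escape characters that are special inside a RegExp,
// so a keyword such as "c++" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every case-insensitive occurrence of `keyword`.
// "$&" in the replacement string stands for the exact text that matched,
// so the original casing of each occurrence is kept.
function wrapPreservingCase(text, keyword) {
  var re = new RegExp(escapeRegExp(keyword), "gi");
  return text.replace(re, '<span class="search-term">$&</span>');
}
```

Calling wrapPreservingCase("a kEyWoRd here", "keyword") should wrap kEyWoRd with its casing unchanged.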
An example tweet I want to be able to handle:
Input:
This is a keyword. This is a <a href="http://search.twitter.com/q=%23keyword">
#keyword</a> with a hashtag. This is a link with kEyWoRd:
<a href="http://thiskeyword.com">http://thiskeyword.com</a>.
Expected Output:
This is a <span class="search-term">keyword</span>. This is a <a href="http://search.twitter.com/q=%23keyword">
#<span class="search-term">keyword</span></a> with a hashtag. This is a link with <span class="search-term">kEyWoRd</span>:
<a href="http://thiskeyword.com">http://this<span class="search-term">keyword</span>.com</a>.
I've tried many things, but unfortunately I can't quite find out the right way to tackle the problem. Any advice at all would be greatly appreciated.
Here is my code, which works for some cases but ultimately doesn't do what I want. It fails when the keyword is in the latter half of the link (e.g. http://twitter.com/this_keyword). Strangely, it sometimes also highlights the 2 characters before a keyword. I doubt the best solution would resemble my code too much.
function _highlightSearchTerms(text, keywords){
    for (var i=0;i<keywords.length;i++) {
        // create regex to find all instances of the keyword, catch the links that potentially come before so we can filter them out in the next step
        var searchString = new RegExp("[http://twitter.com/||q=%23]*"+keywords[i], "ig");
        // create an array of all the matched keyword terms in the tweet, we can't simply run a replace all as we need them to retain their initial case
        var keywordOccurencesInitial = text.match(searchString);
        // create an array of the keyword occurrences we actually want to use; I'm sure there's a better way to build this array, but rather than optimize I stuck with code I know should work, because my problem isn't centered around this block
        var keywordOccurences = [];
        if (keywordOccurencesInitial != null) {
            for(var i3=0;i3<keywordOccurencesInitial.length;i3++){
                if (keywordOccurencesInitial[i3].indexOf("http://twitter.com/") > -1 || keywordOccurencesInitial[i3].indexOf("q=%23") > -1)
                    continue;
                else
                    keywordOccurences.push(keywordOccurencesInitial[i3]);
            }
        }
        // replace our matches with the search-term span
        // the regex should ensure we do NOT catch terms we've already wrapped in the span
        // I took the negative lookbehind workaround from http://stackoverflow.com/a/642746/1610101
        if (keywordOccurences != null) {
            for(var i2=0;i2<keywordOccurences.length;i2++){
                var searchString2 = new RegExp("(q=%23||http://twitter.com/||<span class='search-term'>)?"+keywordOccurences[i2].trim(), "g"); // don't replace what we've already replaced
                text = text.replace(searchString2,
                    function($0,$1){
                        return $1?$0:"<span class='search-term'>"+keywordOccurences[i2].trim()+"</span>";
                    });
            }
        }
    }
    return text;
}
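One thing worth noting about the code above: square brackets in a regex form a character class, so "[http://twitter.com/||q=%23]*" matches any run of those individual characters rather than the literal prefixes, which may well explain the stray highlighted characters. A different direction, sketched below under stated assumptions, is to split the tweet into tags and text, and to touch only the text pieces. The names escapeRegExp and highlightOutsideTags are hypothetical, and the sketch assumes the markup is simple (no ">" inside attribute values) and that the keywords are plain strings. Because all keywords are combined into one pattern and applied in a single pass, spans that were just inserted are never scanned again.

```javascript
// Hypothetical helper: escape RegExp metacharacters in a keyword.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: highlight keywords in text nodes only, never inside tags.
function highlightOutsideTags(html, keywords) {
  // Drop empty keywords and try longer ones first, so that "keywords"
  // wins over "keyword" when both are search terms.
  var parts = keywords
    .filter(function (k) { return k.length > 0; })
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp);
  if (parts.length === 0) return html;

  var keywordRe = new RegExp(parts.join("|"), "gi");

  // Splitting on a capturing group keeps the separators in the result:
  // even indexes hold text, odd indexes hold the tags themselves.
  return html.split(/(<[^>]*>)/).map(function (piece, i) {
    if (i % 2 === 1) return piece; // a tag such as <a href="...">: leave as is
    // "$&" is the matched text, so the original casing is preserved.
    return piece.replace(keywordRe, '<span class="search-term">$&</span>');
  }).join("");
}
```

With the example tweet and the keyword list ["keyword"], this should leave both href values untouched while wrapping the four visible occurrences, including the one inside the link text. For anything beyond simple markup, walking real DOM text nodes would likely be more robust than splitting with a regex.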