
I have a basic, case-sensitive, term-specific search with the code below. It will work for now, but I would like something that (in order of importance):

1: Ignores case (i.e. "hi" and "Hi" are the same. toLowerCase is not an option and is not the same thing.)

2: Yields a hit if the search query is 'Search Term' and the searched string is 'searching terms', for example.

3: Keeps searching the entire string for more hits even after finding one.

The purpose is to search a <p> tag with a specific id for a term and, if the term is found, display the tag. Ultimately, I will use this in a loop that searches many <p> tags, displays the ones with hits, and leaves the ones without hits hidden.

CODE:

<!DOCTYPE html>
<html>
    <body>
        <p id="demo">Click the button to locate where in the string a specified value occurs.</p>
        <p id="demo1" style="display:none;">Hello world, welcome to the universe.</p>
        <button onclick="myFunction()">Try it</button>

        <script>
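            function myFunction() {
                var x = document.getElementById("demo1");
                // innerHTML is already a string, so no toString() is needed
                var str = x.innerHTML;
                // indexOf is case-sensitive and returns -1 when there is no match
                var n = str.indexOf("welcome");
                if (n !== -1) {
                    // found: reveal the hidden paragraph
                    x.style.display = 'inline';
                } else {
                    x.innerHTML = 'Negative';
                    x.style.display = 'inline';
                }
            }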
        </script>

    </body>
</html>
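The loop described in the question could look roughly like the sketch below. It assumes a paragraph counts as a hit when every word of the query occurs in it as a case-insensitive substring. `matchesQuery` and `filterParagraphs` are hypothetical names, and the paragraphs are passed in (for example the result of `document.querySelectorAll('p')`) so the matching logic does not depend on the page.

```javascript
// Hypothetical helper: true when every whitespace-separated word of `query`
// occurs somewhere in `text`, ignoring case. Lowercasing BOTH sides is what
// makes the comparison case-insensitive.
function matchesQuery(text, query) {
  var haystack = text.toLowerCase();
  var words = query.toLowerCase().split(/\s+/).filter(function (w) {
    return w.length > 0;
  });
  return words.every(function (word) {
    return haystack.indexOf(word) !== -1;
  });
}

// Hypothetical helper: shows paragraphs that match and hides the rest.
// Works on anything with `textContent` and `style`, e.g. the NodeList
// returned by document.querySelectorAll('p').
function filterParagraphs(paragraphs, query) {
  for (var i = 0; i < paragraphs.length; i++) {
    var p = paragraphs[i];
    // '' removes the inline display value, so the element falls back to its
    // default display; 'none' hides it
    p.style.display = matchesQuery(p.textContent, query) ? '' : 'none';
  }
}
```

With this, a query of 'Search Term' matches a paragraph containing 'searching terms', because 'search' and 'term' are substrings of it. The reverse (query 'searching', text 'search') does not match; handling that needs something like the keyword approach in the first answer.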
Cerbrus
user1934286
  • This isn't a task for JavaScript. To do what you'd like, you will need to use natural language processing. I'd start with tokenizing your input string and removing the suffixes. From there, you can try to search your database. – Blender Dec 28 '12 at 11:00
  • I know some Java. Would an applet be able to handle this? – user1934286 Dec 28 '12 at 11:02
  • This sort of stuff usually isn't done clientside. Java has a bunch of good natural language processing libraries that you could use, but they do have a steep learning curve. – Blender Dec 28 '12 at 11:03
  • "toLowerCase is not an option and is not the same thing" why not? – PeeHaa Dec 28 '12 at 11:05
  • toLowerCase changes the search term, which is unhelpful since the case of the searched string is unknown. toLowerCase is to normalize text. – user1934286 Dec 28 '12 at 11:13
  • @Blender do you have a link to these libraries and how to use them? – user1934286 Dec 28 '12 at 11:14
  • @fredsbend you'd need to use toLowerCase on both inputs, the indexed text and the query text. Please bear in mind that this is a *hard* problem, and each problem you solve or work around will be replaced by another one until your abstractions leak and you cry like a baby. If you know Java, have a look at the Lucene codebase for inspiration (there's a book too). – Richard Marr Dec 28 '12 at 11:16
  • @fredsbend It does indeed sort of normalize the text. And the beauty of it is that if you do it on both the keyword(s) and the search subject, it does a case-insensitive search :) – PeeHaa Dec 28 '12 at 11:35
  • @PeeHaa and Richard Marr Thanks. I guess I had a brain fart. Didn't think to use toLowerCase on both. That does solve the case sensitive problem. – user1934286 Dec 30 '12 at 19:55
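A minimal sketch of what the comments suggest: compare lowercase copies of both the text and the query, leaving the original text untouched. `indexOfIgnoreCase` is a hypothetical name, and the sketch assumes lowercasing does not change the length of the text (true for plain ASCII; some Unicode characters, such as the Turkish İ, expand), so the returned index is also valid in the original string.

```javascript
// Hypothetical helper: case-insensitive indexOf. Only lowercase COPIES are
// compared; `text` itself is never modified. Returns -1 when not found.
function indexOfIgnoreCase(text, term) {
  return text.toLowerCase().indexOf(term.toLowerCase());
}

var text = 'Hello world, welcome to the universe.';
var i = indexOfIgnoreCase(text, 'WELCOME');
// The hit can be read back from the original string, with its original casing.
var matched = i === -1 ? null : text.slice(i, i + 'WELCOME'.length);
```

Here `i` is 13 and `matched` is `'welcome'`, so the displayed text keeps whatever casing it had.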

3 Answers


I'd start by tokenizing your input string:

function tokenize(input) {
    return input.toLowerCase().replace(/[^a-z0-9_\s]/g, '').trim().split(/\s+/);
}

Which does this to your search terms:

> tokenize("I'm your search string.")
["im", "your", "search", "string"]

Next, strip off the suffixes (I'm not even going to try to handle the cases where this won't work. This is what NLP is for):

function remove_suffix(token) {
    return token.replace(/(ing|s)$/, '');
}

It'll do this to each token:

> remove_suffix('searching')
"search"
> remove_suffix('terms')
"term"

So for each query string, you can construct a list of keywords:

function get_keywords(query) {
    var tokens = tokenize(query);
    var keywords = tokens.map(remove_suffix);
    keywords.sort();

    return keywords;
}

And it will convert your query into keywords:

> get_keywords('searching terms')
["search", "term"]
> get_keywords('term search')
["search", "term"]

Now you just check whether every keyword of your query is contained in the keywords of the text you are searching.
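A minimal sketch of that containment check. It repeats compact versions of the helpers above so it runs on its own; `matches_query` is a hypothetical name, and "contained" is taken to mean every query keyword appears among the text's keywords.

```javascript
// Compact versions of the helpers above, so this sketch is self-contained.
function tokenize(input) {
  return input.toLowerCase().replace(/[^a-z0-9_\s]/g, '').trim().split(/\s+/);
}
function remove_suffix(token) {
  return token.replace(/(ing|s)$/, '');
}
function get_keywords(query) {
  return tokenize(query).map(remove_suffix).sort();
}

// Hypothetical helper: true when every keyword of `query` is also one of
// the keywords of `text`.
function matches_query(query, text) {
  var textKeywords = get_keywords(text);
  return get_keywords(query).every(function (keyword) {
    return textKeywords.indexOf(keyword) !== -1;
  });
}
```

For example, `matches_query('Search Term', 'searching terms')` is `true`, since both sides reduce to the keywords "search" and "term". It inherits every weakness of the naive suffix stripping.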

This is a really simple example and won't handle the multitude of corner cases, but it should show roughly how keywords can be used for searching.

Blender
  • Just to give an example of what can go wrong: `remove_suffix('string')` --> `"str"`. Still, +1, since it's a good answer for the rest. – Cerbrus Dec 28 '12 at 11:49
  • @Cerbrus: And `'running' -> 'runn'`. The list of exceptions just goes on and on. – Blender Dec 28 '12 at 11:50

I believe this, with some tweaking, should fulfill your requirements. It might be better to do this in the backend, though =).

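// returns the indices of all occurrences of searchStr within str,
// case-sensitive only if caseSensitive is true
function getIndicesOf(searchStr, str, caseSensitive) {
    var startIndex = 0, searchStrLen = searchStr.length;
    var index, indices = [];
    if (searchStrLen === 0) {
        // an empty search string matches at every position and would loop forever
        return indices;
    }
    if (!caseSensitive) {
        str = str.toLowerCase();
        searchStr = searchStr.toLowerCase();
    }
    // keep searching after each hit, starting just past the previous match
    while ((index = str.indexOf(searchStr, startIndex)) > -1) {
        indices.push(index);
        startIndex = index + searchStrLen;
    }
    return indices;
}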

// example query (placeholder); in practice this comes from the user
var mySearchString = "ukulele lebanon";
// split the query on runs of whitespace; split("\\s+") would look for the literal text \s+
var myStringArray = mySearchString.trim().split(/\s+/);
var result = true;
// loop over all the split search strings and search for each one separately
for (var i = 0; i < myStringArray.length; i++) {
    var indices = getIndicesOf(myStringArray[i], "I learned to play the Ukulele in Lebanon.", false);
    if (indices.length > 0) {
        // do something with the indices of the found string
    } else {
        result = false;
    }
}
// result will be false here if one of the search terms was not found.

borrowed from here
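The loop above can be wrapped in a function that reports every hit for every term (requirement 3). This is a sketch: `findAllTerms` is a hypothetical name, and it returns `null` as soon as one term is missing, mirroring `result = false` above.

```javascript
// Copy of getIndicesOf from the answer, including the empty-string guard.
function getIndicesOf(searchStr, str, caseSensitive) {
  var startIndex = 0, searchStrLen = searchStr.length;
  var index, indices = [];
  if (searchStrLen === 0) {
    return indices; // an empty term would otherwise loop forever
  }
  if (!caseSensitive) {
    str = str.toLowerCase();
    searchStr = searchStr.toLowerCase();
  }
  while ((index = str.indexOf(searchStr, startIndex)) > -1) {
    indices.push(index);
    startIndex = index + searchStrLen;
  }
  return indices;
}

// Hypothetical wrapper: maps each search term to all indices where it
// occurs in `text` (ignoring case), or returns null if any term is missing.
function findAllTerms(searchString, text) {
  var terms = searchString.trim().split(/\s+/);
  var hits = {};
  for (var i = 0; i < terms.length; i++) {
    var indices = getIndicesOf(terms[i], text, false);
    if (indices.length === 0) {
      return null;
    }
    hits[terms[i]] = indices;
  }
  return hits;
}
```

For example, `findAllTerms('ukulele LEBANON', 'I learned to play the Ukulele in Lebanon.')` returns `{ ukulele: [22], LEBANON: [33] }`.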

Community
Steven

Take a look at regular expressions. They take some time to learn, but once you know them you'll probably be able to achieve your goal here.

Here is a link.

Hope this helps.
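A sketch of the regular-expression approach, assuming "hit" means a word that starts with the search term. `escapeRegExp` and `findWordsStartingWith` are hypothetical names. The `i` flag ignores case, the `g` flag lets `exec` keep finding matches after the first one, and `\w*` lets "search" match "searching".

```javascript
// Escape characters that have special meaning in a regular expression,
// so user input such as "c++" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: every word in `text` that starts with `term`,
// ignoring case, with the index where each match begins.
function findWordsStartingWith(term, text) {
  if (term.length === 0) {
    return []; // an empty pattern could match zero characters forever
  }
  var re = new RegExp('\\b' + escapeRegExp(term) + '\\w*', 'gi');
  var hits = [];
  var m;
  // with the g flag, each exec call resumes after the previous match
  while ((m = re.exec(text)) !== null) {
    hits.push({ word: m[0], index: m.index });
  }
  return hits;
}
```

The `\b` prevents matches in the middle of a word, so searching for "search" finds "Searching" and "SEARCH" but not the "search" inside "researchers".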

Mark Bramnik
  • So far I can see how to get case insensitivity, and I will surely use that. I sort of see how I might solve item 2 in the original post, but I don't quite see how to solve item 3. – user1934286 Dec 28 '12 at 11:11
  • In short, you can use groups and global search (the /g flag). For more, you can read this: http://stackoverflow.com/questions/520611/how-can-i-match-multiple-occurrences-with-a-regex-in-javascript-similar-to-phps – Mark Bramnik Dec 28 '12 at 13:11