0

Please have a look at the following code

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>

<script>
function count()
{
    var listOfWords, paragraph, listOfWordsArray, paragraphArray;
    var wordCounter=0;

    listOfWords = document.getElementById("wordsList").value;

    //Split the words
    listOfWordsArray = listOfWords.split("\n");

    //Convert the entire word list to upper case
    for(var i=0;i<listOfWordsArray.length;i++)
    {
        listOfWordsArray[i] = listOfWordsArray[i].toUpperCase();
    }

    //Get the paragraph text
    paragraph = document.getElementById("paragraph").value;
    paragraphArray = paragraph.split(" ");

    //Convert the entire paragraph to upper case
    for(var i=0; i<paragraphArray.length; i++)
    {
        paragraphArray[i] = paragraphArray[i].toUpperCase();
    }

    //check whether paragraph contains words in list
    for(var i=0; i<listOfWordsArray.length; i++)
    {
    /*  if(paragraph.contains(listOfWords[i]))
        {
                wordCounter++;
        }*/

        re = new RegExp("\\b"+listOfWordsArray[i]+"\\b");

        if(paragraph.match(re))
        {
            wordCounter++;
        }
    }

    window.alert("Number of Contains: "+wordCounter);
}
</script>

</head>


<body>
<center>
<p> Enter your Word List here </p>
<br />
<textarea id="wordsList" cols="100" rows="10"></textarea>

<br />
<p>Enter your paragraph here</p>
<textarea id="paragraph" cols="100" rows="15"></textarea>

<br />
<br />
<button id="btn1"  onclick="count()">Calculate Percentage</button>

</center>
</body>
</html>

Here, what I am trying to do is count how many words in the paragraph are also included in wordList. Words in wordList are separated by newlines.

However, I need this check to be case-insensitive. For example, there should be no difference between 'count', 'COUNT' and 'Count'.

But I always get 0 as the answer. What am I doing wrong here?

Update

I tried the function provided by SO user 'Kolink'. However, it gives different answers on different runs: the first few runs were correct, then it started giving wrong answers! Maybe JavaScript has static variables?

halfer
PeakGen
  • You should uppercase the value before splitting it. This will allow you to remove the first two loops. – Sep 04 '13 at 15:11
  • Do you want to count how many words from the list appear in the paragraph, or how many _times_ the words appear? – Evan Davis Sep 04 '13 at 15:17
  • @Mathletics: This should analyze the paragraph and tell you what percentage of the words in the paragraph are contained within the word list. – PeakGen Sep 04 '13 at 16:42

4 Answers

2

You are preparing the paragraph's words in paragraphArray but then you never use it.

I would suggest something like this:

var words = document.getElementById('wordsList').value.split(/\r?\n/),
    l = words.length, i, total = 0, para = document.getElementById('paragraph').value;
for( i=0; i<l; i++) if( para.match(new RegExp("\\b"+words[i]+"\\b","i"))) total++;
alert("Total: "+total);
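
The loop above can also be wrapped in a function so it is easier to test. The sketch below is a variation, not the snippet as posted: `escapeRegExp` and `countListedWords` are hypothetical names, and it makes two assumptions beyond the original, namely that list words should be matched literally (so regex metacharacters are escaped) and that blank lines should be skipped (an empty word would give the pattern `\b\b`, which matches at any word boundary).

```javascript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count how many list entries occur as whole words in the paragraph
// (case-insensitive). Each entry is counted at most once.
function countListedWords(listText, paragraphText) {
    var words = listText.split(/\r?\n/), total = 0, i, word;
    for (i = 0; i < words.length; i++) {
        word = words[i].trim();
        if (word === "") continue; // skip blank lines so they never match
        if (new RegExp("\\b" + escapeRegExp(word) + "\\b", "i").test(paragraphText)) {
            total++;
        }
    }
    return total;
}

console.log(countListedWords("count\nWord\n\nmissing", "Count every WORD here")); // 2
```

In the page, this would be called with the two textarea values, e.g. `countListedWords(document.getElementById('wordsList').value, document.getElementById('paragraph').value)`.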
Niet the Dark Absol
  • Thanks for the reply. I updated my question, please have a look – PeakGen Sep 04 '13 at 17:04
  • Yes, this is the answer. It seems the web browser kept a cached copy, or got confused when the same file was open in a number of tabs – PeakGen Sep 04 '13 at 19:30
1

Solution

How about just this:

var wc = function (text, wordsToMatch) {
  var re = new RegExp("(" + (wordsToMatch || ["\\w+"]).join('|') + ")", "gi");
  var matches = (text || "").match(re);

  // console.log(matches);
  return (matches ? matches.length : 0);
};

Or for an unreadable version (not recommended):

var wc = function (t, w) {
  return (((t || "").match(new RegExp("(" + (w || ["\\w+"]).join('|') + ")", "gi")) || []).length);
};

Integration

So, in your code, you'd be able to throw away most of it and write:

function count()
{
    var wordsList   = document.getElementById("wordsList").value;
    var paragraph   = document.getElementById("paragraph").value;
    var wordCounter = wc(paragraph, wordsList.split("\n"));

    window.alert("Number of Contains: " + wordCounter);
}

Examples

Example 1 (matches against a list)

Input:

console.log(wc("helloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworld", ["world"]));
console.log(wc("helloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworld", ["hello", "world"]));

Output:

12
24

Example 2 (safe defaults)

Input:

console.log(wc("", ["hello", "world"]));
console.log(wc());
console.log(wc(""));

Output:

0
0
0

Example 3 (as a default word counter)

Input:

console.log(wc("hello"));
console.log(wc("hello world"));

Output:

1
2
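
Note that Example 1 counts substrings: "world" is found inside "helloworld". If only whole words should count, something along these lines might work. This is a sketch under assumptions, not part of `wc` above: `wcWholeWords` is a hypothetical name, it drops the default-word-counter behaviour, it escapes regex metacharacters in the words, and it ignores empty entries (a blank line in the list would otherwise add an empty alternative to the pattern).

```javascript
// Hypothetical variant of wc: counts whole-word occurrences only.
var wcWholeWords = function (text, wordsToMatch) {
  // Drop empty entries, e.g. from blank lines in the word list.
  var words = (wordsToMatch || []).filter(function (w) { return w.trim() !== ""; });
  if (words.length === 0) return 0; // nothing to look for

  // Escape regex metacharacters so each word is matched literally.
  var escaped = words.map(function (w) {
    return w.trim().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });

  var re = new RegExp("\\b(?:" + escaped.join("|") + ")\\b", "gi");
  var matches = (text || "").match(re);
  return matches ? matches.length : 0;
};

console.log(wcWholeWords("helloworld hello World", ["hello", "world"])); // 2
console.log(wcWholeWords("hello world", "hello\n\nworld".split("\n")));  // 2
```

Like `wc`, this counts every occurrence, so a word that appears three times in the text contributes three to the total.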
haylem
0

You could search without a regexp (the eliminateDuplicates helper is linked rather than defined here):

var wordCounter = 0;

// retrieves arrays from textareas

var list = eliminateDuplicates(
    document.getElementById('wordsList').value
    .toUpperCase()
    .split(/\s+/g)
);
var para = eliminateDuplicates(
    document.getElementById('paragraph').value
    .toUpperCase()
    .split(/\s+/g)
);

// performs search

for (var i1 = 0, l1 = para.length; i1 < l1; i1++) {
    var word = para[i1];
    for (var i2 = 0, l2 = list.length; i2 < l2; i2++) {
        if (list[i2] === word) {
            wordCounter++;
            break;
        }
    }
}
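
Since the linked helper is not shown here, the following is only a guess at what it might look like, together with the percentage the asker mentions in the comments. Both `eliminateDuplicates` (as written below) and `percentageInList` are hypothetical; dropping empty strings is an extra assumption so that leading or trailing whitespace does not count as a word.

```javascript
// Hypothetical stand-in for the linked eliminateDuplicates helper:
// returns the distinct strings of the array in first-seen order.
// Assumption: empty strings are dropped as well.
function eliminateDuplicates(items) {
    var result = [];
    for (var i = 0; i < items.length; i++) {
        if (items[i] !== "" && result.indexOf(items[i]) === -1) {
            result.push(items[i]);
        }
    }
    return result;
}

// Percentage of distinct paragraph words that also appear in the list.
function percentageInList(listText, paragraphText) {
    var list = eliminateDuplicates(listText.toUpperCase().split(/\s+/));
    var para = eliminateDuplicates(paragraphText.toUpperCase().split(/\s+/));
    var found = 0;
    for (var i = 0; i < para.length; i++) {
        if (list.indexOf(para[i]) !== -1) found++;
    }
    return para.length === 0 ? 0 : (found / para.length) * 100;
}

console.log(percentageInList("count\nword", "Count the word count here")); // 50
```

Because the text is split on whitespace only, a word with attached punctuation such as "word," would not match "word"; stripping punctuation first would be needed for real paragraphs.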
Community
-2

Your regex is not well formed. Try:

re = new RegExp("\\b"+listOfWordsArray[i]+"\b\");

because the first character is \ , so the last should be \ , and not b
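
A quick way to check how the backslashes in a pattern string are interpreted is to build the RegExp and inspect its `source`. The sketch below compares `"\\b"` with `"\b"` inside a string literal; the sample word and sentence are illustrative assumptions.

```javascript
var word = "count"; // sample word, chosen for illustration

// In a string literal, "\\b" is a backslash followed by "b",
// which RegExp reads as a word boundary.
var boundary = new RegExp("\\b" + word + "\\b", "i");
console.log(boundary.source);               // \bcount\b
console.log(boundary.test("Count me in"));  // true

// A single "\b" in a string literal is the backspace character (U+0008),
// so this pattern looks for backspace + "count" + backspace.
var backspace = new RegExp("\b" + word + "\b", "i");
console.log(backspace.test("Count me in")); // false
console.log("\b".charCodeAt(0));            // 8
```

So a pattern string that starts with `"\\b"` also needs `"\\b"` at the end to get a word boundary on both sides.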

Someoneinthe