OK, so this is what I have (special thanks to Tushar Gupta for fixing the code):

HTML

<input type='checkbox' value='2' name='v'>STS
<input type='checkbox' value='4' name='v'>NTV

js

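$(function () {
    // Word count per text input, keyed by the input's id.
    var wordCounts = {};

    $("input[type='text']:not(:disabled)").keyup(function () {
        // \b matches at both ends of a word, so each word yields two matches.
        var matches = this.value.match(/\b/g);
        wordCounts[this.id] = matches ? matches.length / 2 : 0;

        // Sum the values of the checked checkboxes to get the multiplier.
        var finalCount = 0;
        var x = 0;
        $('input:checkbox:checked').each(function () {
            x += parseInt(this.value, 10);
        });
        x = (x === 0) ? 1 : x; // no box checked: multiply by 1

        $.each(wordCounts, function (k, v) {
            finalCount += v * x;
        });
        $('#finalcount').val(finalCount);
    }).keyup(); // run once on load to initialise the total

    // Recalculate whenever a checkbox is toggled.
    $('input:checkbox').change(function () {
        $('input[type="text"]:not(:disabled)').trigger('keyup');
    });
});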

I want it to be able to count Russian words, e.g. "Привет как дела". So far it only works with English input.

Konata
  • have you checked the encoding of the file where your js script is running ? from [this](http://stackoverflow.com/questions/553463/jquery-ajax-character-encoding-problem) page : "UTF-8 is supposed to handle all accents and foreign chars" ... also take a look at [this](http://stackoverflow.com/questions/10396913/how-to-show-russian-text-in-jquery-dialog-title) -- hope this helps – lollo Aug 31 '13 at 19:15
  • The links did not help. I've tried the general encoding; along with UTF-8 I also tried Cyrillic-specific ones like windows-1251 – Konata Aug 31 '13 at 19:26
  • which system are you running for your project? try to put this meta tag in the head section of your html file : – lollo Aug 31 '13 at 19:44

2 Answers

The \b notation is defined in terms of “word boundaries”, but with “word” meaning a sequence of ASCII letters, digits, and underscores, so it cannot be used for Russian text. A simple approach is to count sequences of Cyrillic letters; the range from U+0400 to U+0481 covers the Cyrillic letters used in Russian. So replace the lines

var matches = this.value.match(/\b/g);
wordCounts[this.id] = matches ? matches.length / 2 : 0;

by the lines

var matches = this.value.match(/[\u0400-\u0481]+/g);
wordCounts[this.id] = matches ? matches.length : 0;
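To see the difference outside the jQuery handler, here is a minimal sketch that puts both expressions into plain functions. The function names are made up for illustration and are not part of the original code; only the two regular expressions come from this answer.

```javascript
// Hypothetical helper: counts runs of characters in the range U+0400..U+0481.
function countCyrillicWords(text) {
    var matches = text.match(/[\u0400-\u0481]+/g);
    return matches ? matches.length : 0;
}

// The original approach, for comparison. \b is defined via \w,
// which covers only ASCII letters, digits and underscore.
function countWithWordBoundaries(text) {
    var matches = text.match(/\b/g);
    return matches ? matches.length / 2 : 0;
}

console.log(countWithWordBoundaries("Hello how are you")); // 4
console.log(countWithWordBoundaries("Привет как дела"));   // 0
console.log(countCyrillicWords("Привет как дела"));        // 3
console.log(countCyrillicWords("Hello how are you"));      // 0
```

Note the trade-off in the last line: the range-based version ignores Latin text entirely, so it only suits input that is expected to be Cyrillic.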

You should perhaps treat a hyphen as a letter (by adding \- inside the brackets), so that a hyphenated compound is counted as one word, but this is debatable: is, e.g., “жили-были” two words or one?
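The effect of the hyphen choice can be sketched as follows. The first two expressions are the ones discussed above; the third is an assumption of mine rather than something from the answer: it accepts a hyphen only when there are letters on both sides, so a free-standing hyphen is not counted as a word. All names are illustrative.

```javascript
// Letters only: a hyphenated compound counts as two words.
var LETTERS_ONLY = /[\u0400-\u0481]+/g;
// Hyphen treated like a letter: compounds count once, but so does a lone "-".
var LETTERS_OR_HYPHEN = /[\u0400-\u0481\-]+/g;
// Assumption (not from the answer): hyphens only join letters on both sides.
var INTERNAL_HYPHEN = /[\u0400-\u0481]+(?:-[\u0400-\u0481]+)*/g;

// Hypothetical helper: number of matches of a global regex in a string.
function countMatches(text, regex) {
    var matches = text.match(regex);
    return matches ? matches.length : 0;
}

console.log(countMatches("жили-были", LETTERS_ONLY));      // 2
console.log(countMatches("жили-были", LETTERS_OR_HYPHEN)); // 1
console.log(countMatches("да - нет", LETTERS_OR_HYPHEN));  // 3
console.log(countMatches("да - нет", INTERNAL_HYPHEN));    // 2
```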

Jukka K. Korpela

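The problem is in your regex: \b is defined in terms of \w, which only covers ASCII letters, digits and the underscore, so it doesn't find word boundaries in Cyrillic text.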

Try changing this:

    var matches = this.value.match(/\b/g);

To this:

    var matches = this.value.match(/[^\s\.\!\?]+/g);

and see if that gives a result for Cyrillic input. If it works then you no longer need to divide by 2 to get the word count.
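As a rough sketch, the suggested expression can be tried in a plain function. The name `countTokens` is hypothetical; the regular expression is the one above.

```javascript
// Hypothetical helper: counts runs of characters that are not
// whitespace, ".", "!" or "?".
function countTokens(text) {
    var matches = text.match(/[^\s\.\!\?]+/g);
    return matches ? matches.length : 0;
}

console.log(countTokens("Привет как дела"));     // 3
console.log(countTokens("Hello, how are you?")); // 4
console.log(countTokens("один - два"));          // 3
console.log(countTokens(""));                    // 0
```

Because this is a negative match, it works for any script, but anything that is not excluded counts as a word, including the free-standing hyphen in the third example.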

bobs12
  • The code works in most cases but would report e.g. “Да – или нет?” as 4 words. – Jukka K. Korpela Aug 31 '13 at 20:47
  • @JukkaK.Korpela +1 because yes, it's a pretty basic example and would need some refinement according to the specific task. Worth noting though that `\b` will also count numbers as 'words' - not always useful, e.g. in calculating translation texts. Adding `\-` to my regex would cover your example, but on the whole it would be better to write an expression that looks for positive matches. If the task only requires Cyrillic word counting then `[а-яА-Я0-9]` could be used, but it wouldn't match other UTF-8 alphabets. – bobs12 Sep 01 '13 at 06:55
  • My answer is an attempt at looking for positive matches. Note that `[а-яА-Я0-9]` would not match the letters Ё and ё. – Jukka K. Korpela Sep 01 '13 at 16:26
  • @JukkaK.Korpela - Ё-моё! Kiitos, I didn't know that. Learn something new every day :) So I'll make that `[а-яА-ЯёЁ0-9]` then and save it for later. – bobs12 Sep 01 '13 at 18:45
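The point about Ё and ё from the comments can be checked with a small sketch. The first two character classes are the ones quoted above. The third is a swapped-in technique that does not appear in this thread: Unicode property escapes (`\p{L}`, `\p{N}`), which need the `u` flag and an ES2018-capable engine. All names are illustrative.

```javascript
var WITHOUT_YO = /[а-яА-Я0-9]+/g;   // Ё (U+0401) and ё (U+0451) lie outside both ranges
var WITH_YO = /[а-яА-ЯёЁ0-9]+/g;    // the corrected class from the comments
// Assumption: the target engine supports ES2018 Unicode property escapes.
var ANY_LETTER_OR_DIGIT = /[\p{L}\p{N}]+/gu;

// Hypothetical helper: number of matches of a global regex in a string.
function countWith(text, regex) {
    var matches = text.match(regex);
    return matches ? matches.length : 0;
}

// Without Ё/ё in the class, "актёр" splits into "акт" and "р", and "ёж" shrinks to "ж".
console.log(countWith("актёр и ёж", WITHOUT_YO));             // 4
console.log(countWith("актёр и ёж", WITH_YO));                // 3
console.log(countWith("Привет, world", WITH_YO));             // 1
console.log(countWith("Привет, world", ANY_LETTER_OR_DIGIT)); // 2
```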