
I'm trying to remove all non-alphanumeric characters from a string and then count the number of words on each line extracted from a PDF.

var m = item["str"].replace(/[^a-zA-Z0-9 ]/g," ").trim().split(" ");
console.log("count: " + m.length + " words: " + m);

This is the code. An example of the resulting output:

count: 10 words: The,Quick,Brown,Fox,,,Jumps,Over,The,Lazy

where item["str"] is:

The Quick Brown Fox - Jumps Over The Lazy

Some output also looks like this:

count: 1 words:

Can anyone help me understand what's going on here? Thanks in advance!

Joe Harrison

3 Answers


The problem is that your regex matches one character at a time and replaces each match with a space. The dash becomes a space sitting next to the spaces that were already around it, so the final string contains several spaces in a row.

Let's use your example:

The Quick Brown Fox - Jumps Over The Lazy

becomes

The Quick Brown Fox   Jumps Over The Lazy

Splitting that on a single space produces an empty string between each pair of adjacent spaces.
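To see exactly where the empty strings come from, here is a small sketch that runs the question's `replace`/`trim`/`split` chain on the example line, plus an empty string. Treating a blank PDF line as `""` is an assumption; `JSON.stringify` is used only so the empty entries are visible in the output.

```javascript
// The example line from the question.
const line = "The Quick Brown Fox - Jumps Over The Lazy";

// Each non-alphanumeric character (here only "-") becomes one space,
// so together with the spaces around the dash we get a run of three spaces.
const replaced = line.replace(/[^a-zA-Z0-9 ]/g, " ").trim();

// Splitting on a single space yields "" between every pair of adjacent spaces.
const parts = replaced.split(" ");
console.log(parts.length, JSON.stringify(parts));
// prints: 10 ["The","Quick","Brown","Fox","","","Jumps","Over","The","Lazy"]

// A blank line (assumed to arrive as "") still splits into one element, [""],
// which would explain the "count: 1" output with no words.
const blank = "".replace(/[^a-zA-Z0-9 ]/g, " ").trim().split(" ");
console.log(blank.length, JSON.stringify(blank));
// prints: 1 [""]
```

The same thing happens for a line made only of punctuation, such as `" - "`: it becomes all spaces, `trim()` reduces it to `""`, and `split(" ")` returns `[""]`.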


You should split on runs of whitespace instead, so consecutive spaces count as a single separator: split(/\s+/).

function runReplace(str) {
  var m = str.replace(/[^a-zA-Z0-9 ]/g," ").trim().split(/\s+/);
  document.write(str + "<br/>");
  document.write("count: " + m.length + " words: " + m + "<br/>");
}

runReplace("The Quick Brown Fox - Jumps Over The Lazy");
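One case `split(/\s+/)` alone does not cover: on an empty string it still returns `[""]`, so blank lines keep reporting a count of 1. A minimal sketch that guards against that is below; `extractWords` is a hypothetical helper name, and it uses `console.log` instead of `document.write` so it also runs outside a browser.

```javascript
// Hypothetical helper: returns the words of one extracted line.
function extractWords(str) {
  // Same cleanup as runReplace: non-alphanumerics (except spaces) become spaces.
  var cleaned = str.replace(/[^a-zA-Z0-9 ]/g, " ").trim();
  // "".split(/\s+/) returns [""], so a blank line would otherwise count as 1 word.
  return cleaned === "" ? [] : cleaned.split(/\s+/);
}

var lines = ["The Quick Brown Fox - Jumps Over The Lazy", "", " - "];
lines.forEach(function (line) {
  var words = extractWords(line);
  console.log("count: " + words.length + " words: " + words);
});
// The first line prints: count: 8 words: The,Quick,Brown,Fox,Jumps,Over,The,Lazy
// The other two print a count of 0.
```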
ug_

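var item = {
    str: 'The Quick Brown Fox - Jumps Over The Lazy'
};

// Replace non-word characters, collapse runs of whitespace, then trim.
// Trimming last also removes a space left behind by leading/trailing punctuation.
var output = item['str'].replace(/\W/g, ' ').replace(/\s+/g, ' ').trim().split(/\s/);

console.log('length', output.length);
console.log('output', output);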

This finds 8 words instead of 10.
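Two notes on this approach. First, `\W` is equivalent to `[^A-Za-z0-9_]`, so unlike the question's character class it keeps underscores. Second, a different technique (not taken from any answer here) avoids empty strings entirely: match the runs of alphanumerics directly with `String.prototype.match` and a global regex. `matchWords` is a hypothetical name used for this sketch.

```javascript
// Alternative technique: collect the words directly instead of
// replacing separators and splitting.
function matchWords(str) {
  // With the g flag, match() returns an array of all matches, or null if none.
  return str.match(/[a-zA-Z0-9]+/g) || [];
}

const sample = "The Quick Brown Fox - Jumps Over The Lazy";
const words = matchWords(sample);
console.log("count: " + words.length + " words: " + words.join(","));
// prints: count: 8 words: The,Quick,Brown,Fox,Jumps,Over,The,Lazy

// \W keeps underscores; the explicit character class does not.
console.log("snake_case".replace(/\W/g, " "));           // prints: snake_case
console.log("snake_case".replace(/[^a-zA-Z0-9]/g, " ")); // prints: snake case
```

Because `match()` returns `null` for a line with no alphanumerics, the `|| []` fallback makes blank or punctuation-only lines report a count of 0.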

Cerbrus
Tân

You are almost done. Just remove the empty strings from the array using the Array#filter method.

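// Replace non-alphanumerics with spaces, split on single spaces,
// then drop the empty strings produced by consecutive spaces.
var m = "The Quick Brown Fox - Jumps Over The Lazy".replace(/[^a-zA-Z0-9 ]/g," ").trim().split(" ").filter(a => a);
console.log("count: " + m.length + " words: " + m.join(","));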
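The question counts words per `item`. Assuming each item is an object with a `str` property, as `item["str"]` implies, the filter approach can be applied to every line at once. The sample items below are made up for illustration; if the text comes from pdf.js, its text-content items also expose a `str` field, though they may be text runs rather than whole lines. `wordsOf` is a hypothetical helper name.

```javascript
// Hypothetical sample data: one object per extracted line, each with a `str`
// property, mirroring the item["str"] access in the question.
const items = [
  { str: "The Quick Brown Fox - Jumps Over The Lazy" },
  { str: "" },
  { str: "  --  " },
];

// Replace/split/filter, as in the answer above.
function wordsOf(str) {
  return str
    .replace(/[^a-zA-Z0-9 ]/g, " ")
    .trim()
    .split(" ")
    .filter((w) => w !== ""); // drop the empty strings left by repeated spaces
}

// Word count for each line; counts is [8, 0, 0] for the sample above.
const counts = items.map((item) => wordsOf(item.str).length);
console.log(counts);
```

For string arrays, `filter((w) => w !== "")` behaves the same as `filter(a => a)`, since the empty string is the only falsy string; the explicit comparison just states the intent.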
prasanth