Commas not removed Javascript regex

Question

I'm trying to remove all non-alphanumeric characters from a string and then count the number of words in each line extracted from a PDF.

var m = item["str"].replace(/[^a-zA-Z0-9 ]/g," ").trim().split(" ");
console.log("count: " + m.length + " words: " + m);

This is the code. An example of the resulting output:

count: 10 words: The,Quick,Brown,Fox,,,Jumps,Over,The,Lazy

While item["str"] looks like this:

The Quick Brown Fox - Jumps Over The Lazy

Some output also looks like this:

count:1 words:

Can anyone help me understand what's going on here? Thanks in advance!

ug_ · Answer 1 · 2016-11-23T10:07:31.600

The problem is that your regex matches one character at a time and replaces each match with its own space. A hyphen with a space on each side therefore becomes three spaces in a row in the final string.

Let's use your example:

The Quick Brown Fox - Jumps Over The Lazy

becomes

The Quick Brown Fox   Jumps Over The Lazy

Splitting that on a single space produces an empty string for each extra space, so two empty strings end up between "Fox" and "Jumps".
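To see both symptoms from the question, the steps can be run one at a time. This is a small sketch; the input "---" is an assumed stand-in for a PDF line with no letters or digits, which would be consistent with the count:1 words: output.

```javascript
// Reproduce the intermediate steps of the question's code.
var line = "The Quick Brown Fox - Jumps Over The Lazy";

// Each non-alphanumeric character is replaced by its own space,
// so " - " becomes three consecutive spaces.
var cleaned = line.replace(/[^a-zA-Z0-9 ]/g, " ");

// Splitting on a single space yields "" between each pair of adjacent spaces.
var parts = cleaned.trim().split(" ");

console.log(JSON.stringify(cleaned)); // "The Quick Brown Fox   Jumps Over The Lazy"
console.log(parts.length);            // 10
console.log(JSON.stringify(parts));   // ["The","Quick","Brown","Fox","","","Jumps","Over","The","Lazy"]

// Assumed example of a line with no letters or digits: it trims to "",
// and "".split(" ") is [""], i.e. one element that prints as nothing.
var blank = "---".replace(/[^a-zA-Z0-9 ]/g, " ").trim().split(" ");
console.log(blank.length, JSON.stringify(blank)); // 1 [""]
```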

Split on runs of whitespace instead, so consecutive spaces count as one separator: split(/\s+/).

function runReplace(str) {
  var m = str.replace(/[^a-zA-Z0-9 ]/g," ").trim().split(/\s+/);
  document.write(str + "<br/>");
  document.write("count: " + m.length + " words: " + m + "<br/>");
}

runReplace("The Quick Brown Fox - Jumps Over The Lazy");
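A minimal sketch of the same fix as a reusable function, using console.log so it also runs outside a browser. The name extractWords is hypothetical, and the empty-string guard is an addition: even with split(/\s+/), a line containing no letters or digits trims to "", and "".split(/\s+/) returns [""], which would still be counted as one word.

```javascript
// Hypothetical helper: returns the alphanumeric words of a line.
function extractWords(str) {
  // Replace every non-alphanumeric character with a space, then trim the ends.
  var trimmed = str.replace(/[^a-zA-Z0-9 ]/g, " ").trim();

  // Guard: "".split(/\s+/) is [""], which would otherwise count as 1 word.
  if (trimmed === "") {
    return [];
  }

  // Runs of whitespace act as a single separator.
  return trimmed.split(/\s+/);
}

var lines = ["The Quick Brown Fox - Jumps Over The Lazy", "", "- - -"];
lines.forEach(function (line) {
  var words = extractWords(line);
  console.log("count: " + words.length + " words: " + words.join(","));
});
```

For the sample lines this prints a count of 8 for the first line and 0 for the two lines without letters or digits.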

`/[^a-zA-Z0-9 ]+/g` doesn't work, and `split("\s+")` doesn't split on whitespace: in a string literal, `"\s+"` is just `"s+"`. – Cerbrus, Nov 23 '16 at 10:05

score 1 · Answer 2 · edited Nov 23 '16 at 11:47

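var item = {
    str: 'The Quick Brown Fox - Jumps Over The Lazy'
};

// Replace non-word characters with spaces, collapse whitespace runs into one space,
// then trim last so trailing punctuation cannot leave an empty string at the end.
var output = item['str'].replace(/\W/g, ' ').replace(/\s+/g, ' ').trim().split(/\s/);

console.log('length', output.length);
console.log('output', output);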

This gives 8 words instead of 10.
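Note that \W is not quite the same as the question's [^a-zA-Z0-9 ]: \W is shorthand for [^A-Za-z0-9_], so underscores are kept inside words. A small comparison sketch, using an assumed sample string:

```javascript
var sample = "snake_case and camelCase";

// \W matches anything except [A-Za-z0-9_], so the underscore is NOT replaced.
var withW = sample.replace(/\W/g, " ").trim().split(/\s+/);

// The question's character class treats "_" like any other non-alphanumeric character.
var withClass = sample.replace(/[^a-zA-Z0-9 ]/g, " ").trim().split(/\s+/);

console.log(withW.length, withW.join(","));         // 3 snake_case,and,camelCase
console.log(withClass.length, withClass.join(",")); // 4 snake,case,and,camelCase
```

Which one is right depends on whether "snake_case" should count as one word or two.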

answered Nov 23 '16 at 10:10 by Tân · edited Nov 23 '16 at 11:47 by Cerbrus

I'd use `.replace(/\s+/g, ' ')` instead of `.replace(/\s{2}/g, '')`. Your code doesn't work properly for double spaces. – Cerbrus Nov 23 '16 at 10:55
@Cerbrus You're right! – Tân Nov 23 '16 at 11:09

prasanth · Answer 3 · 2017-05-18T05:01:55.140

You are almost done. Just remove the empty strings from the array with the Array#filter method:

// filter(a => a) keeps only truthy elements, which drops the empty strings
var m = "The Quick Brown Fox - Jumps Over The Lazy".replace(/[^a-zA-Z0-9 ]/g, " ").trim().split(" ").filter(a => a);
console.log("count: " + m.length + " words: " + m.join(","));
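For comparison, a sketch of two equivalent ways to keep only the non-empty words. filter(Boolean) behaves like filter(a => a) on an array of strings, since "" is the only falsy string. The second approach swaps in a different technique: String#match with the global flag collects runs of letters and digits directly, and returns null when there are none, hence the || [] fallback. The "---" input is an assumed example of a line with no letters or digits.

```javascript
var line = "The Quick Brown Fox - Jumps Over The Lazy";

// Replace, split on single spaces, then drop the empty strings.
// trim() is not needed here: leading/trailing "" are filtered out too.
var viaFilter = line.replace(/[^a-zA-Z0-9 ]/g, " ").split(" ").filter(Boolean);

// Different technique: match runs of letters/digits directly.
// match() with the /g flag returns null if nothing matches, so fall back to [].
var viaMatch = line.match(/[a-zA-Z0-9]+/g) || [];

console.log(viaFilter.length, viaFilter.join(",")); // 8 The,Quick,Brown,Fox,Jumps,Over,The,Lazy
console.log(viaMatch.length, viaMatch.join(","));   // 8 The,Quick,Brown,Fox,Jumps,Over,The,Lazy

// Both report 0 words for a line without letters or digits.
var empty = "---";
console.log(empty.replace(/[^a-zA-Z0-9 ]/g, " ").split(" ").filter(Boolean).length); // 0
console.log((empty.match(/[a-zA-Z0-9]+/g) || []).length);                            // 0
```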

answered Nov 23 '16 at 10:02 by prasanth · edited May 18 '17 at 05:01
