3

Here is a minimal example of my problem:

http://jsfiddle.net/pm913emb/5/

var string = 'Question 6 of 7 '
+'Three, the patient suddenly develops shortness of breath and becomes hypotensive.    His heart rate is 100/min, with a normaI PR and    QRS intervaI.'

var sentencesMatch = string.match(/([\sa-zA-Z\d]){1}.+?[\.!\?]{1}([\s ]+|$)/g);

console.log(sentencesMatch);

As you can see, this string contains multiple sentences, and there are two places where I have added multiple spaces: one at the end of a sentence, the other in the middle of a sentence. I run the regex above on this string.

The problem is: as you can see in the console, the matched results do not contain these multiple spaces.

What could be the cause of this problem, and what is a possible solution?

Please help.. :/

Michel Floyd
R-J
  • I have added link to jsfiddle - http://jsfiddle.net/pm913emb/4/ – R-J Sep 06 '15 at 20:25
  • Just to clarify.. What output are you expecting to see? – Deftwun Sep 06 '15 at 20:28
  • There are two places in the string where I added four spaces instead of one. I expect these spaces to be in the resulting match, but they are replaced by one space instead. – R-J Sep 06 '15 at 20:35
  • 3
    I think that is just the `console.log` not showing additional whitespaces. Try `alert` or `document.write` and the characters are there. – chris85 Sep 06 '15 at 20:52
  • No! I am using console.log just for a simple example. – R-J Sep 06 '15 at 20:55
  • Okay, what is the actual example? Browsers only show one whitespace character by default as well. – chris85 Sep 06 '15 at 20:56
  • How can we prevent this browser default? – R-J Sep 06 '15 at 21:01
  • You can convert the whitespace to entities, `&nbsp;` or `&#160;`. http://stackoverflow.com/questions/24615355/browser-white-space-rendering/24615400#24615400 – chris85 Sep 06 '15 at 21:02
  • I removed my answer after mucking with it in [regexr](http://regexr.com/3bnsp) - it was working but returning two matches instead of one. – Michel Floyd Sep 06 '15 at 21:14

3 Answers

2

Browsers collapse consecutive white-space characters when rendering HTML. If you use entities instead, the spaces are displayed. So for example

<-- 2 spaces

would display as

<-- one space

in a browser.

If you used entities for the spaces

&#160;&#160;

you would get

2 white-spaces (note that even here they are shown as one space).

Here's a longer write up on it.

Browser white space rendering

I think this accomplishes what you want (probably not the cleanest, I don't write JS often)..

<script type="text/javascript">
var string = 'Question 6 of 7 '
+'Three, the patient suddenly develops shortness of breath and becomes hypotensive.    His heart rate is 100/min, with a normaI PR and    QRS intervaI.'
var sentencesMatch = string.match(/([\sa-zA-Z\d]){1}.+?[\.!\?]{1}([\s ]+|$)/g);
var output = '';
for(var x= 0; x < sentencesMatch.length; x++){
    output += sentencesMatch[x].replace(/ /g, '&#160;');
}
document.write(output);
</script>
chris85
  • But what if I have other regexes in my script? Do I have to look for spaces using &#160;? – R-J Sep 06 '15 at 22:10
  • &#160; is the entity for a non-breaking space. Just use the replace when you are outputting to the browser. – chris85 Sep 06 '15 at 22:11
  • But then the problem is that, before being output to the browser, this text is already wrapped in HTML elements, and the space replacement affects the tags too. – R-J Sep 06 '15 at 22:30
  • 1
    There was no HTML in your example string. You shouldn't be using regexes on HTML. You provided a sample string; the issue with that string is that browsers only display one whitespace character when several are consecutive. To resolve that, you'd use this code to convert the whitespace characters to their entities. – chris85 Sep 06 '15 at 22:39
1

Your code is working

It's just that when you print the array itself, the browser trims the extra white space in the console. Try printing the individual array elements and (depending on your browser) you'll see that they do contain the extra spaces.

//You'll need to have the console open to see the results here

var string = 'Question 6 of 7 '
+'Three, the patient suddenly develops shortness of breath and becomes hypotensive.    His heart rate is 100/min, with a normaI PR and    QRS intervaI.'

var sentencesMatch = string.match(/([\sa-zA-Z\d]){1}.+?[\.!\?]{1}([\s ]+|$)/g);
console.log(sentencesMatch);

// Use an index loop; for...in iterates over keys and is not meant for arrays
for (var i = 0; i < sentencesMatch.length; i++){
    //Add quotes so we can see trailing whitespace
    console.log('"' + sentencesMatch[i] + '"');
}
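To check the matches without depending on how a particular console renders strings, the spaces can also be counted programmatically. The following is only a sketch under stated assumptions: `spaceRuns` is a hypothetical helper that is not part of the original code, and the string and regex are copied from the question.

```javascript
// Same input and regex as in the question (four spaces in two places).
var string = 'Question 6 of 7 '
  + 'Three, the patient suddenly develops shortness of breath and becomes hypotensive.    His heart rate is 100/min, with a normaI PR and    QRS intervaI.';

// match() returns null when nothing matches, so fall back to an empty array.
var sentencesMatch = string.match(/([\sa-zA-Z\d]){1}.+?[\.!\?]{1}([\s ]+|$)/g) || [];

// Hypothetical helper: returns the length of every run of 2+ consecutive spaces.
function spaceRuns(text) {
  var runs = text.match(/ {2,}/g) || [];
  return runs.map(function (run) { return run.length; });
}

sentencesMatch.forEach(function (sentence) {
  // JSON.stringify wraps the string in quotes, which makes trailing spaces visible.
  console.log(JSON.stringify(sentence), spaceRuns(sentence));
});
```

If the runs reported here have length 4, the regex kept the spaces, and any collapsing happens later, at display time.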

Extra white space is collapsed by default in HTML

If you want to actually put that string into an element then you will have the same issue. Here's how to fix it:

Use CSS

Probably the simplest solution: style the elements using the white-space property.

var string = 'Question 6 of 7 '
+'Three, the patient suddenly develops shortness of breath and becomes hypotensive.    His heart rate is 100/min, with a normaI PR and    QRS intervaI.'

var sentencesMatch = string.match(/([\sa-zA-Z\d]){1}.+?[\.!\?]{1}([\s ]+|$)/g);
for (var i = 0; i < sentencesMatch.length; i++){
  var p = document.createElement("p");
  document.body.appendChild(p);
  p.innerHTML = '"' + sentencesMatch[i] + '"';
  p.className = "keep-spaces";
}

/* CSS */
.keep-spaces{
  white-space: pre;
}

Or... replace white space with a non-breaking space

This solution replaces all whitespace characters with a non-breaking space, which is represented by the HTML entity &nbsp;, &#160;, or &#xa0;.

var string = 'Question 6 of 7 '
    +'Three, the patient suddenly develops shortness of breath and becomes hypotensive.    His heart rate is 100/min, with a normaI PR and    QRS intervaI.'
var sentencesMatch = string.match(/([\sa-zA-Z\d]){1}.+?[\.!\?]{1}([\s ]+|$)/g);

for (var i = 0; i < sentencesMatch.length; i++){
  var p = document.createElement("p");
  document.body.appendChild(p);
  //Replace whitespace with &nbsp; to preserve consecutive white space
  var str = sentencesMatch[i].replace(/\s/g,'&nbsp;');
  p.innerHTML = '"' + str + '"';
}
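A possible variant, given the concern raised in the comments that the replacement also affects HTML tags: convert only the extra spaces in each run, and insert the result as text rather than as markup. This is a minimal sketch, not the method used in the answers above. `preserveSpaces` is a hypothetical helper, and it uses the character U+00A0 directly instead of the `&nbsp;` entity so that the result can be assigned to `textContent`.

```javascript
// Hypothetical helper: keep the first space of each run as a normal space
// (so lines can still wrap there) and turn the rest into U+00A0.
function preserveSpaces(text) {
  return text.replace(/ {2,}/g, function (run) {
    return ' ' + '\u00a0'.repeat(run.length - 1);
  });
}

var sample = 'hypotensive.    His heart rate';
var preserved = preserveSpaces(sample);

// The replacement is one character for one character, so the length is unchanged.
console.log(preserved.length === sample.length); // true

// In a browser, assign the result as text; tags are never parsed or touched.
if (typeof document !== 'undefined') {
  var p = document.createElement('p');
  p.textContent = preserved;
  document.body.appendChild(p);
}
```

Because the text is preserved before it is wrapped in elements, no regex has to run over HTML.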
Deftwun
-1

The problem is neither your regex nor your string. If you put a '\n' in the string, you'd see that it is likewise replaced with a single space, so the problem is in your browser. You might want to add a header like this to fix it:

content-type: text/html

Or try base64-encoding it and decoding it whenever you need it.