2

I have a text with sentences that may lack a space after a dot, like:

See also vadding.Constructions on this term abound.

How can I add a space after such a dot, but not after dots that are part of a domain name? The text may contain URLs like:

See also vadding.Constructions on this term abound. http://example.com/foo/bar

Wiktor Stribiżew
jcubic

4 Answers

4

Match and capture a URL, and separately match every other dot that is not followed by whitespace so it can be replaced with a dot plus a space:

var re = /((?:https?|ftps?):\/\/\S+)|\.(?!\s)/g; 
var str = 'See also vadding.Constructions on this term abound.\nSee also vadding.Constructions on this term abound. http://example.com/foo/bar';
var result = str.replace(re, function(m, g1) {
 return g1 ? g1 : ". ";
});
document.body.innerHTML = "<pre>" + result + "</pre>";

The URL regex, (?:https?|ftps?):\/\/\S+, matches http, https, ftp or ftps, then :// and one or more non-whitespace characters (\S+). It is a basic pattern; more thorough ones are easy to find on SO, e.g. see "What is a good regular expression to match a URL?".

The approach in more detail:

The ((?:https?|ftps?):\/\/\S+)|\.(?!\s) regex has 2 alternatives: the URL matching part (described above), or (|) the dot matching part (\.(?!\s)).

Note that (?!\s) is a negative lookahead: the dot matches only if it is not immediately followed by a whitespace character.

When we call string.replace(), we can pass a callback function as the second argument; it receives the whole match followed by the capture group values. Here that is the match value (m) and one capture group value, g1 (the URL). If the URL alternative matched, g1 holds the URL; otherwise it is undefined. return g1 ? g1 : ". "; therefore leaves a matched URL unchanged, and when g1 is empty we know a standalone dot was matched, so we replace it with a dot followed by a space (". ").
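
As a minimal sketch of the callback approach described above, the replacement can be wrapped in a small function. The function name addSpaceAfterDots and the second regex are assumptions made for illustration and are not part of the original answer. The second regex adds |$ to the lookahead, which may be useful if a dot at the very end of the input should not get a trailing space.

```javascript
// Regex from the answer: group 1 captures a URL, the second alternative
// matches a dot that is not followed by whitespace.
var dotOrUrl = /((?:https?|ftps?):\/\/\S+)|\.(?!\s)/g;

// Assumed variant (not from the answer): also skip a dot at the end of the input.
var dotOrUrlNoTrailing = /((?:https?|ftps?):\/\/\S+)|\.(?!\s|$)/g;

// Hypothetical helper: applies one of the regexes above to the text.
function addSpaceAfterDots(text, re) {
  return text.replace(re, function (match, url) {
    // url is undefined when the standalone-dot alternative matched
    return url !== undefined ? url : ". ";
  });
}

var input = "See also vadding.Constructions on this term abound. http://example.com/foo/bar";
console.log(addSpaceAfterDots(input, dotOrUrl));
// "See also vadding. Constructions on this term abound. http://example.com/foo/bar"

console.log(JSON.stringify(addSpaceAfterDots("One.Two.", dotOrUrl)));
// "One. Two. " (the final dot also gets a space)

console.log(JSON.stringify(addSpaceAfterDots("One.Two.", dotOrUrlNoTrailing)));
// "One. Two."
```

With the original regex, a dot at the end of the input is not followed by whitespace, so it also receives a space; whether that matters depends on the text being processed.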

Wiktor Stribiżew
  • I re-read the question and adjusted the answer to only add spaces after dots that are not followed with a space: `var re = /((?:https?|ftps?):\/\/\S+)|\.(?!\s)/g;`. Please review and let me know which option to keep. – Wiktor Stribiżew Apr 24 '16 at 17:32
  • This replaces all dots with a dot and a space, except in the URL. I need to have only one space after a dot. – jcubic Apr 24 '16 at 17:33
  • Do you mean you only need to add a space after a dot that is immediately followed by a URL? Then see Shafizadeh's answer. – Wiktor Stribiżew Apr 24 '16 at 17:35
  • No, in your solution the dot before the URL gets 2 spaces (because it already has a space), and it needs to have one space only. – jcubic Apr 24 '16 at 17:40
  • @jcubic: My update (see the first comment) is exactly what you need. I have made the answer more precise. – Wiktor Stribiżew Apr 24 '16 at 17:42
0

Use this pattern:

/\.(?! )((?:ftp|http)[^ ]+)?/g

Online Demo
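
The answer gives only the pattern, so here is a rough sketch of how it might be applied. The replacement string ". $1" and the function name spaceAfterDot are assumptions, not something stated in the answer. The optional group swallows a URL that directly follows the dot, which keeps the dots inside that URL out of reach of later matches.

```javascript
// Pattern from this answer.
var dotThenOptionalUrl = /\.(?! )((?:ftp|http)[^ ]+)?/g;

// Hypothetical helper; the ". $1" replacement is an assumption.
function spaceAfterDot(text) {
  // $1 is the URL glued to the dot, or an empty string if there was none
  return text.replace(dotThenOptionalUrl, ". $1");
}

console.log(spaceAfterDot("term abound.http://example.com/foo/bar"));
// "term abound. http://example.com/foo/bar"

console.log(spaceAfterDot("term abound. http://example.com/foo/bar"));
// "term abound. http://example. com/foo/bar"
```

Under this assumed replacement, a URL is protected only when it immediately follows the dot. When the URL is already separated by a space, as in the question's second example, the dot inside example.com is matched on its own and gets a space.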

Shafizadeh
0

You can try the RegExp /(\.)(?![a-z]{2}\/|[a-z]{3}\/|\s+|$)/g to match a . character that is not followed by two or three lowercase letters and a slash, by whitespace, or by the end of the input:

"See also vadding.Constructions on this term abound. http://example.com/foo/bar"
.replace(/(\.)(?![a-z]{2}\/|[a-z]{3}\/|\s+|$)/g, "$1 ")
guest271314
  • @WiktorStribiżew "Why do you think it should work?"? Not certain I follow. – guest271314 Apr 24 '16 at 17:28
  • @WiktorStribiżew Given the string in the OP, it should match only the first `.` character. Added `$` to not match `.` at the end of input, e.g. in the first string in the OP – guest271314 Apr 24 '16 at 17:29
  • @WiktorStribiżew Yes? The `RegExp` in the post should return the expected results https://jsfiddle.net/b27yr1g1/ . Though TLDs do currently exist where `.` could be followed by more than two or three characters – guest271314 Apr 24 '16 at 17:35
  • Never you mind, the question is unclear. See, even the OP decided to answer himself. – Wiktor Stribiżew Apr 24 '16 at 17:39
0

Using the idea from @MarcelKohls: split the text on URLs, then add the missing spaces only in the parts that are not URLs.

var text = "See also vadding.Constructions on this term abound. http://example.com/foo/bar";
// The outer capturing group makes split() keep the matched URLs in its result.
var url_re = /(\bhttps?:\/\/(?:(?:(?!&[^;]+;)|(?=&amp;))[^\s"'<>\]\[)])+\b)/gi;
text = text.split(url_re).map(function(part) {
  if (part.match(url_re)) {
    // Leave URLs untouched.
    return part;
  } else {
    // Insert a space after any dot that is not already followed by one.
    return part.replace(/\.([^ ])/g, '. $1');
  }
}).join('');
document.body.innerHTML = '<pre>' + text + '</pre>';
jcubic
  • You can do that by pure regex `/\.(?! )((?:ftp|http)[^ ]+)?/g` !! *(without that callback and condition)* – Shafizadeh Apr 24 '16 at 17:40
  • My answer provides exactly the same behavior but is more compact. Please review it. I deleted the first regex I suggested and left only the version with a negative lookahead after a standalone dot matching alternative. – Wiktor Stribiżew Apr 24 '16 at 17:41