Given free text such as

Hello World http://example.com/?param=FooBar Foo Bar

... how can I convert everything to lowercase except the most common URL patterns (the parts where case needs to be preserved)?

For example, the text above would become

hello world http://example.com/?param=FooBar foo bar

I'm using JavaScript. Thanks!

Philipp Lenssen

3 Answers

I'd do something like this:

var str = 'Hello World http://example.com/?param=FooBar Foo Bar';

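// replace() returns a new string, so keep the result
var result = str.replace(/\S+/g, function (match) {
    // Leave tokens that start with "http" untouched; lowercase everything else
    return match.indexOf('http') === 0 ? match : match.toLowerCase();
});
// result: 'hello world http://example.com/?param=FooBar foo bar'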

http://jsfiddle.net/xMPFW/1/

Depending on how specific you want to be, you could include a more complex URL check inside the callback - something like this: https://stackoverflow.com/a/3809435/1200182
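
As a minimal sketch of such a stricter check, assuming an environment that provides the standard `URL` constructor (modern browsers and Node.js), the callback could parse each token and keep only those with an http or https scheme. The names `isHttpUrl` and `lowerCaseExceptUrls` are illustrative and do not come from the linked answer:

```javascript
// Hypothetical helper: true when the token parses as an absolute http(s) URL.
function isHttpUrl(token) {
    try {
        var url = new URL(token); // throws for strings that are not absolute URLs
        return url.protocol === 'http:' || url.protocol === 'https:';
    } catch (e) {
        return false;
    }
}

// Lowercase every whitespace-delimited token except those recognized as URLs.
function lowerCaseExceptUrls(text) {
    return text.replace(/\S+/g, function (token) {
        return isHttpUrl(token) ? token : token.toLowerCase();
    });
}

var result = lowerCaseExceptUrls('Hello World http://example.com/?param=FooBar Foo Bar');
// result: 'hello world http://example.com/?param=FooBar foo bar'
```

The original token is returned rather than `url.href`, so the URL text is left exactly as it was written.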

Robert Messerle

This should work:

var s = 'Hello World http://example.com/?param=FooBar FOO Bar';
// Match either a whole URL (captured, left as is) or a run of uppercase letters (lowercased)
var r = s.replace(/(https?:\/\/\S+)|[A-Z]+/g, function (match, url) {
    return url ? match : match.toLowerCase();
});
//=> hello world http://example.com/?param=FooBar foo bar
anubhava

Tokenize the string, then use the URI regex from here to check whether each token is a URI. If it is not, convert it to lowercase.

var regex = new RegExp("([A-Za-z][A-Za-z0-9+\\-.]*):(?:(//)(?:((?:[A-Za-z0-9\\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)?((?:\\[(?:(?:(?:(?:[0-9A-Fa-f]{1,4}:){6}|::(?:[0-9A-Fa-f]{1,4}:){5}|(?:[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}|(?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}|(?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:|(?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::)(?:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)|[Vv][0-9A-Fa-f]+\\.[A-Za-z0-9\\-._~!$&'()*+,;=:]+)\\]|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:[A-Za-z0-9\\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*))(?::([0-9]*))?((?:/(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)|/((?:(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)?)|((?:[A-Za-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)|)(?:\\?((?:[A-Za-z0-9\\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*))?(?:\\#((?:[A-Za-z0-9\\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*))?")

var text = 'Hello World http://example.com/?param=FooBar Foo Bar';

// Split on single spaces, lowercase every token that is not a URI, then rejoin
var lowerText = text.split(' ').map(function (token) {
    return regex.test(token) ? token : token.toLowerCase();
}).join(' ');
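
Splitting on a single space leaves tabs and newlines attached to the neighbouring words, so a token can contain more than one word. One possible variant, sketched here, splits on any run of whitespace and keeps the separators so the original spacing survives. `uriPrefix` is a hypothetical, deliberately simplified stand-in for the long regex above (it only checks for a `scheme://` prefix), and `lowerCaseNonUriTokens` is an illustrative name:

```javascript
// Hypothetical, simplified stand-in for the full URI regex: a scheme followed by "://".
var uriPrefix = /^[A-Za-z][A-Za-z0-9+\-.]*:\/\//;

function lowerCaseNonUriTokens(text) {
    // The capture group makes split() keep the whitespace separators in the result.
    return text.split(/(\s+)/).map(function (part) {
        return uriPrefix.test(part) ? part : part.toLowerCase();
    }).join('');
}

var lowered = lowerCaseNonUriTokens('Hello\tWorld  http://example.com/?param=FooBar\nFoo Bar');
// lowered: 'hello\tworld  http://example.com/?param=FooBar\nfoo bar'
```
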
mpcabd