0

I need a regex that finds website names that are not preceded by http:// or https://, e.g.

http://www.google.co.in  ---don't match
https://www.google.co.in ---don't match
www.google.co.in         ---match

The URL can also be part of a larger string, such as

<p><a href="https://www.w3schools.com/html/">www.w3schools.com</a></p>

or

The URL To be Matched is www.w3schools.com and www.abc.com , URL Not to be matched is https://www.w3schools.com/html/

Here www.w3schools.com and www.abc.com (in the second example) should match, and the string can contain multiple URLs.

Thanks in advance.

biff
  • Don't parse HTML with regex https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – mleko Apr 12 '18 at 06:59
  • The string is not HTML; I just gave it as an example. I have updated the question accordingly, thanks – biff Apr 12 '18 at 07:01

3 Answers

1

Is this what you need?

/(?<!https:\/\/)(?<!http:\/\/)(www\.[\w-.]*?[\w-]+?(\/[\w-]*?)*?)((?=[^\w.\/-]+?)|$)+/ig

You can have a look here:

https://regex101.com/r/XvmR4V/4

If you have a large string that contains website names, this regex matches every name that is not preceded by "http://" or "https://". Note that the website names must always start with "www".
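For trying the lookbehind idea directly in JavaScript, here is a minimal sketch. It does not reuse the pattern above: the pattern and the helper name findBareHosts are assumptions made for illustration, and the host part is deliberately simplified to dot-separated labels. Lookbehind needs an engine that implements ES2018 regular expressions, so older browsers will reject it with a syntax error.

```javascript
// Sketch: find "www." names that are not directly preceded by http:// or https://.
// Requires lookbehind support (ES2018+). Pattern and helper name are illustrative.
const bareHostPattern = /(?<!https?:\/\/)\bwww\.[\w-]+(?:\.[\w-]+)+/gi;

// Hypothetical helper: returns every bare "www." name found in the text.
function findBareHosts(text) {
  return text.match(bareHostPattern) || [];
}

const sample =
  "The URL To be Matched is www.w3schools.com and www.abc.com , " +
  "URL Not to be matched is https://www.w3schools.com/html/";

console.log(findBareHosts(sample));
```

Because each label must contain at least one word character, an input such as "www..google.com" produces no match with this simplified pattern.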

Without lookaheads and lookbehinds, you can try the following. The result is in the second group ($2).

/([^\/]{2,2})(www\.[\w-.]*?[\w-]+?(\/[\w-]*?)*?)(([^\w.\/-]+?)|$)+/ig

https://regex101.com/r/XvmR4V/5

Now it also works when the string is just www.google.de:

([^\/]{2,2}|^)(www\.[\w-.]*?[\w-]+?(\/[\w-]*?)*?)(([^\w.\/-]+?)|$)+

https://regex101.com/r/XvmR4V/6

You can replace like this.

Here I replaced the 'www...' match with 'Test'.

/([^\/]{2,2}|^)(www\.[\w-.]*?[\w-]+?(\/[\w-]*?)*?)(([^\w.\/-]+?)|$)+/$1Test$4/gi

I tested it with the regex tool in IntelliJ.

My input was:

<p><a href="https://www.w3schools.com/html/"><a href="http://www.w3schools.com/html/">www.w3schools.com</a></p>
<p><a href="https://www.google.com/html/"><a href="http://www.google.com/html/">www.google.com</a>

The output was:

<p><a href="https://www.w3schools.com/html/"><a href="http://www.w3schools.com/html/">Test</a></p>
<p><a href="https://www.google.com/html/"><a href="http://www.google.com/html/">Test</a>
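A similar replacement can be sketched in JavaScript without lookbehind, using a different technique from the group-based substitution above: the scheme is matched as an optional group, and the replacement callback leaves any match that has a scheme unchanged. The function name replaceBareHosts and the simplified host pattern are assumptions for illustration only.

```javascript
// Hypothetical helper: replace bare "www." names and leave the surrounding
// characters (such as ">" and "<") as well as http(s) URLs untouched.
function replaceBareHosts(text, replacement) {
  // Group 1 captures an optional scheme; it is undefined for bare names.
  const pattern = /(https?:\/\/)?\bwww\.[\w-]+(?:\.[\w-]+)+/gi;
  return text.replace(pattern, (whole, scheme) => (scheme ? whole : replacement));
}

const html =
  '<p><a href="https://www.w3schools.com/html/">' +
  '<a href="http://www.w3schools.com/html/">www.w3schools.com</a></p>';

console.log(replaceBareHosts(html, "Test"));
```

Since only the name itself is part of the match, nothing around it has to be restored through back-references.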

If it helps, it would be great if you upvoted it :-)

D. Braun
  • The regex from the link above does not work in JavaScript; when I run it I get: ERROR SyntaxError: Invalid regular expression: /(?^!https://)(?<!http://)(www.[w-.]*?[w-]+?(/[w-]*?)*?)((?=[^w./-]+?)|$)+/gi/: Invalid group at new RegExp () – biff Apr 12 '18 at 08:37
  • @biff: added another possibility to check. – D. Braun Apr 12 '18 at 09:42
  • Hi, it was helpful, but if the string is just "www.google.com" the regex will not catch it – biff Apr 17 '18 at 06:17
  • It also catches strings like "www..google.com", which is wrong. Can you help? – biff Apr 17 '18 at 06:23
  • @biff: adapted it :-) – D. Braun Apr 17 '18 at 11:09
  • My requirement is to replace the matched string, i.e. "www.google.com", but here the full match gets replaced, i.e. ">www.google.com<". The > and < symbols should not be replaced. Any insight on that? – biff Apr 18 '18 at 04:36
  • I have an idea for that. I'll write to you later. – D. Braun Apr 18 '18 at 06:55
0

If you just want to exclude strings beginning with http:// or https://, this is easy enough to do with a negative lookahead:

var match = "www.google.co.in";
var nomatch = "http://www.google.co.in";

var re = new RegExp("^(?!https?:\/\/).*$");
if (re.test(match)) {
    console.log(match + " is valid");
}
if (re.test(nomatch)) {
    console.log(nomatch + " is valid");
}

One advantage of this type of pattern is that it also lets you filter the positively matched URLs on other conditions.
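As a sketch of such an extra condition, the lookahead can be followed by a stricter body. The ".in" suffix requirement, the candidate list and the variable names below are assumptions chosen only for illustration.

```javascript
// Sketch: reject strings starting with http:// or https://, and additionally
// require that the string contains no whitespace and ends in ".in".
const noSchemeDotIn = /^(?!https?:\/\/)\S+\.in$/i;

// Illustrative inputs, not taken from any real data set.
const candidates = [
  "www.google.co.in",
  "http://www.google.co.in",
  "https://www.google.co.in",
  "www.google.com",
];

const accepted = candidates.filter((url) => noSchemeDotIn.test(url));
console.log(accepted);
```

The regex has no g flag, so repeated calls to test() do not carry state between inputs.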

Tim Biegeleisen
0

You can use the regular expression ^(http|https):// to match strings that start with http:// or https://. Then negate the result of the test with the not (!) operator, so that strings starting with http:// or https:// are excluded:

var regEx = new RegExp("^(http|https)://", "i");
var str = "http://www.google.co.in";
var match = !regEx.test(str);
console.log(match + ' for ' + str);

str = 'https://www.google.co.in';
match = !regEx.test(str);
console.log(match + ' for ' + str);

str = 'www.google.co.in';
match = !regEx.test(str);
console.log(match + ' for ' + str);
Ankit Agarwal