Why is this regex expression not matching the string exactly?

Question

I am trying to match 'http://' and 'https://' exactly so that I can remove them from URLs, but my regex also matches letters within the URL itself.

Why is this happening, and how can I fix it?

Why? Because `[^`…`]` indicates a negated character class—every character that is neither `h`, `t`, `p`, `s`, `:`, `/` nor `$` is matched. Why not just match `/https:\/\//` and remove it? — Sebastian Simon, Aug 22 '19 at 02:23
Yes good point, what if I want to look for and replace both https:// and http:// ? — AKL012, Aug 22 '19 at 02:25
Use `/https?:\/\//`, do not cram the regex pattern with constructs you do not need. Study [character classes](https://www.regular-expressions.info/charclass.html) by all means. — Wiktor Stribiżew, Aug 22 '19 at 07:16

score 0 · Answer 1 · answered Aug 22 '19 at 02:24

The regex [^https://$] means:

Match any single character not present in the list "htps:/$"
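To see what that means in practice, here is a minimal sketch that runs the negated class against a sample URL. The sample URL and variable names are made up for illustration; they are not from the question.

```javascript
// Hypothetical sample input, chosen only for illustration.
const url = 'https://stackoverflow.com/questions';

// A negated character class: matches ONE character that is
// not h, t, p, s, :, / or $. The repeated t and / add nothing.
const negatedClass = /[^https:\/$]/g;

// Every character outside that set matches, wherever it sits in the string.
const matched = url.match(negatedClass).join('');
console.log(matched); // "ackoverflow.comqueion"

// Removing the matches keeps only characters that ARE in the set,
// so the protocol survives and most of the URL is destroyed.
const leftover = url.replace(negatedClass, '');
console.log(leftover); // "https://st/sts"
```

This is the opposite of the intended behavior: the protocol is the one part that is never matched.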

Andie2302

Code Maniac · Answer 2 · 2019-08-22T02:25:54.820

The regex you have means

 [^https://$]

Match any single character except h, t, p, s, :, / and $

You can simply use the URL API to get the hostname, and if you only want to strip http:// or https://, you can use replace

let urls = ['http://example.com/123', 'https://examples.com', 'example.com']

// to get hostname
urls.forEach(url => {
  if (/^https?:\/\//i.test(url)) {
    let parsed = new URL(url)
    console.log(parsed.hostname)
  } else {
    console.log(url)
  }
})

// to remove http or https
urls.forEach(url => {
  let replaced = url.replace(/^https?:\/\//i, '')
  console.log(replaced)
})

edited Aug 22 '19 at 02:25

answered Aug 22 '19 at 02:24

Yes good point, what if I want to look for and replace both https:// and http:// ? – AKL012 Aug 22 '19 at 02:25
@AKL012 the answer already covers both the cases – Code Maniac Aug 22 '19 at 02:26
Thanks, and how to include other characters like '()' and '-'? – AKL012 Aug 22 '19 at 02:33
@AKL012 What do you mean by that? Can you please explain where you want to allow those characters? – Code Maniac Aug 22 '19 at 04:33

Nick Reed · Answer 3 · 2019-08-22T03:44:23.790

As others have answered, [^https://$] doesn't work because [^ ] is neither a capture group nor a start-of-line assertion; it's a negated character class. Your regex matches any single character that is not one of h, t, p, s, :, / or $.

The [brackets] describe a character class, while the (parentheses) describe a capture group - probably what you were looking for. You can learn more about them in this excellent answer.

It looks a bit like you were trying to use the ^ and $ symbols, but that's not a good idea for your particular regex. This would have asserted the start-of-line was before h, and the end-of-line was after /, meaning the regex would not match unless https:// was the only thing in the string.
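As a rough illustration of that anchoring behavior, the sketch below compares a pattern anchored on both ends with one anchored only at the start. The sample strings and variable names are assumptions for the demo.

```javascript
// Anchored on both sides: the whole string must be exactly "https://".
const anchoredBoth = /^https:\/\/$/;

console.log(anchoredBoth.test('https://'));            // true
console.log(anchoredBoth.test('https://example.com')); // false

// Anchored only at the start: the string must begin with "https://",
// but anything may follow it.
const anchoredStart = /^https:\/\//;

console.log(anchoredStart.test('https://example.com'));     // true
console.log(anchoredStart.test('see https://example.com')); // false
```

A start-only anchor is usually the useful one when stripping a protocol, since it avoids touching an "https://" that appears later in the string.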

If you'd like to match http:// and https://, this regex will do the trick: (https{0,1}:\/\/)

BREAKDOWN

(https{0,1}:\/\/)


(               )    capture this as a group
 http                match "http"
     s{0,1}          match 0 or 1 "s"
           :         match ":"
            \/\/     match "//" literally
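The breakdown above can be tried out with a short sketch like the following. The sample URLs are made up; the second pattern uses `s?`, the shorthand for `s{0,1}` that the comments on the question also suggest.

```javascript
// The pattern from the breakdown, and its shorthand form: s? means s{0,1}.
const withQuantifier = /(https{0,1}:\/\/)/;
const withQuestionMark = /(https?:\/\/)/;

// Hypothetical inputs for illustration.
const samples = ['http://example.com', 'https://example.com', 'example.com'];

// Both patterns strip the protocol and leave protocol-less strings alone.
const stripped = samples.map((s) => s.replace(withQuantifier, ''));
const strippedShort = samples.map((s) => s.replace(withQuestionMark, ''));
console.log(stripped);      // [ 'example.com', 'example.com', 'example.com' ]
console.log(strippedShort); // [ 'example.com', 'example.com', 'example.com' ]

// The parentheses capture the matched protocol as group 1.
const captured = 'https://example.com'.match(withQuantifier)[1];
console.log(captured); // "https://"
```

If the captured text is not needed, the parentheses can be dropped without changing what the pattern matches.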

If you'd like to match characters like () and -, you can do so by escaping them, too:

\(\)\-    matches "()-" literally
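A small sketch of both ways to handle these characters follows: as a literal sequence, and as members of a character class. The sample strings are hypothetical, since the follow-up comment never said where the characters should be allowed.

```javascript
// Outside a character class, ( and ) are special and must be escaped.
// The hyphen is not special there, so it can stay unescaped.
const literalSequence = /\(\)-/;
console.log(literalSequence.test('call()-done')); // true

// Inside a character class, parentheses are literal. The hyphen is escaped
// so it cannot be read as a range operator.
const anyOfThem = /[()\-]/g;

// Hypothetical input: strip parentheses and hyphens from a phone number.
const cleaned = '(555) 123-4567'.replace(anyOfThem, '');
console.log(cleaned); // "555 1234567"
```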

Good luck!
