Regex for replacing everything other than these matches?

Question

Let's say I have the following string:

https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343

*This is a dummy string, chosen to replicate the problem as accurately as possible.

I have constructed the following RegEx:

/[^(\bhttps:\/\/\b)]|[^(\b\.ingest.sentry.io\/\b)]/g

The idea is that anything other than the literal strings https:// and .ingest.sentry.io/ is matched and removed.

However, my negation doesn't seem to be working:

What have I done wrong here?

I have also tried these variations:

[^(\bhttps:\/\/\b)|(\b\.ingest.sentry.io\/\b)]

[^(https:\/\/)]|[^(\.ingest.sentry.io\/)]

But no luck ... just what am I doing wrong here?

It looks like a hexadecimal string; why not just make a regex to match that part of the URL and grab it? Unless I'm misreading, what's the eventual purpose for this regex? Stripping out everything except constant text will leave you with constant text. — Rogue, Apr 27 '23 at 16:23
That's not how `[^...]` works. `[^...]` is a character class, it matches a single character that isn't in the contents, not a string. — Barmar, Apr 27 '23 at 16:23
OK, so what should I use in its place? I want to replace everything that isn't one of the "facets" of the DSN with a * to obfuscate it ... — Micheal J. Roberts, Apr 27 '23 at 16:26
@Barmar I think you're incorrect, [^(https://)] would match everything other than "https://" — Micheal J. Roberts, Apr 27 '23 at 16:28
What do you mean, "think"? This is not a debate on belief; regular expressions are thoroughly documented, and character classes (and character class negation) work exactly as documented. — Blindy, Apr 27 '23 at 16:31
@MichealJ.Roberts rather, it will match _everything_ that doesn't contain a single character of `https://`. So it will just start matching individual characters in the string, rather than full stretches. — Rogue, Apr 27 '23 at 16:31
To exclude a whole string you need to use a negative lookahead or lookbehind. — Barmar, Apr 27 '23 at 16:33
In PCRE you can [replace `\b(https:\/\/|\.ingest\.sentry\.io\/)\b(*SKIP)(*F)|.` with `*`](https://regex101.com/r/BXELct/1) — bobble bubble, Apr 27 '23 at 18:05
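The point made in the comments about `[^...]` can be checked directly. The following is a minimal sketch in Python's standard `re` module (the variable names `dsn`, `negated_class` and `masked` are only illustrative), using the third variation from the question reduced to its first half:

```python
import re

dsn = "https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343"

# A negated character class matches ONE character that is not in the set
# { ( h t p s : / ) }. The parentheses are literal set members, not a group.
negated_class = re.compile(r"[^(https:\/\/)]")

# Every character outside that set is replaced, one character at a time.
masked = negated_class.sub("*", dsn)
print(masked)
```

The leading https:// survives only because each of its characters happens to be in the set. For the same reason the stray p, s and t characters later in the string survive too, while the dots in .ingest.sentry.io/ are masked.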

bobble bubble · Answer 1 · 2023-04-28T11:59:43.323

In PCRE the backtracking control verbs (*SKIP)(*F) are available to skip something while matching ((*F) is shorthand for (*FAIL)).
Generally it works like this: stuff to be skipped (*SKIP)(*FAIL) | stuff to be matched

(\bhttps:\/\/|\.ingest\.sentry\.io\/)(*SKIP)(*F)|.

See this demo at regex101 (I was unsure how you intended to place the \b word boundaries)

In Python this works with PyPI regex (demo) or use a capture group and callback:

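import re
s = "https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343"
regex = r"(\bhttps:\/\/|\.ingest\.sentry\.io\/)|."
res = re.sub(regex, lambda m: m.group(1) if m.group(1) else "*", s)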

Python demo at tio.run or regex101 demo (using normal replace it will add extra *)
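The capture-group-and-callback idea can be generalized to any list of literal strings. The following is only a sketch under stated assumptions: `mask_except` and its parameters are hypothetical names, not part of any library, and the `\b` from the pattern above is dropped, so a kept literal is preserved wherever it occurs.

```python
import re

def mask_except(text, keep, mask="*"):
    """Replace every character of `text` with `mask`, except occurrences
    of the literal strings in `keep` (hypothetical helper, for illustration)."""
    keep = [k for k in keep if k]           # ignore empty literals
    if not keep:
        return mask * len(text)
    # Try longer literals first so a longer literal wins over its own prefix.
    alternatives = "|".join(
        re.escape(k) for k in sorted(keep, key=len, reverse=True)
    )
    # Group 1 captures a kept literal; otherwise "." consumes one character.
    pattern = re.compile(f"({alternatives})|.", re.DOTALL)
    return pattern.sub(lambda m: m.group(1) or mask, text)

dsn = "https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343"
masked = mask_except(dsn, ["https://", ".ingest.sentry.io/"])
print(masked)
```

Because the kept alternative is tried first at every position, the trailing `.` only ever consumes characters that are not the start of a kept literal, which is the same effect (*SKIP)(*F) achieves in PCRE.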

In JavaScript, using a callback and the same pattern with a capturing group:

let s = 'https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/15047....'

let res = s.replace(/(\bhttps:\/\/|\.ingest\.sentry\.io\/)|./g, (m0, m1) => m1 ? m1 : '*');
console.log(res);

score 0 · Answer 2 · answered Apr 27 '23 at 16:29

As you were told, that is not how you "negate" regular expressions. In fact, there is no negation of regular expressions in general; you have to write one that matches what you actually want to match. For example, it sounds like you want something like this:

^(?:https:\/\/)?(.*?)(?:\.ingest\.sentry\.io)(.*)$

See it in action here: https://regex101.com/r/tUjZjT/1
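To show what the two capture groups contain and how they could be used for masking, here is a minimal sketch in Python. The variable names and the rebuild step are assumptions for illustration; the answer itself only supplies the pattern.

```python
import re

dsn = "https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343"

pattern = re.compile(r"^(?:https:\/\/)?(.*?)(?:\.ingest\.sentry\.io)(.*)$")
m = pattern.match(dsn)
if m is None:
    raise ValueError("string does not look like the expected DSN")

# Group 1 is everything between the scheme and ".ingest.sentry.io";
# group 2 is everything after it, including the leading "/".
masked = (
    dsn[:m.start(1)]                # "https://"
    + "*" * len(m.group(1))         # masked key and host prefix
    + dsn[m.end(1):m.start(2)]      # ".ingest.sentry.io"
    + "*" * len(m.group(2))         # masked path
)
print(masked)
```

Note that with this pattern the slash after .io falls into group 2, so it is masked as well. Adding `\/` to the non-capturing group would keep it.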
