
Let's say I have the following string:

https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343

(This is a dummy string, chosen to replicate the problem as accurately as possible.)

I have constructed the following RegEx:

/[^(\bhttps:\/\/\b)]|[^(\b\.ingest.sentry.io\/\b)]/g

The idea is that anything other than the literal strings https:// and .ingest.sentry.io/ is matched and removed.

However, my negation doesn't seem to be working:


What have I done wrong here?

I have also tried these variations:

[^(\bhttps:\/\/\b)|(\b\.ingest.sentry.io\/\b)]
[^(https:\/\/)]|[^(\.ingest.sentry.io\/)]

But no luck with either. Just what am I doing wrong here?

Gilles Quénot
Micheal J. Roberts
  • It looks like a hexadecimal string; why not just make a regex to match that part of the URL and grab it? Unless I'm misreading, what's the eventual purpose for this regex? Stripping out everything except constant text will leave you with constant text. – Rogue Apr 27 '23 at 16:23
  • That's not how `[^...]` works. `[^...]` is a character class, it matches a single character that isn't in the contents, not a string. – Barmar Apr 27 '23 at 16:23
  • OK, so what should I use in its place? I want to replace everything that isn't the "facets" of the DSN with a * to obfuscate it ... – Micheal J. Roberts Apr 27 '23 at 16:26
  • @trincot Potential solution has not been found. – Micheal J. Roberts Apr 27 '23 at 16:27
  • Oh I see you deleted your comment. – trincot Apr 27 '23 at 16:27
  • @Barmar I think you're incorrect, [^(https://)] would match everything other than "https://" – Micheal J. Roberts Apr 27 '23 at 16:28
  • What do you mean, "think"? This is not a debate on belief; regular expressions are thoroughly documented, and character classes (and character class negation) work exactly as documented. – Blindy Apr 27 '23 at 16:31
  • @MichealJ.Roberts rather, it will match _everything_ that doesn't contain a single character of `https://`. So it will just start matching individual characters in the string, rather than full stretches. – Rogue Apr 27 '23 at 16:31
  • To exclude a whole string you need to use a negative lookahead or lookbehind. – Barmar Apr 27 '23 at 16:33
  • @Blindy Please relax. I appreciate your contributions. – Micheal J. Roberts Apr 27 '23 at 16:37
  • In PCRE you can [replace `\b(https:\/\/|\.ingest\.sentry\.io\/)\b(*SKIP)(*F)|.` with `*`](https://regex101.com/r/BXELct/1) – bobble bubble Apr 27 '23 at 18:05
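
To make the point from the comments concrete, here is a minimal Python sketch of what a negated character class does. The short test strings are made up for illustration; the last pattern is the one from the question, run through Python's `re` module, where `\b` inside a class means a backspace character rather than a word boundary.

```python
import re

# A negated class excludes single characters, not the word "https://".
# The set here is: ( h t p s : / )
not_in_class = re.findall(r"[^(https://)]", "https://abc")
print(not_in_class)  # ['a', 'b', 'c']

# "spot" contains no "https://", yet s, p and t are still excluded.
spot = re.findall(r"[^(https://)]", "spot")
print(spot)  # ['o']

# The pattern from the question: a character is matched if it is missing
# from EITHER set, so only characters present in both sets survive.
original = r"[^(\bhttps:\/\/\b)]|[^(\b\.ingest.sentry.io\/\b)]"
masked = re.sub(original, "*", "https://x.io/")
print(masked)  # *tt*s*//****/
```

Only `s`, `t`, `/` and the parentheses appear in both classes, which is why almost every character ends up matched.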

2 Answers


PCRE provides the backtracking control verbs (*SKIP)(*F), where (*F) is short for (*FAIL), to skip something while matching.
The general form is:   stuff to be skipped (*SKIP)(*FAIL) | stuff to be matched

(\bhttps:\/\/|\.ingest\.sentry\.io\/)(*SKIP)(*F)|.

See this demo at regex101 (I was unsure how you intended to place the \b word boundaries)


In Python, this works with the PyPI regex module (demo); alternatively, use a capture group and a callback with the standard re module:

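import re
s = "https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343"
regex = r"(\bhttps:\/\/|\.ingest\.sentry\.io\/)|."
res = re.sub(regex, lambda m: m.group(1) if m.group(1) else "*", s)  # keep group 1, mask the rest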

Python demo at tio.run or regex101 demo (using normal replace it will add extra *)
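
As an alternative sketch (not part of the approach above), the same masking can be done with `re.split`: splitting on a capturing group keeps the delimiters in the result, so the kept pieces and the pieces to mask alternate. The helper name `mask_dsn` is hypothetical, and the `\b` anchors are left out here on the assumption that plain literal matching is enough.

```python
import re

# Literal pieces to keep, taken from the question (assumption: no \b anchors).
KEEP = re.compile(r"(https://|\.ingest\.sentry\.io/)")

def mask_dsn(dsn: str) -> str:
    """Replace every character outside the kept pieces with '*'."""
    # With one capturing group, split() returns
    # [text, delimiter, text, delimiter, ..., text]:
    # odd indices are the kept delimiters, even indices are masked.
    parts = KEEP.split(dsn)
    return "".join(
        part if i % 2 else "*" * len(part)
        for i, part in enumerate(parts)
    )

s = "https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343"
print(mask_dsn(s))
```

The output has the same length as the input, so the shape of the DSN stays visible while the key, project and ID are hidden.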


In JavaScript, using a callback and the same pattern with a capturing group:

let s = 'https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/15047....'

// Keep the captured literal (m1) when present; mask any other character.
let res = s.replace(/(\bhttps:\/\/|\.ingest\.sentry\.io\/)|./g, (m0, m1) => m1 ? m1 : '*');
console.log(res);
bobble bubble

As you were told, that is not how you "negate" regular expressions. In fact, there is no general negation of regular expressions; you have to write one that matches what you actually want to match. For example, it sounds like you want something like this:

^(?:https:\/\/)?(.*?)(?:\.ingest\.sentry\.io)(.*)$

See it in action here: https://regex101.com/r/tUjZjT/1
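
One way the pattern above could be used for obfuscation, sketched in Python: match once, then overwrite the two captured spans with asterisks. The helper name `mask_groups` and the choice to return unmatched input unchanged are assumptions made for illustration. Note that group 2 starts at the slash after `.ingest.sentry.io`, so that slash is masked as well.

```python
import re

# The pattern from this answer, written without the optional \/ escapes.
PATTERN = re.compile(r"^(?:https://)?(.*?)(?:\.ingest\.sentry\.io)(.*)$")

def mask_groups(dsn: str) -> str:
    """Overwrite both captured groups with '*', keeping the rest."""
    m = PATTERN.match(dsn)
    if m is None:
        return dsn  # assumption: leave non-matching input untouched
    chars = list(dsn)
    for group in (1, 2):
        start, end = m.span(group)
        # Same-length replacement, so later spans stay valid.
        chars[start:end] = "*" * (end - start)
    return "".join(chars)

s = "https://87230a61de33450b964afbc0814884ec@p2238591.ingest.sentry.io/1504772008177343"
m = PATTERN.match(s)
print(m.group(1))      # the key@project part
print(m.group(2))      # the path, including its leading slash
print(mask_groups(s))
```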

Blindy