
I want to extract a sub-string from a string using JavaScript, then replace a set of characters with another character.

I know the prefix and postfix of the string. Between them is the variable sub-string that I want to extract and then replace a set of characters in.

For example, the string below represents an error URL. I need to extract the URL from it, which in this example is https%3A//revoked.badssl.com/, and then replace %3A with a colon (:).

The logic is that the prefix is about:neterror?e=nssFailure2&u= and the postfix is &c=UTF*, since I do not care about the rest of the string after the & character.

I probably need to use a regex. I know how to use a regex to test whether a string matches a specific pattern, but how do I use one to extract a sub-string?

about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&
f=regular d=An%20error%20occurred%20during%20a%20connection%20to%20revoked.badssl.com.
%0A%0APeer%E2%80%99s%20Certificate%20has%20been%20revoked.%0A%0AError%20code
%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SEC_ERROR_REVOKED_CERTIFICATE
%22%3ESEC_ERROR_REVOKED_CERTIFICATE%3C/a%3E%0A
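A minimal sketch of extracting a sub-string with a capture group, assuming the value always starts right after the known prefix and ends at the next `&` (the helper name `extractU` and the shortened sample string are for illustration only). `match` returns an array whose index 1 holds the text matched by the first parenthesised group, or `null` when nothing matches:

```javascript
// Hypothetical helper: returns the text between the known prefix and the next "&",
// with every "%3A" replaced by ":", or null if the string does not start with the prefix.
function extractU(str) {
  // "?" is escaped because it is a quantifier in regex syntax; ([^&]*) is capture group 1.
  const m = str.match(/^about:neterror\?e=nssFailure2&u=([^&]*)/);
  if (m === null) return null;
  // The g flag replaces all occurrences, not just the first one.
  return m[1].replace(/%3A/g, ":");
}

// Shortened copy of the example string above.
const sample = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regular";
console.log(extractU(sample)); // https://revoked.badssl.com/
```

Replacing only `%3A` is what the question asks for; `decodeURIComponent(m[1])` would decode every escape instead, as the answers and comments below suggest.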

EDIT: When I try this script:

var myString="about:neterror?e=nssFailure2&u=https%3A//abcde.somethig/&c=UTF-8&f=regular&d=An%20error%20occurred%20during%20a%20connection%20to%20abcde.somethig.\
%0A%0ACannot%20communicate%20securely%20with%20peer%3A%20no%20common%20encryption%20algorithm%28s%29.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20\
title%3D%22SEC_ERROR_REVOKED_CERTIFICATE%22%3ESEC_ERROR_REVOKED_CERTIFICATE%3C/a%3E%0A";
console.log(myString.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A(.*)&c=UTF8.*$/, "$1:$2"));

NOTE: I used \ at the end of each line only to break the string across lines in the editor; it is not part of the original string. The output I get is identical to the input.
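For what it's worth, a short sketch of both points, using a shortened copy of the EDIT string. A backslash right before a line break inside a JavaScript string literal is a line continuation, so it really adds nothing to the string. The identical output comes from `replace` itself: when the pattern does not match, it returns the input unchanged, and here the pattern asks for `UTF8` while the string contains `UTF-8` (the mismatch pointed out in the comments). The `UTF-?8` variant is just one possible way to accept both spellings.

```javascript
// The backslash-newline inside the literal is a line continuation: it contributes no characters.
const joined = "abc\
def";
console.log(joined === "abcdef"); // true

const myString = "about:neterror?e=nssFailure2&u=https%3A//abcde.somethig/&c=UTF-8&f=regular";

// Pattern from the question: it expects "UTF8", but the string contains "UTF-8",
// so there is no match and replace() hands back the original string.
const asked = myString.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A(.*)&c=UTF8.*$/, "$1:$2");
console.log(asked === myString); // true

// An optional hyphen ("UTF-?8") lets the same pattern match both spellings.
const fixed = myString.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A(.*)&c=UTF-?8.*$/, "$1:$2");
console.log(fixed); // https://abcde.somethig/
```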

NOTE2: What if I want to use just & to mark the start of the postfix? I used console.log(myString.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A(.*)&.*$/, "$1:$2")); but it prints: https://abcde.somethig/&c=UTF-8&f=regular and I want: https://abcde.somethig/
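The output in NOTE2 comes from `(.*)` being greedy: it first consumes as much as it can and then backs off only until the rest of the pattern (`&.*$`) can match, which leaves it ending at the last `&` in the string. A small sketch of two common fixes, again on a shortened copy of the EDIT string: a lazy `(.*?)`, or a negated character class `([^&]*)` that simply cannot cross an `&`.

```javascript
const myString = "about:neterror?e=nssFailure2&u=https%3A//abcde.somethig/&c=UTF-8&f=regular&d=An%20error";

// Greedy: (.*) grabs as much as possible, so it stops only at the LAST "&".
const greedy = myString.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A(.*)&.*$/, "$1:$2");

// Lazy: (.*?) grabs as little as possible, so it stops at the FIRST "&".
const lazy = myString.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A(.*?)&.*$/, "$1:$2");

// Negated class: [^&]* can never cross an "&", which states the intent directly.
const negated = myString.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A([^&]*)&.*$/, "$1:$2");

console.log(greedy);  // https://abcde.somethig/&c=UTF-8&f=regular
console.log(lazy);    // https://abcde.somethig/
console.log(negated); // https://abcde.somethig/
```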

user6875880
  • This is a classic X-Y problem. The correct statement of your problem is "how do I extract a query parameter from a URL?". –  Jul 02 '17 at 16:44

3 Answers


The simplest approach I can think of, without a complex regex and assuming &u and &c are always there, is this (decoding the string first, as suggested by Jedi):

var str = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regulard=An%20error%20occurred%20during%20a%20connection%20to%20revoked.badssl.com.%0A%0APeer%E2%80%99s%20Certificate%20has%20been%20revoked.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SEC_ERROR_REVOKED_CERTIFICATE%22%3ESEC_ERROR_REVOKED_CERTIFICATE%3C/a%3E%0A"

var url = decodeURIComponent(str).split("&u=")[1].split("&c")[0];
console.log(url)
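One caveat with chained `split` calls: if `&u=` is missing, `split("&u=")[1]` is `undefined` and the second `split` throws a TypeError. A hedged sketch of the same prefix/postfix idea using `indexOf` and `slice`, which returns `null` instead; the helper name `between` is hypothetical, and the postfix here is a bare `&`, as in NOTE2 of the question.

```javascript
// Hypothetical helper: text between the first `prefix` and the next `postfix`,
// or null when the prefix is absent. Runs to the end of the string if `postfix` is absent.
function between(str, prefix, postfix) {
  const start = str.indexOf(prefix);
  if (start === -1) return null;
  const from = start + prefix.length;
  const end = str.indexOf(postfix, from);
  return end === -1 ? str.slice(from) : str.slice(from, end);
}

const str = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regular";

// Decode only the extracted value, so an encoded "%26" inside it is never mistaken for a separator.
const raw = between(str, "&u=", "&");
const url = raw === null ? null : decodeURIComponent(raw);
console.log(url); // https://revoked.badssl.com/
```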

And here is why I closed this as a duplicate:

function getParameterByName(name, url) {
    if (!url) url = window.location.href;
    name = name.replace(/[\[\]]/g, "\\$&");
    var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
        results = regex.exec(url);
    if (!results) return null;
    if (!results[2]) return '';
    return decodeURIComponent(results[2].replace(/\+/g, " "));
}

var str = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regulard=An%20error%20occurred%20during%20a%20connection%20to%20revoked.badssl.com.%0A%0APeer%E2%80%99s%20Certificate%20has%20been%20revoked.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SEC_ERROR_REVOKED_CERTIFICATE%22%3ESEC_ERROR_REVOKED_CERTIFICATE%3C/a%3E%0A"

var url = getParameterByName("u",str);
console.log(url);
mplungjan

What you are trying to do is retrieve a query parameter from a URL and then decode it. For this, you should use libraries built for the purpose, not a regexp. For example, many regexp approaches depend on the precise order of the query params, which you cannot (or should not) rely on.

Here's an example using the URL API, which is stable and widely supported (except in IE11, and apart from a Firefox bug with non-HTTP URLs such as about:neterror, which is currently being fixed):

var str = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regulard=An%20error%20occurred%20during%20a%20connection%20to%20revoked.badssl.com.%0A%0APeer%E2%80%99s%20Certificate%20has%20been%20revoked.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SEC_ERROR_REVOKED_CERTIFICATE%22%3ESEC_ERROR_REVOKED_CERTIFICATE%3C/a%3E%0A";

var url = new URL(str);

console.log(url.searchParams.get('u'));
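To illustrate the point about parameter order, a small sketch with a made-up variant of the error URL in which `c` comes before `u` (the reordered string is hypothetical, not from the question). A pattern that expects `&u=...&c=` stops matching, while `searchParams.get` finds `u` by name wherever it sits. This assumes an environment whose `URL` follows the WHATWG URL Standard for `about:` URLs, such as current browsers or Node.

```javascript
// Hypothetical reordered variant of the error URL: "c" now comes before "u".
const reordered = "about:neterror?e=nssFailure2&c=UTF8&u=https%3A//revoked.badssl.com/&f=regular";

// An order-dependent pattern that assumes "&u=" is followed later by "&c=".
const m = reordered.match(/&u=(.*)&c=/);
console.log(m); // null: the pattern no longer matches

// The URL API looks the parameter up by name, so its position does not matter.
const u = new URL(reordered).searchParams.get("u");
console.log(u); // https://revoked.badssl.com/
```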
  • _This is an experimental technology Because this technology's specification has not stabilized, check the compatibility table for usage in various browsers. Also note that the syntax and behavior of an experimental technology is subject to change in future versions of browsers as the specification changes._ – mplungjan Jul 02 '17 at 16:30
  • Yes, I know that boilerplate, but actually, the specification is perfectly stable, and browser support is just fine, with the exception of IE11. Since this technology is already out in the wild, the chances that it will change are zero to minimal. I agree with marking this question as a duplicate, but the dup target unfortunately doesn't even mention this approach, which IMHO should be preferred in the modern JS world. If you need IE11, there is more than one polyfill available. –  Jul 02 '17 at 16:40
  • @torazaburo I'm using Firefox and I care about this to work in Firefox. When I run your snippet, I get `null`. – user6875880 Jul 02 '17 at 16:50
  • Hmm. The problem seems to be that FF does not like the `about:neterror` part. I'm not sure if that's a bug in FF, or a more strict reading of the spec. FF also does not pick up query params in `mailto:foo@bar.com?subject=hello`. –  Jul 02 '17 at 17:32
  • FWIW I've raised a bug at https://bugzilla.mozilla.org/show_bug.cgi?id=1377772. –  Jul 03 '17 at 16:34
  • This bug has apparently been validated and is in the process of being fixed. –  Jul 04 '17 at 04:47
  • _The specification is perfectly stable, and browser support is just fine_ ;) – mplungjan Jul 04 '17 at 05:17
  • This bug in FF will be fixed in v56. –  Jul 06 '17 at 00:21
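If you need to support a Firefox version affected by that bug, one possible workaround is to hand only the part after the first `?` to `URLSearchParams`, so the `about:` scheme never goes through the URL parser. This is a sketch under that assumption, not something verified against the affected Firefox versions; the helper name `getQueryParam` is hypothetical.

```javascript
// Hypothetical helper: look up a query parameter without parsing the whole URL.
function getQueryParam(str, name) {
  const q = str.indexOf("?");
  if (q === -1) return null; // no query string at all
  // URLSearchParams decodes percent-escapes (and "+" as a space) for us.
  return new URLSearchParams(str.slice(q + 1)).get(name);
}

const str = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regular";
console.log(getQueryParam(str, "u")); // https://revoked.badssl.com/
console.log(getQueryParam(str, "c")); // UTF8
```

Like `searchParams.get`, this returns `null` when the parameter is absent.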

You can capture the sub-string you need and then use back-references ($1, $2) in the replacement string to extract and reformat it:

var s = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regular d=An%20error%20occurred%20during%20a%20connection%20to%20revoked.badssl.com.%0A%0APeer%E2%80%99s%20Certificate%20has%20been%20revoked.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SEC_ERROR_REVOKED_CERTIFICATE%22%3ESEC_ERROR_REVOKED_CERTIFICATE%3C/a%3E%0A"

var url = s.replace(/^about:neterror\?e=nssFailure2&u=(https)%3A(.*)&c=UTF[\s\S]*$/, "$1:$2")

console.log(url)

Using decodeURIComponent:

var s = "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regular d=An%20error%20occurred%20during%20a%20connection%20to%20revoked.badssl.com.%0A%0APeer%E2%80%99s%20Certificate%20has%20been%20revoked.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SEC_ERROR_REVOKED_CERTIFICATE%22%3ESEC_ERROR_REVOKED_CERTIFICATE%3C/a%3E%0A"

var url = decodeURIComponent(s).replace(/^about:neterror\?e=nssFailure2&u=(.*)&c=UTF[\s\S]*$/, "$1")

console.log(url)
Psidom
  • Have you tested your solution? I get exactly the same string, without the replacement. – user6875880 Jul 02 '17 at 16:28
  • I tested it, and it works correctly. Note that you need to escape the question mark after `neterror` (as `\?`), since it has a special meaning in regex. – Psidom Jul 02 '17 at 16:29
  • @Psidom - please give us a [mcve] using the `<>` snippet editor – mplungjan Jul 02 '17 at 16:31
  • Here is what I'm testing now and it gives me an output identical to the input string. See the edit please. – user6875880 Jul 02 '17 at 16:33
  • Even if you are going to use regexp to extract the query param value, you really should use `decodeURIComponent` to handle the decoding. –  Jul 02 '17 at 16:34
  • @user6875880 You need `UTF-8` instead of `UTF8` in your regex pattern. – Psidom Jul 02 '17 at 16:36
  • @Psidom see my edit please. Please note that in the second string in my edit I use `\` to break the string into multiple lines in the editor but it should not be considered part of the string (I hope I'm getting it right). – user6875880 Jul 02 '17 at 16:36
  • @user6875880 The two strings you gave are slightly different. One is `UTF8`, another is `UTF-8`. Maybe you could just use `UTF` if you are not certain. – Psidom Jul 02 '17 at 16:38
  • @torazaburo That is a good point. – Psidom Jul 02 '17 at 16:38
  • No, actually one would prefer a solution which has no dependency on whether the `c` param is UTF-8 or UTF8 or anything else. –  Jul 02 '17 at 16:44
  • @torazaburo agree. Any suggestions? – user6875880 Jul 02 '17 at 16:47
  • @user6875880 I guess the issue here is probably your string is a multiline string. You need the dotall modifier. You might check the updated answer modified from [here](https://stackoverflow.com/questions/1068280/javascript-regex-multiline-flag-doesnt-work). – Psidom Jul 02 '17 at 16:49
  • @Psidom Do you mean regarding the null output in your answer? It still gives null even without a multiline string. You can try it in FF, if you have it, to confirm what I'm saying. – user6875880 Jul 02 '17 at 16:55
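On the dotall point from the comments: after `decodeURIComponent`, each `%0A` becomes a real line break, and `.` does not match line breaks by default. A small sketch on a shortened string: `[\s\S]` (as used in the answer) or the `s` flag both let the tail of the pattern run across those line breaks. The `s` flag was added in ES2018, so check it against the browsers you target.

```javascript
// After decoding, "%0A" turns into "\n", so the string spans several lines.
const decoded = decodeURIComponent(
  "about:neterror?e=nssFailure2&u=https%3A//revoked.badssl.com/&c=UTF8&f=regular%0A%0APeer"
);

// "." stops at "\n", so ".*$" cannot reach the end of the input: no match, string returned as-is.
const plain = decoded.replace(/^about:neterror\?e=nssFailure2&u=(.*)&c=UTF.*$/, "$1");

// [\s\S] matches any character, including line breaks.
const anyChar = decoded.replace(/^about:neterror\?e=nssFailure2&u=(.*)&c=UTF[\s\S]*$/, "$1");

// The s (dotAll) flag makes "." match line breaks too.
const dotAll = decoded.replace(/^about:neterror\?e=nssFailure2&u=(.*)&c=UTF.*$/s, "$1");

console.log(plain === decoded); // true
console.log(anyChar);           // https://revoked.badssl.com/
console.log(dotAll);            // https://revoked.badssl.com/
```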