
Maybe somebody can help me with this regex?

.*\:\/\/(?:www.)?([^\/]+)(\/.+")

I need to get all the paths from the URLs. I tried, but I can't match only the path without the trailing quotation mark.

https://regex101.com/r/J6nILD/6

twomvlad
  • This has been asked and answered multiple times (i.e. a duplicate). The one for you is probably [Getting parts of a URL (Regex)](https://stackoverflow.com/questions/27745/getting-parts-of-a-url-regex#27755). – Booboo Aug 31 '19 at 10:32
  • @RonaldAaronson Unfortunately, this answer won't work for me. I am using JMeter, and JMeter does not accept this regex. – twomvlad Aug 31 '19 at 10:35
  • So, when using the [regex] tag, the guidelines suggest adding an additional tag for the programming language you are using the regex with. You might also mention what the problem with the "accepted" regex is according to JMeter, so that people who are not JMeter experts know its limitations. – Booboo Aug 31 '19 at 10:38
  • @RonaldAaronson sure – twomvlad Aug 31 '19 at 10:40

3 Answers


You can get the path using a JSR223 Sampler with Groovy code.

  1. Declare or get the URL variable (here it is named url).


  2. Parse that URL to get the protocol, host, port, path and query. Use a JSR223 Sampler and paste the following code into the Script area:

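    // Parse the URL stored in the JMeter variable "url"
    URL url1 = new URL(vars.get('url'));
    
    // Store each part in its own JMeter variable
    vars.put('protocol', url1.getProtocol());
    vars.put('host', url1.getHost());
    vars.put('port', url1.getPort() as String);   // "-1" when the URL has no explicit port
    vars.put('path', url1.getPath());
    vars.put('query', url1.getQuery() ?: '');     // getQuery() returns null when there is no query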
    
  3. Use these variables anywhere in the script with the ${} syntax, for example ${path}.

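For comparison outside JMeter, here is a minimal JavaScript sketch of the same decomposition. It swaps in the WHATWG URL class (available in Node.js and browsers) for Java's java.net.URL, so the edge cases differ: protocol keeps a trailing colon, port is an empty string rather than -1, and search keeps its leading "?". The helper name urlParts and the sample URL are made up for illustration.

```javascript
// Sketch using the WHATWG URL class, not JMeter's Groovy/Java API.
// urlParts is a hypothetical helper; the field names mirror the JMeter variables above.
function urlParts(input) {
  const url = new URL(input);
  return {
    protocol: url.protocol.replace(/:$/, ''), // "https:" -> "https", like getProtocol()
    host: url.hostname,                        // host without port, like getHost()
    port: url.port,                            // "" when absent or the scheme default (Java gives -1 when absent)
    path: url.pathname,                        // like getPath()
    query: url.search.slice(1),                // "" when absent (Java's getQuery() gives null)
  };
}

console.log(urlParts('https://example.com:8443/a/b?x=1&y=2'));
```

Note that for http and https URLs the WHATWG parser always reports at least "/" as the path, even when the input has no path at all.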

SAIR
  • As the question was posed, it was not clear to me that the OP already has the URL "in hand" and does not need to search for it first. In that case, this is clearly the straightforward method to use, and no regex is required. – Booboo Aug 31 '19 at 12:13
  • @SAIR This trick doesn't work if I have a dynamic URL, which I extract with another Regular Expression Extractor :( – twomvlad Aug 31 '19 at 12:21
  • @RonaldAaronson yeah, you were right. I have only started working with JMeter recently, so I do not know its capabilities well. – twomvlad Aug 31 '19 at 12:22
  • @twomvlad can you paste the dynamic URL? – SAIR Aug 31 '19 at 12:33
  • @RonaldAaronson, the first thing is that the URL needs to be captured and stored in a variable. This can be done in any number of ways, depending on where the URL comes from. After that, this solution applies. I have shown this by declaring a URL variable. – SAIR Aug 31 '19 at 12:36
  • @SAIR I have this structure: I make a request to the page, collect a link from it, and then I need to get the path of that link. I try to insert a variable like this: `URL url1 = new URL(vars.get(${Random_link})); vars.put('protocol', url1.getProtocol()); vars.put('host', url1.getHost()); vars.put('path', url1.getPath());` but JMeter does not find the newly created variables: `unknown protocol: ${protocol}` – twomvlad Aug 31 '19 at 12:51
  • Your syntax is wrong. Use this statement: `URL url1 = new URL(vars.get("Random_link"));` without ${}, because that syntax does not work with the vars.get function (it works in other places). – SAIR Aug 31 '19 at 12:54
  • @SAIR I added a PreProcessor to another request and it worked, thanks! But for some reason it does not return the complete path. The link looks like `https://nova.rambler.ru/search?query=%D0%9D%D0%BE%D1%80%D0%B0%20%D0%93%D0%B0%D0%BB%D1%8C&amp;utm_source=search&amp;utm_medium=enser&amp;utm_campaign=self_promo&amp;utm_content=search`, but the JSR223 script finds only `/search`. Can it be fixed somehow? – twomvlad Aug 31 '19 at 13:14
  • You need to understand the different parts of a URL: protocol, host, port, path, query, etc. The part you are not getting is the query, because we had not selected it. I have updated my answer to select the query as well. You can accept it as an answer. – SAIR Aug 31 '19 at 13:59
  • Furthermore, if you want to select the file or ref part of the URL, use url1.getFile() or url1.getRef() respectively. – SAIR Aug 31 '19 at 14:08
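The comments above come down to the difference between the path alone and the path plus query. A JavaScript sketch of that distinction, again using the WHATWG URL class rather than java.net.URL, where pathname + search roughly corresponds to Java's getFile(). Because the scraped link in the comment still contains &amp; (an HTML entity from the page source), the sketch decodes it first. The assumption that &amp; is the only entity present, the helper name pathAndQuery, and the shortened link are all illustrative.

```javascript
// pathAndQuery is a hypothetical helper; the link below is shortened from the comment.
function pathAndQuery(scrapedLink) {
  // Assumption: &amp; is the only HTML entity left in the scraped link.
  const url = new URL(scrapedLink.replace(/&amp;/g, '&'));
  return {
    path: url.pathname,                            // just the path, like getPath()
    file: url.pathname + url.search,               // path plus "?query", roughly like getFile()
    query: url.searchParams.get('query'),          // percent-decoded value of "query"
    source: url.searchParams.get('utm_source'),    // null if &amp; had not been decoded
  };
}

const link = 'https://nova.rambler.ru/search?query=%D0%9D%D0%BE%D1%80%D0%B0&amp;utm_source=search';
console.log(pathAndQuery(link).path); // /search
console.log(pathAndQuery(link).file); // /search?query=%D0%9D%D0%BE%D1%80%D0%B0&utm_source=search
```

Without the replace() call the second parameter would be parsed under the name "amp;utm_source", which is a common surprise with links pulled out of HTML.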

If you have to first scan for a URL:

I've attempted to provide a simple (admittedly oversimplified) regex, in the snippet below, that might work in your context, but you may have to modify it to include some of the surrounding context. For example, x is a valid path, and this regex will recognize it as such. But if you are looking for the path in a string such as <img src="x">, it will also recognize img as a valid URL path. In that case, you might want something like:

/<img\s+src="((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s"]*)/i

// Simplified regex: optional protocol and host, then the path in group 3
const regex = /\b((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s]*)\b/i;
const s = 'http://example.com/a/b?x=1';
const result = regex.exec(s);
console.log(result[3]); // /a/b
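The img pitfall described above can be checked directly. This JavaScript sketch runs both regexes from this answer, unchanged, against the sample string <img src="x"> and against one made-up absolute URL; the variable names are illustrative only.

```javascript
// Both patterns are copied unchanged from this answer.
const simple = /\b((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s]*)\b/i;
const imgSrc = /<img\s+src="((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s"]*)/i;

const html = '<img src="x">';
console.log(simple.exec(html)[3]); // img  (the tag name is taken for a path)
console.log(imgSrc.exec(html)[3]); // x    (the src value, closing quote excluded)

// With an absolute URL (a made-up one), group 3 is still just the path.
console.log(imgSrc.exec('<img src="https://cdn.example.com/pics/cat.png">')[3]); // /pics/cat.png
```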

If the protocol and host portion of the URL is always present, it becomes easier to distinguish URLs in just about any context by making the protocol and host mandatory:

/\b((https?|ftp):\/\/[^\/]+)(\/?[^?#\s]*)\b/i
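To collect every path in a page at once, which is what the question asks for, the mandatory-host idea can be run with the g flag. This is a hedged adaptation rather than the exact pattern above: it also excludes the double quote, so a match stops at the end of an attribute value, and it drops the trailing \b, which would otherwise shorten paths ending in /. The function name extractPaths and the sample HTML are hypothetical.

```javascript
// Adapted from the mandatory-host regex: '"' is excluded from host and path,
// the trailing \b is dropped, and the g flag lets matchAll find every URL.
function extractPaths(text) {
  const re = /\b(?:https?|ftp):\/\/[^\/\s"?#]+(\/[^?#\s"]*)?/gi;
  const paths = [];
  for (const m of text.matchAll(re)) {
    paths.push(m[1] ?? '/'); // a URL with no path component is reported as "/"
  }
  return paths;
}

const page = '<a href="https://example.com/a/b?x=1">A</a> <img src="http://cdn.example.com/img/logo.png">';
console.log(extractPaths(page)); // [ '/a/b', '/img/logo.png' ]
```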

Booboo

You could go for something like:

(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?
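This pattern is the URI-splitting regex from RFC 3986, Appendix B, with non-capturing outer groups, so group 3 is the path. A JavaScript sketch, with the pattern written using single backslashes (the form you would type into a JMeter Regular Expression Extractor, as opposed to Java string-literal escaping); splitUri is a hypothetical helper name and the sample URLs are made up.

```javascript
// Every group is optional, so exec() always matches starting at index 0.
const uriRe = /(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?/;

function splitUri(s) {
  // Groups: 1 scheme, 2 authority, 3 path, 4 query, 5 fragment (undefined if absent)
  const [, scheme, authority, path, query, fragment] = uriRe.exec(s);
  return { scheme, authority, path, query, fragment };
}

console.log(splitUri('https://nova.rambler.ru/search?query=abc#top').path); // /search
console.log(splitUri('/just/a/path').path); // /just/a/path
```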



Dmitri T