I need to parse a complex URL string to fetch specific values.

From the following URL string:

/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss

I need to extract this result in array format:

['http://any-feed-url-a.com?filter=hot&format=rss', 'http://any-feed-url-b.com?filter=rising&format=rss']

I already tried /url=([^&]+)/, but it stops at the first & and so does not capture the full value with all of its query parameters. I would also like to omit the url= prefix from the results.

Thanks in advance.

Marco Martins
  • *And I would like to omit the `url=`*: you already omit it, since it is not part of the capturing group. Your data is inside Group 1. I think you need `var regex = /url=(.+?)(?=&url=|$)/g`; run `regex.exec(str)` [in a loop](http://stackoverflow.com/questions/6323417) and get Group 1. – Wiktor Stribiżew Jan 10 '20 at 10:17
  • You are using a capturing group, and that is where the value is. Try `url=(.*?)(?=&url=|$)` https://regex101.com/r/snM42Q/1 – The fourth bird Jan 10 '20 at 10:17
  • Have you considered using a query string parsing library instead of doing this with a regexp? – sjahan Jan 10 '20 at 10:17

4 Answers

Have you tried using the split method instead of a regex?

const urlsArr = "/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss".split("url=");
urlsArr.shift(); // remove the first item from the array -> "/api/rss/feeds?"
console.log(urlsArr);

Before the shift, split returns ["/api/rss/feeds?", "http://any-feed-url-a.com?filter=hot&format=rss&", "http://any-feed-url-b.com?filter=rising&format=rss"], so I drop the first item of the array. Note that the first URL keeps a trailing & with this approach.
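
A plain split("url=") leaves a trailing & on every URL except the last. One way around that, sketched below, is to split on a regex that also consumes an optional & in front of url=. The helper name splitFeedUrls is made up for illustration, and the sketch assumes no other parameter name ends in "url" (something like &myurl= would also be split on).

```javascript
// Hypothetical helper: splits on "url=" together with an optional leading "&",
// so the separator between two feed URLs is consumed entirely.
function splitFeedUrls(input) {
  const parts = input.split(/&?url=/);
  parts.shift(); // drop the path prefix, "/api/rss/feeds?"
  return parts;
}

const feedUrls = splitFeedUrls(
  "/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss"
);
console.log(feedUrls);
```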

If possible, it's better to use something other than a regex; see Coding Horror: "Regular Expressions: Now You Have Two Problems".

Maielo

This regex works for me: /url=([a-z:/.?=-]+&[a-z=]+)/g

You can also test this one: /http(s)?:\/\/([a-z-.?=&])+&/g

Example

const string = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&url=http://any-feed-url.com?filter=latest&format=rss'

const string2 = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&next=parm&url=http://any-feed-url.com?filter=latest&format=rss'

const regex = /url=([a-z:/.?=-]+&[a-z=]+)/g;
const regex2 = /http(s)?:\/\/([a-z-.?=&])+&/g;

console.log(string.match(regex))
console.log(string2.match(regex2))
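
With the g flag, match returns the whole matches, so the url= prefix is still included in the first result. One possible way to leave it out, assuming the target environment supports lookbehind assertions (ES2018), is to move url= into a lookbehind. This is only a sketch: the variable names are illustrative, and the character classes are carried over from the pattern above, so it still assumes lowercase URLs with exactly two query parameters.

```javascript
const feedString = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&url=http://any-feed-url.com?filter=latest&format=rss';

// (?<=url=) requires "url=" immediately before the match without including it.
const lookbehindRegex = /(?<=url=)[a-z:/.?=-]+&[a-z=]+/g;

const withoutPrefix = feedString.match(lookbehindRegex);
console.log(withoutPrefix);
```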
Kordrad
  • Hi Kordrad, thank you. It works, but do you know how to omit `url=` from the results? Of course, I can do a string replace, but is there some way to ignore it with a similar regex? – Marco Martins Jan 10 '20 at 10:31
  • `[^url]+` probably doesn’t do what you intended. `((?!url=).)+`, maybe? – Ry- Jan 10 '20 at 10:32
  • Try this: `/http(s)?:\/\/([a-z-.?=&])+&/g`. It simply starts matching your value from the `http` text. – Kordrad Jan 10 '20 at 10:39
You can use matchAll to find the URLs, then map capture group 1 of each match into an array.

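const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';

// Lazily capture everything after "url=" up to the next "&url=" or the end of the string.
const arr = [...str.matchAll(/url=(.*?)(?=&url=|$)/g)].map(x => x[1]);

console.log(arr);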

However, matchAll isn't supported by older browsers.
Calling exec in a loop to fill an array also works.

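const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';

const re = /url=(.*?)(?=&url=|$)/g;
const arr = [];
let m;
// With the g flag, each exec call resumes after the previous match and returns null when done.
while ((m = re.exec(str)) !== null) {
  arr.push(m[1]);
}

console.log(arr);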
LukStorms

If your input is better-formed in reality than shown in the question and you’re targeting a modern JavaScript environment, there’s URL/URLSearchParams:

const input = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot%26format=rss&url=http://any-feed-url-b.com?filter=rising%26format=rss';
const url = new URL(input, 'http://example.com/');

console.log(url.searchParams.getAll('url'));

Notice how & has to be escaped as %26 for it to make sense.

Without this input in a standard form, it’s not clear which rules of URLs are still on the table.
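
If you control the side that builds the request, one way to get the input into that standard form is to let URLSearchParams do the escaping. The sketch below is an assumption about how such a request could be built, not something taken from the question; the names feeds, params, requestPath and parsed are illustrative only.

```javascript
const feeds = [
  'http://any-feed-url-a.com?filter=hot&format=rss',
  'http://any-feed-url-b.com?filter=rising&format=rss',
];

// append() keeps repeated keys; toString() percent-encodes reserved characters such as "&".
const params = new URLSearchParams();
for (const feed of feeds) {
  params.append('url', feed);
}
const requestPath = '/api/rss/feeds?' + params.toString();

// Parsing the encoded path gives back the original feed URLs.
const parsed = new URL(requestPath, 'http://example.com/').searchParams.getAll('url');

console.log(requestPath);
console.log(parsed);
```

Because each nested & is encoded as %26, the parser can tell the separators between the url parameters apart from the ampersands that belong to each feed URL.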

Ry-
  • You are not using the string in question, which is `/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss`, so this answer does not help OP. – Wiktor Stribiżew Jan 10 '20 at 10:22
  • @WiktorStribiżew: You don’t know that it doesn’t help the OP. – Ry- Jan 10 '20 at 10:22
  • It cannot since you are not answering the current question. – Wiktor Stribiżew Jan 10 '20 at 10:22
  • @WiktorStribiżew: Not true, sorry. – Ry- Jan 10 '20 at 10:23
  • If this helps, there is no need to duplicate already existing solutions. The question will be a duplicate of [How can I get query string values in JavaScript?](https://stackoverflow.com/questions/901115/) – Wiktor Stribiżew Jan 10 '20 at 10:26