
I have a URL and would like to parse it and extract its params. My implementation is based on the following Stack Overflow post

However, my URL is more complex than the one used in the post above. It looks like this:

https://example.com/cdscontent/login?initialURI=https%3A%2F%2Fexample.com%2Fdashboard%2F%3Fportal%3Dmyportal%26LO%3D4%26contentid%3D10007.786471%26viewmode%3Dcontent%26variant%3D%2Fmyportal%2F

As you can see, it has the param initialURI, which is itself an (encoded) URL, and the order of the params in it cannot be changed.

When I run org.apache.http.client.utils.URLEncodedUtils#parse it returns

[initialURI=https://example.com/dashboard/?portal=myportal, LO=4, contentid=10007.786471, viewmode=content, variant=/myportal/]

As you can see, it parses every param except portal, which is still bound to https://example.com/dashboard/. In other words, I am expecting this:

[initialURI=https://example.com/dashboard/, portal=myportal, LO=4, contentid=10007.786471, viewmode=content, variant=/myportal/]

Am I doing something wrong here, or can URLEncodedUtils#parse simply not handle this case?

Do you have any alternative to suggest?

Thanks a lot!

Unit test to try

public class UrlParserTest {

  @Test
  public void testParseUrl() throws UnsupportedEncodingException, URISyntaxException {

    String url =
        "https://www.example.com/cdscontent/login?initialURI=https%3A%2F%2Fwww.example.com%2Fdashboard%2F%3Fportal%3Dmyportal%26LO%3D4%26contentid%3D10007.786471%26viewmode%3Dcontent%26variant%3D%2Fmyportal%2F";

    String decoded = URLDecoder.decode(url, "UTF-8");
    List<NameValuePair> params = URLEncodedUtils.parse(new URI(decoded), "UTF-8");
    System.out.println(params);
  }

}
amsalk
  • I can see `?portal=myportal` just fine, your code works – Mark Dec 19 '18 at 09:26
  • yeah but still as a part of `initialURI` unlike `LO`, `contentid` etc. – amsalk Dec 19 '18 at 09:28
  • I see now, your url `https://www.example.com/` has the query parameter `initialURI` which contains another url and other query parameters. What's the expected behaviour? That all query parameters belong to the url in `initialURI`? – Mark Dec 19 '18 at 09:34
  • That's right, and I've updated the question with the expected behavior. – amsalk Dec 19 '18 at 09:40
  • Clearly you have to parse the initial string first to get the `initialURI` query param value, and then parse the `initialURI` value in a second step. –  Dec 19 '18 at 10:17

1 Answer


What are we working with

You have the following url (decoded):

https://www.example.com/cdscontent/login?initialURI=https://www.example.com/dashboard/?portal=myportal&LO=4&contentid=10007.786471&viewmode=content&variant=/myportal/

This url consists of the main url:

https://www.example.com/cdscontent/login

which has one query parameter, initialURI, whose value is:

https://www.example.com/dashboard/?portal=myportal&LO=4&contentid=10007.786471&viewmode=content&variant=/myportal/

This inner url in turn has multiple query parameters (the ones you're looking for):

portal=myportal&LO=4&contentid=10007.786471&viewmode=content&variant=/myportal/
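As a rough illustration of why the nesting matters, the sketch below uses only java.net.URI (not URLEncodedUtils) to compare the raw and the decoded query of the outer url. The class name RawQueryDemo and the helper countPairs are made up for this example. While the query is still encoded, the inner & separators are hidden as %26, so there is only one top-level pair; once the whole query is decoded, they become visible and the string splits into five pieces.

```java
import java.net.URI;

final class RawQueryDemo {
    // The encoded url from the question.
    static final String URL =
        "https://www.example.com/cdscontent/login?initialURI=https%3A%2F%2Fwww.example.com"
        + "%2Fdashboard%2F%3Fportal%3Dmyportal%26LO%3D4%26contentid%3D10007.786471"
        + "%26viewmode%3Dcontent%26variant%3D%2Fmyportal%2F";

    // Counts the name=value pairs you get by splitting a query string on '&'.
    static int countPairs(String query) {
        return query.split("&").length;
    }

    public static void main(String[] args) {
        URI uri = URI.create(URL);
        // Raw query: the inner '&' characters are still escaped as %26.
        System.out.println(countPairs(uri.getRawQuery())); // prints 1
        // Decoded query: the inner '&' characters now look like top-level separators.
        System.out.println(countPairs(uri.getQuery()));    // prints 5
    }
}
```

This appears to be the effect seen in the question: decoding the whole url before parsing exposes the inner separators, so portal stays attached to initialURI while the remaining params are split off.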

Solution

Step 1:

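We must first get the url stored in the query parameter initialURI. Note that url here is the original, still-encoded string. Do not run URLDecoder.decode over the whole url first (as your test does), because that exposes the inner & separators and they are then split at the top level:

List<NameValuePair> params = URLEncodedUtils.parse(new URI(url), Charset.forName("UTF-8"));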

// Find first NameValuePair where the name equals initialURI
Optional<NameValuePair> initialURI = params.stream()
        .filter(e -> e.getName().equals("initialURI"))
        .findFirst();

System.out.println(initialURI);

This prints:

Optional[initialURI=https://www.example.com/dashboard/?portal=myportal&LO=4&contentid=10007.786471&viewmode=content&variant=/myportal/]

Step 2:

Now we can get the query parameters of this url and print them:

List<NameValuePair> initialParams = URLEncodedUtils
        .parse(new URI(initialURI.get().getValue()), Charset.forName("UTF-8"));

System.out.println(initialParams);

This results in:

[portal=myportal, LO=4, contentid=10007.786471, viewmode=content, variant=/myportal/]
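Since you asked for an alternative, here is a minimal sketch of the same two steps using only the JDK instead of URLEncodedUtils. The class NestedQueryParser and its methods are hypothetical names for this example, not part of any library. It assumes UTF-8, keeps only the last value when a name repeats, and assumes the decoded inner url is itself a valid URI (URI.create throws otherwise).

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class NestedQueryParser {
    // Splits a RAW (still encoded) query on '&' and '=', then decodes each part.
    // Insertion order is preserved; a repeated name overwrites the earlier value.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // Form-style decoding: %XX escapes are resolved and '+' becomes a space.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Step 1: parse the outer url. Step 2: parse the url stored in nestedName.
    static Map<String, String> parseNested(String url, String nestedName) {
        Map<String, String> outer = parseQuery(URI.create(url).getRawQuery());
        String inner = outer.get(nestedName);
        if (inner == null) {
            return new LinkedHashMap<>();
        }
        return parseQuery(URI.create(inner).getRawQuery());
    }

    public static void main(String[] args) {
        String url =
            "https://www.example.com/cdscontent/login?initialURI=https%3A%2F%2Fwww.example.com"
            + "%2Fdashboard%2F%3Fportal%3Dmyportal%26LO%3D4%26contentid%3D10007.786471"
            + "%26viewmode%3Dcontent%26variant%3D%2Fmyportal%2F";
        // prints {portal=myportal, LO=4, contentid=10007.786471, viewmode=content, variant=/myportal/}
        System.out.println(parseNested(url, "initialURI"));
    }
}
```

The key point in both versions is the same: split on & before decoding, and decode each value separately.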

Note

This is not entirely your expected behavior: you expected initialURI=https://example.com/dashboard/ to be in the list as well. However, you can see that this is not a query parameter. The value of initialURI is the entire url, including its query parameters.
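If you still want the bare https://www.example.com/dashboard/ as a separate entry, one option is to strip the query from the inner url yourself. The following is a small sketch with java.net.URI; the class QueryStripper and the method withoutQuery are hypothetical names, and it assumes the input is a hierarchical url with scheme, authority and path.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class QueryStripper {
    // Rebuilds the url from scheme, authority and path only,
    // which drops both the query and the fragment.
    static String withoutQuery(String url) {
        try {
            URI u = new URI(url);
            return new URI(u.getScheme(), u.getAuthority(), u.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    public static void main(String[] args) {
        String inner = "https://www.example.com/dashboard/?portal=myportal&LO=4"
            + "&contentid=10007.786471&viewmode=content&variant=/myportal/";
        System.out.println(withoutQuery(inner)); // prints https://www.example.com/dashboard/
    }
}
```

You could then put that value in front of the inner params from Step 2 to get the list you described.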

Mark