
I have a problem when queuing page links with TaskQueue:

    Queue queue = QueueFactory.getDefaultQueue();
    for (String href : hrefs) {
        href = baseUrl + href;
        pageLinks = pageLinks + "\n" + href;
        queue.add(TaskOptions.Builder
            .withUrl("/crawler")
            .param("url", href));
        l("Added to queue url=[" + href + "]"); // l(...) is my logging helper
    }

I think the problem is that the URL passed into the queue contains question marks (?) in place of the Arabic characters, because the task keeps getting rescheduled.
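One plausible mechanism for the ?'s, offered as an assumption rather than a diagnosis: somewhere between the loop and the handler, the string is turned into bytes using a charset that has no Arabic letters, such as ISO-8859-1. Java's `String.getBytes(Charset)` does not throw in that case; it silently substitutes the charset's replacement byte, which for ISO-8859-1 is `?`. The class name `LossyCharsetDemo` and the sample word are illustrative only:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyCharsetDemo {

    /** Encodes text with the given charset, then decodes the bytes with the same charset. */
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) replaces characters the charset cannot represent
        // with its replacement byte; for ISO-8859-1 that byte is '?'.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Arabic word "marhaba", written as Unicode escapes so the
        // demo does not depend on the source file's encoding.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1)); // prints ?????
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // prints true
    }
}
```

Once the letters have become `?` the original text cannot be recovered, which would match links that look fine in the browser but fail inside the queue.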

The String pageLinks, however, is output to the browser through Spring MVC, and the Arabic characters display correctly there. So I'm pretty sure the links are OK.

If I copy one of the links shown in the browser and paste it into the address bar, it works fine. So I'm pretty sure the queue keeps rescheduling the task because it gets the wrong URL.

What could I be missing here? Do I need to convert the String href before passing it into the queue?

The crawl service looks like this:

    @RequestMapping(method = RequestMethod.GET, value = "/crawl",
            produces = "application/json; charset=iso-8859-6")
    public @ResponseBody String crawl(HttpServletRequest req, HttpServletResponse res,
            @RequestParam(value = "url", required = false) String url) {
        l("Process url:" + url);
        // ... crawling logic elided; the handler must still return a response body
        return "{}"; // placeholder JSON
    }

Also, do I need to convert the @RequestParam String url here back to Arabic, or not?

quarks

1 Answer


You must URL-encode the parameters. See this question: Java URL encoding of query string parameters
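A minimal sketch of that advice, using `java.net.URLEncoder` with UTF-8 (the `Charset` overload needs Java 10 or later). The class name, the `encodeParam` helper, and the example.com link are hypothetical. The sketch only shows what encoding does to an href; whether `TaskOptions.param` already percent-encodes its values is worth checking in the Task Queue documentation, so the value is not encoded twice:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeCrawlParam {

    /**
     * Percent-encodes a value for a query string or form body.
     * UTF-8 is an assumption; it must match what the receiving side decodes with.
     */
    static String encodeParam(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical crawled link whose path is an Arabic word (as Unicode escapes).
        String href = "http://example.com/\u0645\u0631\u062D\u0628\u0627";
        String encoded = encodeParam(href);

        // Only ASCII is left, so no character can be replaced by '?' in transit.
        System.out.println(encoded);
        System.out.println("/crawler?url=" + encoded);
    }
}
```

Each Arabic letter becomes the `%XX` sequences of its UTF-8 bytes, and `:` and `/` in the link are escaped as well, so the whole link survives as a single parameter value.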

Peter Knego
  • OK, I see. So if I URL-encode String 'href', does that mean my crawl method can no longer be called directly by a user from the browser, since it will have to URL-decode the parameter to get the actual URL? – quarks May 20 '13 at 09:50
  • Also, from what you are saying, it seems that I do not have to do any kind of transformation of String 'href' in the for-loop in my code above, before it gets passed into the Queue. Is that right? – quarks May 20 '13 at 09:55
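On the receiving side, the servlet container normally percent-decodes query parameters before Spring binds them to a `@RequestParam`, so the handler receives plain text whether the request came from the queue or from a browser address bar (modern browsers typically percent-encode non-ASCII URL characters as UTF-8 themselves). Which charset the container uses to decode query strings depends on its configuration, an assumption worth verifying for your server. The round trip can be checked in isolation with `java.net.URLDecoder`; the class and helper names below are hypothetical:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CrawlParamRoundTrip {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String decode(String value) {
        // Must use the same charset as the encoder; a mismatch garbles non-ASCII text.
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String href = "http://example.com/\u0645\u0631\u062D\u0628\u0627"; // hypothetical link
        String decoded = decode(encode(href));
        System.out.println(decoded.equals(href)); // prints true
    }
}
```

If the container decodes with the same charset the sender encoded with, the handler gets back the original Arabic URL without any extra conversion in `crawl`.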