I have this problem where I need to queue a page link with TaskQueue
:
Queue queue = QueueFactory.getDefaultQueue();
for (String href : hrefs){
href = baseUrl + href;
pageLinks = pageLinks + "\n" + href;
queue.add(TaskOptions.Builder
.withUrl("/crawler")
.param("url", href));
l("Added to queue url=["+href+"]");
}
The problem here is that, I think the URL that gets passed into the queue contains ?
's for Arabic characters. As it keeps on rescheduling.
The String pageLinks
however is outputed in the browser through Spring MVC, and I can properly see the Arabic character being displayed. So I'm pretty the links are ok.
If I copy one of the links output on the browser, and paste it to the browser URL it works fine. So I'm pretty sure that the reason that the queue keeps on recheduling because it gets the wrong URL.
What could I be missing here? Do I need to convert the String href
before passing it into the queue?
The crawl service looks like this:
@RequestMapping(method = RequestMethod.GET, value = "/crawl",
produces = "application/json; charset=iso-8859-6")
public @ResponseBody String crawl(HttpServletRequest req, HttpServletResponse res,
@RequestParam(value="url", required = false) String url) {
l("Processs url:" + url);
}
Also do I need to convert the @QueryParam
String url
here to Arabic or not?