I am using jsoup to parse some Polish sites, but I have a problem with special characters like "ą" and "ś" in the URL (!). For example, example.com/kąt is read as example.com/k

Every query without these special characters works perfectly.

I have tried Document doc = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url), but it does not work.

Any other tips?

Stephan
koowalsky

1 Answer

You want to encode your URL before passing it to Jsoup, so that non-ASCII characters such as "ś" and "ć" are percent-encoded.

SAMPLE CODE

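import java.net.URI;
import org.jsoup.Jsoup;

String url = "http://sjp.pl/maść";
System.out.println("BEFORE " + url);

// Percent-encode the non-ASCII characters (as UTF-8) so the URL is plain ASCII
String encodedURL = URI.create(url).toASCIIString();
System.out.println("AFTER " + encodedURL);

// Fetch the page with the encoded URL and print its title
System.out.println("Title: " + Jsoup.connect(encodedURL).get().title());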

OUTPUT

 BEFORE http://sjp.pl/maść
 AFTER http://sjp.pl/ma%C5%9B%C4%87
 Title: maść - Słownik SJP

Tested with: French locale, Jsoup 1.8.3
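
If you want to check the encoding step on its own, without hitting the network, here is a minimal sketch. The class name UrlEncodingSketch and the helpers encode and encodeFromParts are made up for illustration; only java.net.URI from the JDK is used. The second helper is an assumption about what you may need next: URI.create rejects URLs that contain illegal characters such as spaces, while the multi-argument URI constructor quotes them for you. The Polish letters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
final class UrlEncodingSketch {

    // Percent-encodes the non-ASCII characters of an otherwise valid URL (UTF-8 based).
    static String encode(String url) {
        return URI.create(url).toASCIIString();
    }

    // Builds the URL from its parts; this constructor also quotes illegal
    // characters such as spaces before the non-ASCII characters are encoded.
    static String encodeFromParts(String scheme, String host, String path)
            throws URISyntaxException {
        return new URI(scheme, host, path, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // The URL from the sample above
        System.out.println(encode("http://sjp.pl/ma\u015b\u0107"));
        // The URL from the question
        System.out.println(encode("http://example.com/k\u0105t"));
        // A path containing a space, which URI.create would reject
        System.out.println(encodeFromParts("http", "sjp.pl", "/ma\u015b\u0107 x"));
    }
}
```

The returned string can then be passed to Jsoup.connect(...) exactly as in the sample code.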

Stephan