I wanted to use this code to parse and save webpages:

public static void main(String[] args) throws Exception {
    String url1 = "https://www.google.com/search?q=some+words";

    Document doc = Jsoup.connect(url1).userAgent("Mozilla/5.0")
            .timeout(10000).get();

    // getBytes() with no argument uses the platform default charset
    byte[] data = doc.getElementsByClass("g").toString().getBytes();

    Path p = Paths.get("./myfile.html");
    Files.write(p, data);
}

But when I open the file in a browser, unreadable symbols appear and only the English letters are shown correctly. When I later opened the file in Notepad and re-saved it as UTF-8, everything was displayed correctly. When I save the entire page it is always displayed correctly; the problem occurs only when I save specific elements of the page. I have also tried this code:

OutputStream out=new BufferedOutputStream(Files.newOutputStream(p,CREATE, APPEND));
out.write(data, 0, data.length);

But this approach does not seem to let me set the file's encoding either.

Is there a way to set encoding of a file when I save it?

Alex Rixon

1 Answer

Try parsing the response stream with an explicit charset:

Document document = Jsoup.parse(new URL(url1).openStream(), "UTF-8", url1);

Reference: this similar question.
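
The encoding of the saved file can also be pinned down on the writing side. The sketch below is only an illustration under a few assumptions: the class name Utf8HtmlSaver and its two helper methods are hypothetical, and a hard-coded string stands in for doc.getElementsByClass("g").toString() so the example runs without jsoup. It encodes the text with StandardCharsets.UTF_8 rather than the platform default, and wraps the fragment in a minimal page with a meta charset declaration. A bare fragment carries no such declaration, which may be why the whole page displayed correctly while the extracted elements did not.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8HtmlSaver {

    // Wraps an HTML fragment in a minimal page that declares its encoding,
    // so the browser does not have to guess the charset.
    public static String wrapFragment(String fragment) {
        return "<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"></head><body>\n"
                + fragment + "\n</body></html>\n";
    }

    // Encodes the text explicitly as UTF-8 instead of the platform default.
    public static void saveAsUtf8(Path target, String html) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for doc.getElementsByClass("g").toString().
        // Unicode escapes keep this source file ASCII-only.
        String fragment = "<div class=\"g\">\u041f\u0440\u0438\u0432\u0435\u0442</div>";

        Path p = Files.createTempFile("myfile", ".html");
        saveAsUtf8(p, wrapFragment(fragment));

        // Read the file back as UTF-8 to check that the text survived.
        String readBack = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        System.out.println(readBack.contains(fragment));
    }
}
```

In the original code, the equivalent change would be to call getBytes(StandardCharsets.UTF_8) on the element string before passing it to Files.write.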

Community
DarkHorse