
I want to fetch a very large HTML page, but when I tried to parse it with jsoup, it reported a lot of errors because the page is too large.

I also saved the page as a text file (225 MB), but the file is so large that it exceeds the 2147483647-character limit of String and StringBuilder.

How can I handle such a large string?

Gabriel
Davi Resio

2 Answers


Download the file and save it locally. Then use a buffered file reader (such as BufferedReader) to read the file line by line and process it. Reading the entire file into one string seems like a bad idea given its size, and you still couldn't analyze the data efficiently that way.
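
As a rough illustration of the line-by-line approach, here is a minimal sketch. The class name `LineByLine` and the method `forEachLine` are made up for this example, and it assumes each line can be processed on its own, so only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper for illustration; the names are not from the original post.
final class LineByLine {
    private LineByLine() {}

    /**
     * Passes each line of the source to the handler and returns the number
     * of lines read. The whole content is never collected into one String.
     */
    static long forEachLine(Reader source, Consumer<String> handler) {
        // try-with-resources closes the reader (and the underlying source) when done
        try (BufferedReader reader = new BufferedReader(source)) {
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
            return count;
        } catch (IOException e) {
            // Rethrown unchecked so callers can use the method inside lambdas
            throw new UncheckedIOException(e);
        }
    }
}
```

For a file saved on disk, the source could be something like `Files.newBufferedReader(Paths.get("page.txt"), StandardCharsets.UTF_8)`, where `page.txt` and the UTF-8 encoding are placeholders for whatever the downloaded file actually uses.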

banncee

The response is text/plain, not HTML, so don't use jsoup.

Do a simple HTTP GET, and parse the data as it is being downloaded, one line at a time, in order to minimize memory use. No need to store to disk first.
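
One possible way to do this with only the JDK is sketched below. The class `StreamingGet` and its method names are invented for the example, the response is assumed to be UTF-8 text, and the timeouts are arbitrary values to adjust as needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper for illustration; the names are not from the original post.
final class StreamingGet {
    private StreamingGet() {}

    /** Issues a GET to an http(s) address and hands each response line to the handler. */
    static long fetchLines(String address, Consumer<String> handler) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(10_000); // assumed value, in milliseconds
            conn.setReadTimeout(60_000);    // assumed value, in milliseconds
            try {
                int status = conn.getResponseCode();
                if (status != HttpURLConnection.HTTP_OK) {
                    throw new IOException("Unexpected HTTP status " + status);
                }
                try (InputStream body = conn.getInputStream()) {
                    return readLines(body, handler);
                }
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the body one line at a time as it arrives; assumes UTF-8 text. */
    static long readLines(InputStream body, Consumer<String> handler) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(body, StandardCharsets.UTF_8))) {
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the handler sees each line as soon as it has been read from the socket, memory use stays roughly proportional to the longest line rather than to the size of the whole response.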

Andreas
  • I'm using a Spring Boot project, and I did it with RestTemplate and it works, but how can I avoid exceeding the String limit? My request code is: @GetMapping("/cnpj") public ResponseEntity> listCnpj(){ restTemplate.getMessageConverters().add(new StringHttpMessageConverter()); Object obj = restTemplate.getForEntity("/F.K03200UF.D71214BA",String.class); return new ResponseEntity<>( obj,HttpStatus.OK); } – Davi Resio Oct 11 '18 at 19:32
  • Here I return the GET response only to see the result. The RestTemplate request itself is just: restTemplate.getMessageConverters().add(new StringHttpMessageConverter()); Object obj = restTemplate.getForEntity("/F.K03200UF.D71214BA",String.class); – Davi Resio Oct 11 '18 at 19:33