
I'm new to Java and even newer to jsoup, but I think I'm just doing something stupid with files.

I want to create a file from some parsed HTML, but the resulting .txt is only 80 KB and I know it should contain more lines. Maybe there is a size limit on Elements in jsoup, or maybe a buffer is filling up.

This is the code I'm using:

public class RetrieveURLs
{
    public static void main(String[] args)
        throws FileNotFoundException
    {
        try {
            PrintWriter out = new PrintWriter( "filename.txt" );
            for (int i = 1; i < 80; i++) {
                Document doc = Jsoup.connect(
                    "http://elbuenfin.org/buscar/ofertas/pagina/"+ i +"/?entidad_id=&municipio_id=&categoria_id=&descuento_id=&promocion_id=&meses_id=&fulltext=&orderby=&order=").get();
                Element tienda = doc.select("div.product").first();

                out.println(tienda);
            }
        } catch (IOException ex) {
            Logger.getLogger(RetrieveURLs.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
Sean Bright
  • Acquiring access to a file and releasing it is quite a costly operation, so we try to do it as rarely as possible. Because of that, many Java I/O classes use a buffer by default: it holds some portion of the data, and only when the buffer is full is its content written to the file automatically. But when we finish generating content, what remains may not be enough to fill the buffer. In that case we need to call the `flush()` method on the writer explicitly, or, if you don't want to write anything else, `close()` (which also calls `flush()`). – Pshemo Nov 08 '16 at 20:36
  • To ensure that this method is called even if an exception occurs, we place it in the `finally` block of the `try` in which the exception may be thrown. If you don't want to handle closing the resource explicitly, you can use try-with-resources, which is compiled automatically into a try with a finally section that closes the stream. To do so here, instead of `try { PrintWriter out = new PrintWriter( "filename.txt" ); ...}` use `try(PrintWriter out = new PrintWriter( "filename.txt" )){..}`. – Pshemo Nov 08 '16 at 20:39
  • I also don't think this is a problem with the maximum document size, since according to the jsoup author's answer at http://stackoverflow.com/a/29713698/1393766 it is 1MB, which is greater than what you are getting now. – Pshemo Nov 08 '16 at 20:49
  • Thank you, I'm reading and trying those answers. – Elektro90 Nov 08 '16 at 20:52
  • This may also be helpful: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html – Pshemo Nov 08 '16 at 20:53
  • I tried closing the file, and then noticed that the Element starts finding nothing after 70 "valid products". I widened the Element search to the parent class (div.shop), and now the file.txt is about 1MB with 869 "valid products". Now I have trouble with extra lines, but I can work with that. Thank you very much. – Elektro90 Nov 08 '16 at 21:04
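
The buffering behaviour described in the comments above can be sketched without jsoup or a network connection. This is only a minimal illustration, not the asker's program: the class name `FlushDemo`, both helper methods, and the temporary file are made up for the example. It writes through a `PrintWriter`, looks at the file size while the writer is still open, and then counts the lines that reach the disk once try-with-resources has closed the writer.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; none of these names come from the question.
class FlushDemo {

    // Writes lineCount short lines and lets try-with-resources close the writer,
    // then returns how many lines are actually in the file.
    static long linesAfterClose(Path file, int lineCount) throws IOException {
        try (PrintWriter out = new PrintWriter(file.toFile())) {
            for (int i = 0; i < lineCount; i++) {
                out.println("line " + i);
            }
        } // out.close() runs here, and close() also flushes the buffer
        return Files.readAllLines(file).size();
    }

    // Writes one short line and reports the file size while the writer is
    // still open; the text may still be sitting in the writer's buffer.
    static long bytesBeforeClose(Path file) throws IOException {
        PrintWriter out = new PrintWriter(file.toFile());
        try {
            out.println("short line");
            return Files.size(file);
        } finally {
            out.close(); // always release the file, even on an exception
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-demo", ".txt");
        try {
            System.out.println("bytes on disk before close: " + bytesBeforeClose(file));
            System.out.println("lines on disk after close: " + linesAfterClose(file, 1000));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The size reported before closing depends on the writer's internal buffer, so it is likely to be smaller than what was written, while the line count after closing should match what the loop produced.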

1 Answer


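It's not println() itself: the PrintWriter is never closed, so buffered output can be lost when the program exits. Closing the writer, here with try-with-resources, should work as expected:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RetrieveURLs
{
    public static void main(String[] args)
    {
        // try-with-resources closes (and therefore flushes) the writer,
        // even if an exception is thrown inside the loop
        try (PrintWriter out = new PrintWriter( "filename.txt" )) {
            for (int i = 1; i < 80; i++) {
                Document doc = Jsoup.connect(
                    "http://elbuenfin.org/buscar/ofertas/pagina/"+ i +"/?entidad_id=&municipio_id=&categoria_id=&descuento_id=&promocion_id=&meses_id=&fulltext=&orderby=&order=").get();
                Element tienda = doc.select("div.product").first();

                // first() returns null when nothing matches the selector
                if (tienda != null) {
                    out.write(tienda.toString());
                }
            }
        } catch (IOException ex) {
            Logger.getLogger(RetrieveURLs.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}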
Alexey Soshin