Cannot preserve newlines in text read from a URL

Question

I am reading text from a URL using Jsoup. The following question has some tips for preserving line breaks when converting the body to text: "How do I preserve line breaks when using jsoup to convert html to plain text?"

I use the following lines to convert the tags:

    String prettyPrintedBodyFragment = Jsoup.clean(body, "",
            Whitelist.none().addTags("br", "p", "h1"),
            new OutputSettings().prettyPrint(true));
    System.out.println(prettyPrintedBodyFragment);

I still get the body content on a single line. Any clues, please?

EDIT: Here is the complete source code. The output is still on a single line:

    public static void main(String[] args) throws Exception {
        Connection conn = Jsoup.connect("http://finance.yahoo.com/");
        Document doc = conn.get();

        String body = doc.body().text();

        String prettyPrintedBodyFragment = Jsoup.clean(body, "",
                Whitelist.none().addTags("br", "p", "h1"),
                new OutputSettings().prettyPrint(true));

        System.out.println(prettyPrintedBodyFragment);
    }

Edited the original post with the source code for reading from finance.yahoo.com — kashili kashili, Feb 10 '14 at 16:07

score 1 · Accepted Answer · answered Feb 10 '14 at 16:31

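Change:

    String body = doc.body().text();

To:

    String body = doc.body().html();

doc.body().text() has already stripped every tag, so by the time the string reaches Jsoup.clean() there are no br, p, or h1 tags left for your Whitelist to keep, and nothing for the pretty printer to break lines on. Passing the HTML instead lets the whitelisted tags survive the clean step.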
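To see why the order matters, here is a small dependency-free sketch. It is not jsoup's implementation: the class and both helper methods (flatten, toTextWithBreaks) are hypothetical stand-ins that use plain regular expressions, and the sample HTML is made up. The first helper mimics what happens when all tags are dropped up front; the second keeps the line structure by turning br, p, and h1 boundaries into newlines before stripping the rest.

```java
class LineBreakSketch {

    // Hypothetical stand-in for "take the text first":
    // every tag is dropped and whitespace is collapsed, so no line structure survives.
    static String flatten(String html) {
        return html.replaceAll("<[^>]+>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Hypothetical stand-in for "keep the tags until formatting":
    // <br>, </p> and </h1> become newlines, then the remaining tags are removed.
    static String toTextWithBreaks(String html) {
        return html.replaceAll("(?i)<br\\s*/?>", "\n")
                   .replaceAll("(?i)</(p|h1)>", "\n")
                   .replaceAll("<[^>]+>", "")
                   .trim();
    }

    public static void main(String[] args) {
        // Made-up sample input, not real page content.
        String html = "<h1>Markets</h1><p>Stocks rose.</p><p>Bonds fell.</p>";

        // One line: Markets Stocks rose. Bonds fell.
        System.out.println(flatten(html));

        // Three lines: Markets / Stocks rose. / Bonds fell.
        System.out.println(toTextWithBreaks(html));
    }
}
```

Once the input has gone through something like flatten, there is no way to recover where the paragraphs were, which is the situation the original code is in. For real pages, stick with jsoup's parser rather than regular expressions; the regexes here only keep the sketch self-contained.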

StoopidDonut
