
I need to find and replace a list of words given by the user. My application reads an HTML file line by line, and I want to check whether each line contains a word from the list and, if so, replace that word with an empty string. This is what I have so far, but I think I will have to modify my whole code to get what I want.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.PrintWriter;

    private static void PrintFile(File source) throws IOException {
        // try-with-resources closes both streams; closing the PrintWriter
        // also flushes it, so Results.txt actually receives the output.
        try (BufferedReader br = new BufferedReader(new FileReader(source));
             PrintWriter pw = new PrintWriter("Results.txt")) {
            String s;
            while ((s = br.readLine()) != null) {
                pw.println(s.replaceAll(" ", "") // Words to be replaced.
                        .replaceAll("<br>", "")
                        .replaceAll("&amp;", "")
                        .replaceAll("</p>", "")
                        .replaceAll("</body>", "")
                        .replaceAll("</html>", "")
                        .replaceAll("<remote object=\"#DEFAULT\">&gt;", ""));
            }
        }
        System.out.println("Done!");
    }

I'm open to any suggestions; a list may not be the best approach.
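A minimal sketch of the list-driven version follows. It assumes the user's words arrive as a List<String>, that every match should simply be removed, and that the files are UTF-8. The class name WordRemover and its methods are hypothetical. It uses String.replace, which matches the words literally. It also uses try-with-resources so the output file is closed and flushed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class WordRemover {

    // Removes every occurrence of each word from the line.
    // String.replace matches literally (no regex), and the words are applied
    // in list order, so one removal can expose a match for a later word.
    public static String removeWords(String line, List<String> words) {
        String result = line;
        for (String word : words) {
            result = result.replace(word, "");
        }
        return result;
    }

    // Reads the source file line by line (assumed UTF-8) and writes the
    // cleaned lines to target. try-with-resources closes both streams,
    // which also flushes the writer.
    public static void cleanFile(Path source, Path target, List<String> words)
            throws IOException {
        try (BufferedReader br = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             PrintWriter pw = new PrintWriter(
                     Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                pw.println(removeWords(line, words));
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical word list; in the real application it comes from the user.
        List<String> words = List.of("<br>", "&amp;", "</p>");
        System.out.println(removeWords("Hello<br>world</p>", words)); // prints Helloworld
    }
}
```

To process a whole file you would call something like WordRemover.cleanFile(Paths.get("page.html"), Paths.get("Results.txt"), words). The input name page.html is only an example.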

Josué Almonasi

3 Answers


With Jsoup you can strip HTML markup as simply as this:

    import org.jsoup.Jsoup;

    public static String html2text(String html) {
        // Parses the HTML and returns only its combined text content.
        return Jsoup.parse(html).text();
    }

Also have a look at Cleaner and Whitelist (called Safelist in jsoup 1.14.1 and later) if you want finer control over which tags are kept when sanitizing a document.

user1438038

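Because String.replaceAll(String regex, String replacement) takes a regular expression as its first parameter, I would suggest using String.replace(CharSequence target, CharSequence replacement) instead. This avoids undesired behavior: with replaceAll, characters such as '.', '*' or '$' in a word are interpreted as regex syntax rather than as literal text.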

Other than that I can't see a big problem in your code.
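To illustrate the difference, here is a small sketch; the class and method names are hypothetical. As a regex, "." matches any character, so replaceAll erases the whole line, whereas replace removes only a literal dot. If you prefer to keep replaceAll, Pattern.quote and Matcher.quoteReplacement give the same literal behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {

    // replaceAll: the first argument is a regular expression.
    public static String removeAsRegex(String line, String word) {
        return line.replaceAll(word, "");
    }

    // replace: the first argument is plain text, matched literally.
    public static String removeLiteral(String line, String word) {
        return line.replace(word, "");
    }

    // replaceAll with both arguments quoted behaves like replace.
    // quoteReplacement only matters once the replacement is non-empty,
    // because '$' and '\' are special in a replacement string.
    public static String removeQuoted(String line, String word) {
        return line.replaceAll(Pattern.quote(word), Matcher.quoteReplacement(""));
    }

    public static void main(String[] args) {
        String line = "Price: $5.00";
        // "." as a regex matches any character, so everything is removed.
        System.out.println("[" + removeAsRegex(line, ".") + "]"); // prints []
        // As literal text, only the dot is removed.
        System.out.println("[" + removeLiteral(line, ".") + "]"); // prints [Price: $500]
        System.out.println("[" + removeQuoted(line, ".") + "]");  // prints [Price: $500]
    }
}
```

For the words in the question, none of which contain regex metacharacters, both methods happen to give the same result. The difference only shows up once the user's list contains such characters.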

Martin Fernau

If you don't mind adding Apache Commons Lang to your project, you can use StringUtils.replaceEach and be done with it.
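With Commons Lang the call would be StringUtils.replaceEach(line, searchList, replacementList), where the search and replacement arrays must have the same length. If you would rather stay on the JDK, the sketch below is similar in spirit. It compiles the words into one regex alternation, quoting each word with Pattern.quote, and removes them in a single pass. As a result, text left behind by one removal is not matched again by a later word. MultiRemover is a hypothetical name. The tie-breaking rule here (the longest word wins at a given position) is an assumption and may differ from replaceEach.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MultiRemover {

    private final Pattern pattern;

    // Builds a single alternation such as \Q<br>\E|\Q</p>\E from the words.
    // Longer words are listed first so that, where two words start at the
    // same position, the longer one is removed.
    public MultiRemover(List<String> words) {
        String alternation = words.stream()
                .filter(w -> !w.isEmpty())
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.pattern = Pattern.compile(alternation);
    }

    // Removes all words in one left-to-right scan of the original line.
    public String removeAll(String line) {
        return pattern.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        MultiRemover remover = new MultiRemover(List.of("<br>", "&amp;", "</p>"));
        System.out.println(remover.removeAll("a<br>b&amp;c</p>")); // prints abc
    }
}
```

Compiling the pattern once in the constructor means a large file can be processed line by line without rebuilding the regex for every line.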

Buhb