I wrote a method that prepares SQL insert statements and stores them in a text file. It gets the job done, but it has a catch block that is a total eyesore. If I could somehow close a BufferedWriter without implicitly calling flush, that would be perfect. Is that possible? If so, how?

As preparation for populating a master table with data on kanji (Japanese characters) for the memorization application I am making, I am compiling a list of all the characters. The data source is KANJIDIC2, a UTF-8 encoded XML file with data on 13k+ characters. The original idea was to include all the characters in the source file, but for some reason 300-or-so characters throw a java.nio.charset.MalformedInputException when I try to write them to my output file. I decided to give up on those characters since they are not essential, but I couldn't find a smooth way to close my BufferedWriter after the exception above.

File outputFile = new File("C:\\Users\\tobbelobb\\Documents\\kanjilist.bsv");
try {
    BufferedWriter bw = Files.newBufferedWriter(outputFile.toPath(), StandardCharsets.UTF_8);
    for (Kanji nextKanji : kanjiList) {
        try {
            StringBuilder sb = new StringBuilder();
            // sb.append stuff from list of objects...
            bw.write(sb.toString());
            bw.newLine();
            bw.flush();
        } catch (MalformedInputException ex) {
            // Ungracefully swallow the exception.
            ex.printStackTrace();
            bw = Files.newBufferedWriter(outputFile.toPath(), StandardCharsets.UTF_8, StandardOpenOption.APPEND);
        }
    }
    bw.close();
} catch (Exception ex) {
    ex.printStackTrace();
}

I looked for a method to dispose of my BufferedWriter object in the catch block, but the only one I could find is close(), which throws that same MalformedInputException again and, in the process, writes an incomplete line of text to my file.

tobbelobb
  • Have you seen https://stackoverflow.com/questions/26268132/all-inclusive-charset-to-avoid-java-nio-charset-malformedinputexception-input ? – tevemadar Sep 16 '19 at 08:09
  • Yes. I tried different encodings. If I don't specify the encoding it doesn't throw exception, but 2/3 of the characters show up as "?" in the output file. Seems like it defaults to SJIS. UTF-16 gave slightly better coverage than UTF-8, but still threw exceptions so I arbitrarily decided to stick with UTF-8 and look for a solution elsewhere. – tobbelobb Sep 16 '19 at 08:16

1 Answer

There is no such method. Here is close() in the JDK 7 source: https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/io/BufferedWriter.java#L258

public void close() throws IOException {
    synchronized (lock) {
        if (out == null) {
            return;
        }
        try {
            flushBuffer();
        } finally {
            out.close();
            out = null;
            cb = null;
        }
    }
}

So close() always tries to flush first, and that is probably what most users expect anyway.
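To see the flush-on-close behaviour in isolation, here is a minimal sketch. The class and method names (CloseFlushesDemo, writeAndClose) are made up for illustration, and a StringWriter stands in for the file so the effect is easy to observe.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class CloseFlushesDemo {
    // Writes text through a BufferedWriter and closes it without ever calling flush().
    // Returns what the sink held before close(), a '|' separator, and what it held after.
    static String writeAndClose(String text) throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter bw = new BufferedWriter(sink);
        bw.write(text);
        // A short text is still sitting in bw's internal buffer at this point.
        String beforeClose = sink.toString();
        bw.close(); // close() pushes the buffered text to the sink before closing it
        return beforeClose + "|" + sink.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndClose("abc"));
    }
}
```

For a short input such as "abc" the result is "|abc": nothing reaches the sink before close(), and everything is there afterwards.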

What you can do is generate each line separately, using a BufferedWriter backed by an in-memory buffer. If writing the line fails, you skip it; if it succeeds, you copy the bytes to the file. Also, it is 2019 now, so please start using try-with-resources.

try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(outputFile))) {
    for (Kanji nextKanji : kanjiList) {
        try {
            StringBuilder sb = new StringBuilder();
            // sb.append stuff from list of objects...
            // Encode the line into an in-memory buffer first.
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(baos));
            bw.write(sb.toString());
            bw.newLine();
            bw.close();
            // Only a line that was written without an exception reaches the file.
            baos.writeTo(bos);
        } catch (MalformedInputException ex) {
            // Skip this line.
        }
    }
}

If it works at all (I am just writing off the top of my head), it can be made prettier by using a single ByteArrayOutputStream instance and calling its reset() in the loop.
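A sketch of that single-buffer variant follows. It makes one change beyond reset() that is an assumption, not something tested against KANJIDIC2: the OutputStreamWriter is given a CharsetEncoder from StandardCharsets.UTF_8.newEncoder(). An OutputStreamWriter built from a charset or charset name replaces characters it cannot encode with a substitute, while a fresh encoder reports them, so a bad line raises an exception and can be skipped. The names SkipBadLines and writeEncodableLines are hypothetical, and the lines are plain strings rather than Kanji objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class SkipBadLines {
    // Encodes each line in memory and copies only the lines that encode
    // cleanly to 'out'. Returns the number of lines that were skipped.
    static int writeEncodableLines(List<String> lines, OutputStream out) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(); // reused for every line
        int skipped = 0;
        for (String line : lines) {
            baos.reset(); // discard whatever the previous line left behind
            // A new encoder reports malformed input instead of replacing it.
            try (Writer w = new OutputStreamWriter(baos, StandardCharsets.UTF_8.newEncoder())) {
                w.write(line);
                w.write(System.lineSeparator());
            } catch (CharacterCodingException ex) {
                // MalformedInputException is a subclass; drop this line.
                skipped++;
                continue;
            }
            baos.writeTo(out); // the line encoded cleanly, so copy its bytes
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // "\uDC00" is a lone low surrogate, which cannot be encoded as UTF-8.
        List<String> lines = Arrays.asList("first", "bad\uDC00line", "last");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int skipped = writeEncodableLines(lines, sink);
        System.out.println("skipped " + skipped + " line(s)");
    }
}
```

In real use the OutputStream passed in would be the BufferedOutputStream over the output file, opened in a try-with-resources block as in the snippet above.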

tevemadar
  • Thanks for the answer, I tried it. It runs, but it doesn't skip the lines with unprintable characters, just writes them as "?". With this I am able to write the file as UTF-8 without throwing any exceptions, feels like a step in the right direction. I will try some modifications when time allows it. – tobbelobb Sep 17 '19 at 22:58