
I am getting an `UnmappableCharacterException` on the `collect()` call (or on the `toList()` call):

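    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // _FILES_PATH is a String constant defined elsewhere in the class
    private static void handleTransaction(Path a_filePath, String a_sTransactionName, String a_sTransactionFilePath) {

        // read file into stream, try-with-resources
        try (Stream<String> stream = Files.lines(a_filePath, Charset.defaultCharset())) {

            // contains() also matches a name at index 0, which indexOf(...) > 0 skipped
            List<String> list =
                stream.filter(line -> line.contains(a_sTransactionName))
                .collect(Collectors.toList());

            list.forEach(line -> {

                System.out.println(line);

                try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(_FILES_PATH + a_sTransactionFilePath), Charset.defaultCharset(), StandardOpenOption.APPEND)) {
                    writer.write(line + "\n");
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }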
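A minimal sketch that reproduces this kind of failure without any Solaris file. It assumes the Windows box's default charset is windows-1252 and names it explicitly; byte `0x81` is one of the few values windows-1252 leaves undefined, so the strict decoder behind `Files.lines` should report it as unmappable. Because the stream decodes lazily, the error only surfaces once `collect()` pulls lines, wrapped in an `UncheckedIOException`. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReproduceUnmappable {

    // Returns the decoding error raised while reading a one-line file with
    // windows-1252, or null if the file decoded cleanly.
    static Throwable readWithWindows1252() throws IOException {
        Path file = Files.createTempFile("solaris-sample", ".csv");
        try {
            // 0x81 has no character assigned in windows-1252
            Files.write(file, new byte[] {'a', ',', (byte) 0x81, '\n'});
            Charset cp1252 = Charset.forName("windows-1252");
            try (Stream<String> lines = Files.lines(file, cp1252)) {
                // Files.lines decodes lazily, so the failure happens inside collect()
                lines.collect(Collectors.toList());
                return null;
            } catch (UncheckedIOException e) {
                // The stream wraps the checked decoding exception
                return e.getCause();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Prints the decoding exception that collect() ran into
        System.out.println(readWithWindows1252());
    }
}
```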

It worked for me once, but never since then.

The files I read are CSV files which were created on Solaris. I run the jar on Windows Server 2012.

Can you advise please?

Thank you.

dushkin
  • Don’t use `Charset.defaultCharset()` if you already know that the files stem from a different system, most likely one with a different charset. I’d be very surprised if software on Solaris created files in one of the Windows encodings. – Holger Jan 28 '16 at 11:23
  • @Holger So, what should I put there as the charset? – dushkin Jan 28 '16 at 11:29
  • Possible duplicate of [What is character encoding and why should I bother with it](http://stackoverflow.com/questions/10611455/what-is-character-encoding-and-why-should-i-bother-with-it) – Raedwald Jan 28 '16 at 12:00
  • @Raedwald Can you please remove your duplicate alert? The post you linked explains encoding in general, but does not help me with the exception. Thank you – dushkin Jan 28 '16 at 12:45
  • Nope. Edit your question to indicate why it is not a duplicate of that canonical question. The heart of your problem is that you have not carefully selected your character encoding, relying on the defaults, which is a *very strong clue* that you do not understand the importance of character encoding – Raedwald Jan 28 '16 at 13:09
  • @Raedwald Again, I think I understand encoding quite well. It seems the file was created on Solaris with a different encoding from the one used on my Windows machine, and that is probably what causes the exception. What I need is help overcoming it. I also searched for the Solaris encoding, which seems to be Cp1252, but that didn't help me (or at least I was not using it properly). – dushkin Jan 28 '16 at 13:23
  • Is there a way to determine the encoding of a file, so that Scala can continue executing the writer's code? – Laserbeak43 Jan 31 '16 at 21:56
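There is no reliable way to detect a file's encoding from its bytes alone, but you can rule one out. A sketch, assuming the question is simply "is this file valid UTF-8?": decode the whole file with a strict decoder, which throws at the first invalid byte sequence. Passing does not prove the file was written as UTF-8 (plain ASCII passes too); it only means reading it as UTF-8 will not throw. `Utf8Check` is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true if the bytes form valid UTF-8 from start to end.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Reads the whole file into memory; fine for CSV files of moderate size.
    static boolean isValidUtf8(Path file) throws IOException {
        return isValidUtf8(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValidUtf8(Paths.get(args[0])));
    }
}
```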

2 Answers


> The files I read are CSV files which were created on Solaris. I run the jar on Windows Server 2012.

Well, that's probably the problem then. You're using the platform-default encoding for both reading and writing the file. If the files were created on Solaris, that system may very well have a different platform-default encoding from your Windows box.
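To see why the platform default matters, a small sketch that decodes the same two bytes under two charsets. The bytes are what UTF-8 produces for `é`; which encoding the Solaris machine actually uses is unknown, so UTF-8 here is only an illustration.

```java
import java.nio.charset.Charset;

public class SameBytesDifferentText {

    // The UTF-8 encoding of "é" (U+00E9): two bytes, 0xC3 0xA9.
    static final byte[] E_ACUTE_UTF8 = {(byte) 0xC3, (byte) 0xA9};

    // Decodes the same bytes with whichever charset is named.
    static String decode(String charsetName) {
        return new String(E_ACUTE_UTF8, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println("Default charset of this JVM: " + Charset.defaultCharset());
        // One character ("é") under UTF-8, two characters ("Ã©") under windows-1252
        System.out.println("UTF-8 length: " + decode("UTF-8").length());
        System.out.println("windows-1252 length: " + decode("windows-1252").length());
    }
}
```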

If you know the encoding of the file you're reading, specify that.
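A sketch of the question's read step with the encoding stated explicitly. `ISO_8859_1` is only a placeholder assumption: substitute whatever encoding the Solaris side really writes. The filter uses `contains`, which also matches a transaction name at the very start of a line. `ReadWithKnownCharset` and `linesContaining` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReadWithKnownCharset {

    // ASSUMPTION: the source files are ISO-8859-1; change this to the real encoding.
    static final Charset SOURCE_CHARSET = StandardCharsets.ISO_8859_1;

    // Returns every line of the file that contains transactionName.
    static List<String> linesContaining(Path file, String transactionName) throws IOException {
        try (Stream<String> lines = Files.lines(file, SOURCE_CHARSET)) {
            return lines.filter(line -> line.contains(transactionName))
                        .collect(Collectors.toList());
        }
    }
}
```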

If you get to control the encoding of the file you're reading and writing, I would strongly recommend using UTF-8 unless you have a really good reason not to.
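If you control both ends, a sketch of the question's method using UTF-8 for both files. It also opens the output file once instead of once per line. `inputFile` and `outputFile` stand in for the question's `a_filePath` and `_FILES_PATH + a_sTransactionFilePath`; adding `CREATE` is a choice made here so the first run does not fail when the output file does not exist yet.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.stream.Stream;

public class HandleTransactionUtf8 {

    // Appends every line of inputFile that contains transactionName to
    // outputFile, decoding and encoding both files as UTF-8.
    static void handleTransaction(Path inputFile, String transactionName, Path outputFile)
            throws IOException {
        try (Stream<String> lines = Files.lines(inputFile, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            // A plain loop lets writer.write() throw IOException directly,
            // which a forEach lambda would not allow.
            Iterator<String> matches = lines.filter(line -> line.contains(transactionName)).iterator();
            while (matches.hasNext()) {
                writer.write(matches.next());
                writer.write('\n'); // same Unix-style line ending as the question
            }
        }
    }
}
```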

Only use `Charset.defaultCharset()` if you're reading a file which you know uses the platform-default encoding, or if you're writing a file which you definitely want to use the platform-default encoding, and try to avoid the latter.

(Basically, a world where everything is encoded in UTF-8 is a simpler world...)

Jon Skeet
  • Is there a way I can force the file to be UTF-8? – dushkin Jan 28 '16 at 12:05
  • @dushkin: It's unclear what you mean. A file is just a sequence of bytes, basically. Anything which is able to create a file can create whatever sequence it wants. We have no idea what's creating the input file in the first place... that is what you should be looking at. – Jon Skeet Jan 28 '16 at 13:09
  • Amen on the UTF-8. Use `StandardCharsets.UTF_8` when you read the data. I think we should sunset the other character sets and deprecate them. – ggb667 Nov 02 '22 at 14:41
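You cannot make another program emit UTF-8, but once you know (or assume) what it did emit, you can convert a copy. A sketch, assuming the source encoding is known and passed in; it reads the whole file into memory, which suits CSV files of moderate size. `TranscodeToUtf8` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TranscodeToUtf8 {

    // Decodes `source` with `sourceCharset` and writes the same text to `target`
    // as UTF-8 (creating or overwriting it). newDecoder() reports errors, so for
    // charsets with invalid byte sequences a wrong guess fails with a
    // CharacterCodingException instead of silently producing U+FFFD.
    static void transcode(Path source, Charset sourceCharset, Path target) throws IOException {
        byte[] raw = Files.readAllBytes(source);
        String text = sourceCharset.newDecoder().decode(ByteBuffer.wrap(raw)).toString();
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```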

Honestly, I'm not even sure if this is an answer, but I'd like to help. I'm having the same problem and used:

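    // .codec returns the Codec this Source decodes with (the platform default here);
    // decodingReplaceWith only sets the replacement string for bad input, it does not switch to UTF-8
    val source = io.Source.fromFile("C:/mon_usatotaldat.csv").codec.decodingReplaceWith("UTF-8")

And I got the output:

    source: scala.io.Codec = windows-1252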
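For comparison on the Java side, a sketch of a lenient reader that replaces undecodable bytes instead of throwing, roughly what Scala's `Codec` lets you configure via `onMalformedInput` and `onUnmappableCharacter`. This only hides the symptom: the replaced characters are lost, so treat it as a fallback when the real encoding cannot be found out. Using windows-1252 in the usage below is an assumption matching the codec printed above; `LenientRead` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class LenientRead {

    // Reads all lines, substituting the decoder's replacement string for any
    // malformed or unmappable input instead of throwing.
    static List<String> readLenient(Path file, Charset cs) throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), decoder))) {
            return reader.lines().collect(Collectors.toList());
        }
    }
}
```

Usage: `LenientRead.readLenient(path, Charset.forName("windows-1252"))` returns the lines even when the file contains bytes that windows-1252 cannot map.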

Laserbeak43