9

I need to programmatically change the encoding of a set of *nix scripts to UTF-8 from Java. I won't write anything to them, so I'm trying to find the easiest and fastest way to do this. There aren't many files and they aren't big. I could:

  • "Write" an empty string using an OutputStream with UTF-8 set as encoding
  • Since I'm already using FileUtils (from Apache Commons), read and write the contents of these files, passing UTF-8 as the encoding

Not a big deal, but has anyone run into this case before? Are there any cons on either approach?

Dan
  • 2
    The *entire file must be read and re-written* except in the case of normal 7-bit clean ASCII files (and such) that do not require an initial BOM. The BOM will shift the stream as well as any encoding changes. –  Apr 10 '12 at 17:16
  • But the default encoding on Unix is UTF-8, I believe. What is the encoding of your scripts? – Cratylus Apr 10 '12 at 17:18
  • @user384706 Perhaps it is more appropriate to say that non-BOM streams are taken as UTF-8 by many "text" applications... a "default encoding" is more appropriate to talk about in relationship to a particular language/library/API. –  Apr 10 '12 at 17:28
  • The scripts come in ISO-8859-1. @pst thanks for clarifying that option 1 is not an option :) – Dan Apr 10 '12 at 17:37
  • @pst stick an answer in so we can get this off the unanswered list – daveb Apr 10 '12 at 18:28
  • @daveb Nah, it should have a "small example" (using FileUtils) or another appropriately simple method as well. Your turn :-) –  Apr 10 '12 at 18:33
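To illustrate the point made in the comments about why the whole file has to be read and rewritten, here is a minimal sketch showing that the same non-ASCII character is stored as different bytes in ISO-8859-1 and UTF-8. The class and method names (`EncodingBytes`, `hex`) are made up for this illustration and are not from the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Hypothetical helper: encodes text with the given charset and
    // returns the resulting bytes as space-separated uppercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // 'e' with acute accent
        // One byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println(hex(eAcute, StandardCharsets.ISO_8859_1)); // prints "E9"
        System.out.println(hex(eAcute, StandardCharsets.UTF_8));      // prints "C3 A9"
        // Plain ASCII is identical in both encodings.
        System.out.println(hex("a", StandardCharsets.ISO_8859_1));    // prints "61"
        System.out.println(hex("a", StandardCharsets.UTF_8));         // prints "61"
    }
}
```

Because the byte sequences differ for anything outside 7-bit ASCII, writing an empty string through a UTF-8 stream leaves the existing bytes untouched, so the content has to be decoded and re-encoded.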

1 Answer

14

As requested, and since you're using commons io, here is example code (error checking to the wind):

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class Main {
    public static void main(String[] args) throws IOException {
        String filename = args[0];
        File file = new File(filename);
        String content = FileUtils.readFileToString(file, "ISO-8859-1");
        FileUtils.write(file, content, "UTF-8");
    }
}
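For comparison, the same conversion could probably be done without Commons IO, using only `java.nio.file` (Java 7 or later). This is a rough sketch rather than a tested replacement: the class name `Recode` and the method `latin1ToUtf8` are hypothetical, and it assumes every input file really is ISO-8859-1 and small enough to hold in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Recode {
    // Hypothetical helper: re-encodes one file in place.
    // Assumes the file is currently ISO-8859-1 and fits in memory.
    static void latin1ToUtf8(Path file) throws IOException {
        // Read the complete file before anything is written back.
        byte[] raw = Files.readAllBytes(file);
        // Decode the bytes as ISO-8859-1 (every byte maps to one character).
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        // Encode as UTF-8 (no BOM) and overwrite the original file.
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as a file to convert.
        for (String name : args) {
            latin1ToUtf8(Paths.get(name));
        }
    }
}
```

Running it twice on the same file would double-encode any non-ASCII characters, so it should only be applied to files known to still be in ISO-8859-1.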
daveb
  • Is `UTF-8` necessary? I think that Java's default encoding is `UTF-8` anyway – Cratylus Apr 10 '12 at 19:52
  • 4
    There are a couple of things to say here. First, the default is unlikely to be UTF-8, and second, because this code is all about encodings, it is best to be explicit. http://stackoverflow.com/questions/1006276/what-is-the-default-encoding-of-jvm – daveb Apr 10 '12 at 21:18
  • 2
    WARNING: For some reason this cuts files longer than several KB, essentially deleting the file's contents beyond a certain point – Orlin Georgiev Mar 23 '17 at 21:22