
BufferedReader.readLine() removes EOL characters automatically, and I cannot simply do a readLine() and then tack a "\r" on the end of it. I tried

InputStream myFile = new FileInputStream("C:\\test.txt");
StringBuilder sb = new StringBuilder();

int i;

while((i = myFile.read()) != -1)
{
    char ch = (char) i;
    sb.append(ch);
}

System.out.println(sb);

but the "char ch = (char) i" loses byte data because ints are 4 bytes while chars are 2 bytes.

I repeat, I cannot do something like

sb.append(ch+"\r");

because some files that this generic code will read will include the CR and others will not.

From java.nio.*, Files.readAllBytes(Path path) seems like an option, but I am unfamiliar with it and cannot tell from the Javadoc whether it returns EOL characters.

jseashell
  • It sounds like you are just trying to read a text file into a string. If so see http://stackoverflow.com/questions/3402735/what-is-simplest-way-to-read-a-file-into-string – cyroxis May 06 '16 at 18:43
  • Is there a reason to use `readLine()` if you're not actually interested in the content separated by lines? – zapl May 06 '16 at 18:43
  • Files.lines(Paths.get("C:\\test.txt")).forEach(System.out::println); or String content = new String(Files.readAllBytes(Paths.get("C:\\test.txt"))); – Hector May 06 '16 at 18:45
  • You will not lose character information converting from byte to char **in this situation**; the value returned by read() is only not a character when it is -1, which signals end of stream. – cyroxis May 06 '16 at 18:47
  • The text file line separator was `\r` on classic MacOS, and there may be other systems that use that convention, but it is `\r\n` on Windows and `\n` on all Unixes, including OS X. – John Bollinger May 06 '16 at 18:57
  • @cyroxis I am indeed trying to read a file. But I want the "\r" to come with that read, as opposed to tacking it onto the end of my string, because I need to parse through data generically and look for CR as my stop character. So say my file contains a line like A,jacob,1,pass,50,60, then I want my readLine() to bring that with it. However, the readLine() method specifically says it strips all EOL characters – jseashell May 06 '16 at 19:28
  • @Hector I'll give your line a try and report back. Thank you – jseashell May 06 '16 at 19:28
  • You seem not to understand the difference between a byte and an encoded character. This will bite you. – Raedwald May 06 '16 at 19:36
  • See http://stackoverflow.com/questions/10611455/what-is-character-encoding-and-why-should-i-bother-with-it – Raedwald May 06 '16 at 19:38
  • @Hector your tip worked – jseashell May 06 '16 at 19:40

1 Answer


You ideally don't touch the bytes. E.g.

public static String fromFile(File file, Charset charset) throws IOException {
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), charset))) {
        StringWriter out = new StringWriter();
        char[] cbuf = new char[8192];
        int read;
        while ((read = reader.read(cbuf)) != -1) {
            out.write(cbuf, 0, read);
        }
        return out.toString();
    }
}

This converts everything straight into a single String. Converting bytes to chars yourself is indeed dangerous, and you should not try it unless you know the data is pure ASCII. Let the built-in charsets do that; picking the right one is tricky enough already.
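To make the byte-versus-char point concrete, here is a minimal sketch that assumes the data is UTF-8. The class name CastVersusDecode and the sample text are made up for illustration. It reads the same bytes twice: once by casting each byte to a char, as the loop in the question does, and once through an InputStreamReader with an explicit charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class CastVersusDecode {

    // Mimics the loop from the question: every byte is cast to a char on its own.
    static String castEachByte(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }

    // Lets an InputStreamReader decode the bytes (UTF-8 is assumed here).
    static String decode(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café" followed by CR LF; the é takes two bytes in UTF-8.
        byte[] data = "caf\u00e9\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(castEachByte(data).length()); // 7: the é became two wrong chars
        System.out.println(decode(data).length());       // 6: the é is one char again
    }
}
```

Both variants keep the trailing "\r\n"; only the non-ASCII character differs, which is why the cast looks harmless on ASCII-only files.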

Files.readAllBytes() does return EOL characters as it works on bytes and does not try to interpret what those bytes mean.

public static String fromPath(Path path, Charset charset) throws IOException {
    byte[] bytes = Files.readAllBytes(path);
    return new String(bytes, 0, bytes.length, charset);
}

is the equivalent using the nio methods. Call with Paths.get("myfile.txt") instead of with new File("myfile.txt").
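If you want to check the EOL behaviour yourself, a small sketch along these lines should do. The class name EolDemo, the temp file, and the second record are invented for the demo, and UTF-8 is assumed. It writes two CRLF-terminated records, then reads them back once with Files.readAllBytes() and once with readLine().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
public class EolDemo {

    // Reads the whole file at once; line terminators stay in the result.
    static String readWhole(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Reads line by line; readLine() drops the terminator of every line.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("eol-demo", ".txt");
        try {
            // First record is the one from the question; the second is made up.
            String content = "A,jacob,1,pass,50,60\r\nB,anna,2,fail,10,20\r\n";
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));

            System.out.println(readWhole(tmp).contains("\r"));        // true
            System.out.println(readLines(tmp).get(0).contains("\r")); // false
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With the whole content in one String, looking for CR as a stop character is an ordinary indexOf('\r') or split("\r") on that String, and files without a CR simply produce no match.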

zapl
  • Doesn't .toString() strip off EOL characters? – jseashell May 06 '16 at 19:22
  • @j.seashell no - all the newlines are still present, only the `readLine()` methods strip them. – zapl May 06 '16 at 19:24
  • You might want to note that this will use the system default for the charset, and so might not work for non-ASCII characters. – jkinkead May 06 '16 at 20:14
  • @jkinkead both versions have the charset explicit, not sure what happens when you pass `null` though, might crash or use the system default. – zapl May 06 '16 at 23:56