
I'm working on a simple implementation of Huffman coding. It works fine for any file that uses some form of text encoding, but when I try to read any other format (e.g. .mp4, .png, .exe) it still works but becomes extremely slow (minutes instead of less than a second for a file of the same size).

My question: is there another method I should be using to read these files, so that the read speed depends on the size of the file and not on its format? If so, what is it? Thanks.

This is my IO class. It uses a FileReader wrapped in a BufferedReader to read files based on a path entered in the console.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class IO {
    public String readFile(String path, boolean includeNewLine) {
        String returnString = "";
        try {
            FileReader fileReader = new FileReader(path);

            BufferedReader bufferedReader = new BufferedReader(fileReader);

            String line;
            int nLines = 0;
            while((line = bufferedReader.readLine()) != null) {
                if(nLines > 0 && includeNewLine) {
                    returnString += "\n";
                }
                returnString += line;
                nLines++;
            }   

            bufferedReader.close();         
        } catch(FileNotFoundException e) {
            System.out.println("Unable to open file '" + path + "'");                
        } catch(IOException e) {
            System.out.println("Error reading file '" + path + "'");                  
        }

        return returnString;
    }
}

3 Answers


With returnString you create a new String instance every time you append the next line to the previous content. Instead, I would suggest using a StringBuilder, as follows:

StringBuilder fileContent = new StringBuilder();
//do your stuff
fileContent.append(line);

This way, you keep reusing the same builder object. Also, if you are reading binary content, it is better to use a class from the InputStream hierarchy.

There is also the Files class from the java.nio.file package, which you could use to read the lines instead:

try (Stream<String> stream = Files.lines( Paths.get(filePath), StandardCharsets.UTF_8)) {
    stream.forEach(s -> fileContent.append(s).append("\n"));
}

Another way would be to use the already-tested FileUtils.readFileToString method provided by the Apache Commons IO API.
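To make the StringBuilder suggestion concrete, here is a minimal sketch of the question's readFile method rewritten around a single builder. The class name TextFileReader and the choice of UTF-8 are assumptions for illustration, not something the question specifies, and this version only makes sense for genuine text files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name; UTF-8 is an assumed charset.
class TextFileReader {
    // Reads a text file line by line into one StringBuilder instead of
    // creating a new String on every concatenation.
    static String readFile(String path, boolean includeNewLine) throws IOException {
        StringBuilder fileContent = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // Re-insert a separator between lines, but not before the first one.
                if (!first && includeNewLine) {
                    fileContent.append('\n');
                }
                fileContent.append(line);
                first = false;
            }
        }
        return fileContent.toString();
    }
}
```

Note that Files.newBufferedReader reports an error on bytes that are not valid in the given charset, which is another hint that binary files should not go through a Reader at all.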

SMA
  • Or, just `new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8)`. No need to resort to a third party library. – VGR Apr 15 '18 at 01:37
  • Remember that the OP also needs to append a few strings in between, such as newlines, so the above would not work for him. – SMA Apr 15 '18 at 06:31

Maybe this will help: FileInputStream vs FileReader

And, of course, change your method to use StringBuilder (but that's another issue).
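As a rough illustration of the FileInputStream route, the sketch below reads a whole file as raw bytes, with no charset decoding and no line splitting. The class name BinaryFileReader and the 8192-byte buffer size are arbitrary choices for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class name, for illustration only.
class BinaryFileReader {
    // Reads the entire file as raw bytes. Unlike FileReader, nothing is
    // decoded into characters, so every byte value 0..255 survives unchanged.
    static byte[] readBytes(String path) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192]; // arbitrary chunk size
        try (InputStream in = new FileInputStream(path)) {
            int read;
            // read() returns the number of bytes placed in the buffer, or -1 at end of file.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return out.toByteArray();
    }
}
```

For files that fit comfortably in memory, Files.readAllBytes(Paths.get(path)) gives the same result in a single call.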

Roni Koren Kurtberg

As long as you are trying to interpret the file as a String, you'll run into efficiency problems. A binary format may contain very few bytes that are interpreted as an end-of-line character ('\n'), or none at all, so readLine() can return a single huge line. Decoding arbitrary bytes as characters can also alter the data, because byte sequences that are not valid in the charset cannot be represented faithfully.

You should interpret your file as a sequence of bytes. Use a memory-mapped ByteBuffer for maximum efficiency.
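A minimal sketch of the memory-mapped approach, assuming the goal is the byte-frequency table that Huffman coding starts from. The class name MappedByteCounter is hypothetical, and the file is mapped in chunks because a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical class name, for illustration only.
class MappedByteCounter {
    // Counts how often each byte value (0..255) occurs in the file.
    static long[] count(String path) throws IOException {
        long[] frequencies = new long[256];
        try (FileChannel channel =
                 FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            long size = channel.size();
            long position = 0;
            // One mapping is limited to Integer.MAX_VALUE bytes, so map in chunks.
            while (position < size) {
                long chunk = Math.min(size - position, Integer.MAX_VALUE);
                MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, position, chunk);
                while (buffer.hasRemaining()) {
                    // Mask to 0..255 because Java bytes are signed.
                    frequencies[buffer.get() & 0xFF]++;
                }
                position += chunk;
            }
        }
        return frequencies;
    }
}
```

With this shape, the work done depends only on the number of bytes in the file, not on whether those bytes happen to look like text.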

M. le Rutte