0

My assignment is to write a program that compresses files using the Huffman algorithm. The program must be able to compress any type of file, which is why I'm not using a Reader, since that works with characters. I don't understand how to build a frequency table when encoding a binary file.

EDIT!! Problem solved.

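import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FrequencyTable {
    public static void main(String[] args) {
        // One counter per possible byte value (0..255).
        long[] frequencies = new long[256];

        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = new BufferedInputStream(new FileInputStream("./src/hello.jpg"))) {
            int currentByte;
            // read() returns the next byte as an int in 0..255, or -1 at end of file.
            while ((currentByte = in.read()) != -1) {
                frequencies[currentByte]++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}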
Fanny
  • 1
    Just cast to `char` – Jan Gassen Feb 16 '18 at 20:36
  • 4
    An image file contains binary data; not characters. If you really want to examine characters, use a hex dump utility. – Elliott Frisch Feb 16 '18 at 20:39
  • 1
    This is not possible in all cases, as in UTF-8 a "character" may involve more than one byte in the input stream. I think this is an [XY Problem](http://xyproblem.info) and it would help a lot if you explained your ultimate reason for needing to do this. – Jim Garrison Feb 16 '18 at 20:40
  • @JimGarrison Yes, I suppose it is. My real problem is that I'm supposed to write a Huffman compression program that can compress any type of file, but I don't know how to build a frequency table when dealing with binary data like that. – Fanny Feb 16 '18 at 20:41
  • Since it all boils down to converting an int to a char, [this answer](https://stackoverflow.com/questions/17984975/convert-int-to-char-in-java) will do well enough. That said... why are you looking at characters in an image file? The code snippet makes sense, because you input a text file, but your problem statement does not, because an image is binary. – Brandon McKenzie Feb 16 '18 at 20:44
  • @BrandonMcKenzie I'm just trying to figure out how to do Huffman encoding for the content of a binary file, i.e. how to build a frequency table from the content of a binary file. – Fanny Feb 16 '18 at 20:51
  • 4
    In that case you do not care anything about the "characters", you can treat the binary file as a stream of bytes without ever getting up to the level of characters. Please [edit] your question and put all the new information about your objectives into the original question. As to frequencies, that's trivial, there are only 256 possible values. – Jim Garrison Feb 16 '18 at 20:56

2 Answers

1

I'm not sure what you mean by "reading from an image and look at the characters", but for text files (like the one you read in your code example) casting the byte you read to char works most of the time:

char charVal = (char) currentByte;

This mostly works because most data is ASCII and most charsets are supersets of ASCII. It gets more complicated with non-ASCII characters, because a simple cast is equivalent to decoding with the charset ISO-8859-1. That still produces correct results most of the time, because e.g. Windows' cp1252 (used on German systems) differs from ISO-8859-1 only in the range 0x80-0x9F, which holds the Euro sign among others.

Things really go wrong with charsets like UTF-8, where non-ASCII characters are encoded with multiple bytes, so you will see things like Ã¤ instead of ä. The same goes for files encoded as UTF-16 (often just called "Unicode"), where every second byte is most likely a binary zero.
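To make the multi-byte effect concrete, here is a minimal sketch. The class name `CharsetDemo` and the helper `decodeByCast` are made up for illustration. It starts from a `byte[]` rather than from `in.read()`, so each byte is masked with `0xFF` before the cast; `in.read()` already returns 0..255 and would not need that.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    /** Decodes each byte separately via a plain cast, which matches ISO-8859-1. */
    static String decodeByCast(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Java bytes are signed; mask so 0xC3 becomes 195 instead of -61.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E4 (a with diaeresis) is two bytes in UTF-8: 0xC3 0xA4.
        byte[] utf8 = "\u00E4".getBytes(StandardCharsets.UTF_8);

        String byCast = decodeByCast(utf8);
        String byCharset = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(byCast.length());    // 2 chars: U+00C3, U+00A4
        System.out.println(byCharset.length()); // 1 char:  U+00E4
        // The cast gives the same result as decoding with ISO-8859-1.
        System.out.println(byCast.equals(new String(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

For Huffman coding of arbitrary files none of this decoding is needed; the point is only that bytes and characters are different things.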

Lothar
-2

You could use Files.readAllBytes and then iterate over the returned array.

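import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path path = Paths.get("hello.txt");
try {
    // Reads the whole file into memory; fine for small files.
    byte[] array = Files.readAllBytes(path);
    // iterate over array here
} catch (IOException e) {
    e.printStackTrace();
}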
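Iterating over that array to build the frequency table the question asks about could look roughly like the sketch below. The class name `ByteFrequencies` and the method `count` are hypothetical, and reading the path from `args[0]` is just an assumption for the demo. The one thing to watch for is that a Java `byte` is signed (-128..127), so it has to be masked with `0xFF` before it is used as an index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ByteFrequencies {
    /** Counts how often each byte value (0..255) occurs in data. */
    static long[] count(byte[] data) {
        long[] freq = new long[256];
        for (byte b : data) {
            // Mask to map the signed byte onto an index in 0..255.
            freq[b & 0xFF]++;
        }
        return freq;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the file to analyse as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "hello.txt");
        long[] freq = count(Files.readAllBytes(path));

        // Print only the byte values that actually occur in the file.
        for (int value = 0; value < freq.length; value++) {
            if (freq[value] > 0) {
                System.out.printf("0x%02X: %d%n", value, freq[value]);
            }
        }
    }
}
```

The resulting 256-entry array can then serve as the input for building the Huffman tree. For very large files, a streaming read is probably the better choice, since `readAllBytes` holds the whole file in memory.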
Christian