I am using the following code to serialize and deserialize my object, which contains a BufferedImage field.

private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    if (bufferedImageField != null) {
        out.writeBoolean(true);
        ImageIO.write(bufferedImageField, "png", out);
    }
    else
        out.writeBoolean(false);
}

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    if (in.readBoolean())
        bufferedImageField = ImageIO.read(in);
    else
        bufferedImageField = null;
}

The problem is that the image is compressed and decompressed, so the object's hashCode() changes after deserialization. Compression and decompression also take a lot of time.

How can I serialize the image without compression and deserialize it precisely?
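
For illustration, one way to skip the PNG codec entirely is to write the raw pixel values. The following is a minimal sketch, not a definitive implementation. The class `RawImageCodec` and its methods `writeRaw` and `readRaw` are hypothetical names, and it assumes that restoring the image as `TYPE_INT_ARGB` is acceptable (the original image type is not preserved, only the ARGB pixel values as returned by `getRGB`). Because `ObjectOutputStream` implements `DataOutput` and `ObjectInputStream` implements `DataInput`, the two helpers could be called from `writeObject` and `readObject` in place of the `ImageIO` calls.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: stores an image as width, height and uncompressed ARGB pixels.
final class RawImageCodec {

    // Writes width, height, then 4 bytes per pixel (big-endian ARGB).
    static void writeRaw(BufferedImage img, DataOutput out) throws IOException {
        int w = img.getWidth();
        int h = img.getHeight();
        int[] argb = img.getRGB(0, 0, w, h, null, 0, w);
        ByteBuffer buf = ByteBuffer.allocate(argb.length * 4);
        buf.asIntBuffer().put(argb);
        out.writeInt(w);
        out.writeInt(h);
        out.write(buf.array());
    }

    // Reads the layout produced by writeRaw. Assumption: the result is always TYPE_INT_ARGB.
    static BufferedImage readRaw(DataInput in) throws IOException {
        int w = in.readInt();
        int h = in.readInt();
        byte[] bytes = new byte[w * h * 4];
        in.readFully(bytes);
        int[] argb = new int[w * h];
        ByteBuffer.wrap(bytes).asIntBuffer().get(argb);
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(0, 0, w, h, argb, 0, w);
        return img;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage src = new BufferedImage(3, 2, BufferedImage.TYPE_INT_ARGB);
        src.setRGB(1, 1, 0x80FF2010);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeRaw(src, new DataOutputStream(bos));
        BufferedImage copy = readRaw(
                new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));

        boolean samePixels = Arrays.equals(
                src.getRGB(0, 0, 3, 2, null, 0, 3),
                copy.getRGB(0, 0, 3, 2, null, 0, 3));
        System.out.println("pixels identical: " + samePixels);
    }
}
```

The trade-off is size: the payload is 8 bytes plus 4 bytes per pixel, so this saves CPU time at the cost of bandwidth. It does not make `hashCode()` stable, since the deserialized image is still a different object.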

AndrewR
  • With "precisely" you mean lossless? - Use a lossless format. For example TIF – Fildor Feb 03 '17 at 13:59
  • I mean `BufferedImage.hashCode()` should be the same. – AndrewR Feb 03 '17 at 14:01
  • `BufferedImage` inherits `hashCode` from `Object`. Therefore the value of the `hashCode` is in no way related to the content of the image. You're barking up the wrong tree here... – Boris the Spider Feb 03 '17 at 14:02
  • @BoristheSpider I know. What do you mean? – AndrewR Feb 03 '17 at 14:03
  • In any case, unless you're using `BufferedImage` as a key in a `HashMap` (and why would you be doing that?!) why on earth do you care? – Boris the Spider Feb 03 '17 at 14:04
  • @Fildor That's still a compression. I'd like to avoid compression to maximize speed too. – AndrewR Feb 03 '17 at 14:04
  • @AndrewR that means the `hashCode` won't ever be the same before and after serialization, because it is a different object. – Fildor Feb 03 '17 at 14:05
  • "That's still a compression" ... I'm sure you are able to find a decent format without compression. – Fildor Feb 03 '17 at 14:07
  • @BoristheSpider I have a client/server application in which the client and server transfer images to each other. I want to avoid transferring objects I have already transferred. – AndrewR Feb 03 '17 at 14:10
  • Then save a UID along with the content. – Fildor Feb 03 '17 at 14:11
  • @AndrewR then `hashCode` is **absolutely the worst** idea. `hashCode` **must** return the same value for the same object but **may** return the same value for different objects. `public int hashCode() { return 1; }` is a valid `hashCode` implementation. Use an MD5 hash of the image data. – Boris the Spider Feb 03 '17 at 14:12
  • @BoristheSpider every hash function may return the same value with 1/2^(hash length) probability. – AndrewR Feb 03 '17 at 14:17
  • @AndrewR true. But a `32` bit `int` is a very small space. A `128` bit MD5 is a tad bigger. There are `340,282,366,920,938,463,463,374,607,431,768,211,456` (2^128) possible MD5 hashes, and it's a **cryptographic** hash function specifically designed for this purpose. `Object.hashCode` is designed as an optimisation for hash based datastructures. So while what you say is arithmetically true, it's also [utter nonsense](http://stackoverflow.com/a/288519/2071828). – Boris the Spider Feb 03 '17 at 14:20
  • So? Point is: Collisions *can* occur. (@Andrew) And because you care about performance, it wouldn't be a great idea to hash the contents of the image. So just use a GUID instead. Not much of a deal to save one along. – Fildor Feb 03 '17 at 14:20
  • Java gives you https://docs.oracle.com/javase/8/docs/api/java/util/UUID.html ... – Fildor Feb 03 '17 at 14:27
  • @BoristheSpider A 1/4,300,000,000 probability is small enough for my purposes. It won't be the end of the world if a collision occurs, and it most probably won't ever happen anyway. The question wasn't about the choice of hash function at all. It was about how to transfer the object. – AndrewR Feb 03 '17 at 14:27
  • "1/4,300,000,000 probability is small enough for my purposes." - OK. But the fact you cannot actually use it with serialization makes the whole discussion senseless. – Fildor Feb 03 '17 at 14:28
  • @Fildor well said. – Boris the Spider Feb 03 '17 at 14:29
  • @BoristheSpider Thanks. Andrew: We are not trying to be condescending. The fact is: on the client you'll have hashCode A, and after transfer to the server it will have hashCode B. Image format, compression etc. won't change a thing about that. This makes it useless for identifying a duplicate. You need to see this. Otherwise your question is unanswerable except with "you can't". – Fildor Feb 03 '17 at 14:33
  • @Fildor so no way I can make objects byte-by-byte identical? – AndrewR Feb 03 '17 at 14:36
  • They are. Still they'll have a different hashCode because of the implementation of the default [Object.hashCode](https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--) function. Which by the way could be different on different VMs. Key sentence in the doc: "This is typically **implemented by converting the internal address of the object into an integer**, but this implementation technique is not required by the Java™ programming language." – Fildor Feb 03 '17 at 14:38
  • I thought it's only related to object fields. Thank you for clarifying this. But the question about serialization remains relevant. – AndrewR Feb 03 '17 at 14:44
  • @Fildor funnily enough this JavaDoc is completely wrong for HotSpot - see [this excellent answer](http://stackoverflow.com/a/32454673/2071828). But obviously the problem is the same. There is no way to make the `hashCode` consistent with the default JVM implementation. – Boris the Spider Feb 03 '17 at 14:44
  • @AndrewR as [Fildor said](https://stackoverflow.com/questions/42025931/precise-bufferedimage-serialization-deserialization#comment71226308_42025931) create a UUID for each `Image` and store them alongside. – Boris the Spider Feb 03 '17 at 14:45
  • @BoristheSpider Every day something new on SO. Thanks for the Info. Wasn't aware of that. – Fildor Feb 03 '17 at 14:46
  • @BoristheSpider For example, a user can upload the same image twice. Your solution can't help with that problem. – AndrewR Feb 03 '17 at 14:48
  • @AndrewR then use the MD5. That's what almost everyone does. MD5 is _ridiculously fast_ on modern hardware. – Boris the Spider Feb 03 '17 at 14:49
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/134778/discussion-on-question-by-andrewr-precise-bufferedimage-serialization-deserializ). – Bhargav Rao Feb 03 '17 at 14:49
  • Maybe this is of use for you: http://stackoverflow.com/a/304350/982149 – Fildor Feb 03 '17 at 15:07
  • Thanks, that will boost up the performance greatly. – AndrewR Feb 03 '17 at 15:10
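
The suggestion in the comments to identify images by an MD5 hash of the image data, rather than by `hashCode()`, could be sketched roughly as follows. This is an illustrative sketch under stated assumptions: the class `ImageFingerprint` and the method `md5Hex` are hypothetical names, and "image data" is taken here to mean the width, the height and the ARGB pixel values returned by `getRGB`. Two distinct `BufferedImage` objects with the same pixels then yield the same fingerprint on the client and on the server.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a content-based identifier from the pixel data.
final class ImageFingerprint {

    // Returns the MD5 digest of (width, height, ARGB pixels) as 32 lowercase hex characters.
    static String md5Hex(BufferedImage img) throws NoSuchAlgorithmException {
        int w = img.getWidth();
        int h = img.getHeight();
        int[] argb = img.getRGB(0, 0, w, h, null, 0, w);

        // Include the dimensions so that, for example, a 2x3 and a 3x2 image
        // with the same pixel sequence do not produce the same input bytes.
        ByteBuffer buf = ByteBuffer.allocate(8 + argb.length * 4);
        buf.putInt(w).putInt(h);
        for (int pixel : argb) {
            buf.putInt(pixel);
        }

        byte[] digest = MessageDigest.getInstance("MD5").digest(buf.array());
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        BufferedImage a = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        BufferedImage b = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        a.setRGB(2, 3, 0xFF336699);
        b.setRGB(2, 3, 0xFF336699);

        // Same content, different objects: the fingerprints match.
        System.out.println(md5Hex(a).equals(md5Hex(b)));
    }
}
```

A sender could transmit the fingerprint first and send the full image only if the receiver does not already hold one with that fingerprint. This also covers the case raised above where a user uploads the same image twice, which a randomly generated UUID per object would not detect.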
