0

Long story short, I need to convert a bytea from PostgreSQL to an int[][] using Java 6 (or SQL.)

If you want context, feel free to read below, but that's really all there is to it.

Context

Somebody who used to work in the team I am in wrote a bean like this:

public class FourPointBean {
  @Column(name = "points", length = "65535")
  private int[][] xy;
  ... other irrelevant stuff
}

Where the int[][] was really just four points (no idea why he chose an int[][]...)

(xy[0][0], xy[0][1]) := (x0, y0)
(xy[1][0], xy[1][1]) := (x1, y1)
(xy[2][0], xy[2][1]) := (x2, y2)
(xy[3][0], xy[3][1]) := (x3, y3)

And apparently for years Hibernate had been throwing an exception that everybody ignored because it was "only a warning" --

WARNING: duplicate class definition bug occured? Please report this :
com/company/blah/SomeBean$JaxbAccessorM_getXy_setXy_Array_Of_[I
java.lang.ClassFormatError: Illegal class name
"com/company/blah/SomeBean$JaxbAccessorM_getXy_setXy_Array_Of_[I" in class file
com/company/blah/SomeBean$JaxbAccessorM_getXy_setXy_Array_Of_[I
      at java.lang.ClassLoader.defineClass1(Native Method)
...

The result was Hibernate inserting the int array as a byta in the table. Eventually this bean definition actually caused some issues, so I changed it --

public class FourPointBean {
  @Type(type = "com.company.blah.PointArrayType")
  @Column(name = "points", length = 65535, columnDefinition = "int[][]")
  private int[][] xy;
  ... other irrelevant stuff
}

And then I changed the DTO used by the UI to:

public class FourPointDTO {

  private List<Point> points = Lists.newArrayListWithCapacity(4);
  ...
}

Which is all fine and dandy, but now I have to write something to migrate all of the messy existing byte arrays in the table in question to a PostgreSQL int[]... using Java 6.

Everything I saw online was involving 1D-arrays that did not translate cleanly... right now I can't even figure out how to turn int[][] into a byte array for testing purposes.

As always, help is much appreciated...


Edit

Here are a couple translations --


(00525bc5f039-2d70-40f0-922c-0ef7060816be)

int[][]

[0][0] := 538
[0][1] := 760
[1][0] := 676
[1][1] := 760
[2][0] := 676
[2][1] := 890
[3][0] := 538
[3][1] := 890

Bytes

 \xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a50200007870000000020000021a000002f87571007e000200000002000002a4000002f87571007e000200000002000002a40000037a7571007e0002000000020000021a0000037a

(005276c1cb74-2476-43bf-856e-43912e969000)

int[][]

[0][0] := 544
[0][1] := 638
[1][0] := 657
[1][1] := 638
[2][0] := 657
[2][1] := 743
[3][0] := 544
[3][1] := 743

Bytes

 \xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a5020000787000000002000002200000027e7571007e000200000002000002910000027e7571007e00020000000200000291000002e77571007e00020000000200000220000002e7
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
Ryan
  • 729
  • 1
  • 10
  • 25
  • Please show the actual bytes of a couple of the values of interest. It isn't clear from your description whether the `bytea` is a serialized Java class, a 2-stride concatenation of an integer type like `int32`, etc, along with the corresponding integer array your old version, un-patched application converts it to. Also, can't you (please) use the PostgreSQL `point` type and a Hibernate type handler for it? e.g. `ARRAY[ point(1,2), point(3,4) ]`. Multidimensional arrays are quirky to work with. – Craig Ringer Jan 23 '14 at 01:54
  • It's really *not* as simple as your first par makes out. How do you convert `\x726f6262795c736f636b73` to `int[][]`? That's the shift-JIS encoded text "robby\socks". In other words: It's necessary to know what the bytes actually are, what they mean, what the structure is, in order to attempt any meaningful conversion. For that **you have to actually show the data** - an example of the byte series, and the integers you expect to correspond to it. (Please **edit your question** then **comment here when done**). – Craig Ringer Jan 23 '14 at 02:00
  • Hey Craig, I added examples. And yeah I can turn it into point[], but obviously that can only happen after I am able to turn the bytea into something meaningful. – Ryan Jan 23 '14 at 02:06
  • Give me a minute and I can try to give something more meaningful (id -> int[][] values and then the id -> bytes.) – Ryan Jan 23 '14 at 02:08
  • Yeah... your app used to be able to translate them, so it'd be preferable to have a sample of both sides of the translation from back when it used to work. If that's possible. If not, you've got no way to know which of the *many* different translations to `int[][]` is the correct one..., – Craig Ringer Jan 23 '14 at 02:10
  • Added a couple translations. Going to run the python to get the raw bytes. – Ryan Jan 23 '14 at 02:34

1 Answers1

1

Short version: Looks like Hibernate was saving a serialized Java object of type int[][]. Gross. Deserialize it in Java using a ByteArrayInputStream wrapped in an ObjectInputStream.


To get the raw bytes, a Python snippet like:

points_hex = open("/path/to/points.txt").readlines()
points = [ p[2:-1].strip().decode("hex") for p in points_hex ]

works. I suspected there was a common prefix, so I checked. Courtesty of this nice easy longest-common-prefix algo:

from itertools import takewhile,izip

def allsame(x):
    return len(set(x)) == 1

r = [i[0] for i in takewhile(allsame ,izip(*points))]
''.join(r).encode("hex")

it's confirmed to be:

\xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a50200007870000000020000

The presence of a prefix like this strongly suggests that we're dealing with a serialized Java object, not a bytea representation of a series of points in an array. That's easily confiremed with file, after writing a point's raw binary to a file:

open("/tmp/point","w").write(points[0])

then in the shell:

$ file /tmp/point 
/tmp/point: Java serialization data, version 5

You should decode these points by de-serializing them into an int[][] using Java. It'll be possible for a simple object like int[][] in SQL, but there's no point doing it manually when you can just ask Java to handle it for you.

Update:

I was feeling nice, so here's Java code to decode it:

import java.io.*;
import java.util.Arrays;

public class Deserialize {

    // Credit: https://stackoverflow.com/a/140861/398670
    public static byte[] hexStringToByteArray(String s) {
        int len = s.length();
        byte[] data = new byte[len / 2];
        for (int i = 0; i < len; i += 2) {
            data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                                 + Character.digit(s.charAt(i+1), 16));
        }
        return data;
    }


    public static void main( String[] args ) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java Deserialize aced....hexstring...");
            System.exit(1);
        }

        String hex = args[0];
        if (hex.startsWith("\\x")) {
            hex = hex.substring(2);
        }

        ByteArrayInputStream bis = new ByteArrayInputStream(hexStringToByteArray(hex));
        ObjectInput in = new ObjectInputStream(bis);
        Object obj_read = in.readObject();

        if (obj_read instanceof int[][]) {
            int[][] obj_arr = (int[][]) obj_read;
            System.err.println("Array contents are: " + Arrays.deepToString(obj_arr) );
        }
    }

}

Usage:

$ javac Deserialize.java 
$ java Deserialize '\xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a50200007870000000020000021a000002f87571007e000200000002000002a4000002f87571007e000200000002000002a40000037a7571007e0002000000020000021a0000037a'
Array contents are: [[538, 760], [676, 760], [676, 890], [538, 890]]

Of course, in reality you'll be using PgJDBC, which hands you a byte[] directly, so you don't have to do the hex decoding I do above.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • @Ryan answer updated - includes demo showing deserialization. – Craig Ringer Jan 23 '14 at 03:09
  • Haha thanks, you spent more time than you needed, I was going to post my solution but you beat me to it. So damn easy; didn't realize what was being returned was serializable. Thanks a ton man. – Ryan Jan 23 '14 at 03:19
  • ^ Not that you "needed" to spend any time :) Thx again. – Ryan Jan 23 '14 at 03:19