We have a legacy Dataflow job in Scala which basically reads from BigQuery and then dumps the data into Postgres. In Scala we read from BigQuery, map each record onto a case class, and then write it to Postgres, and this works perfectly for BigQuery's Bytes type as well. The schema we read from BQ into has an Array[Byte] field in Scala, and we use the .setBytes method to write it into the relevant bytea column of the Postgres table.
Now we are migrating that job to Java. This time we are not using typed case classes, and the read from BigQuery returns a com.google.api.services.bigquery.model.TableRow object. For all the other field types this works as expected, but I am having issues with the Bytes type.
When I do insertQuery.setBytes(3, row.get("bytes_type_column")), I get an error that setBytes expects a byte[], while row.get("bytes_type_column") returns an Object. Now, if I do row.get("bytes_type_column").toString().getBytes() instead, it works fine, but the content of the original Bytes column seems to be changed and I cannot use it after reading it back from Postgres.
It seems to me that .toString() converts the bytes into some Java String representation, and converting that String back to bytes does not reproduce the original content.
The other approach I tried was insertQuery.setBytes(3, (byte[]) row.get("bytes_type_column")), which also seems to have changed the content of the column.
I had the same issue when I tried this answer.
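For reference, here is a stripped-down sketch of what my Java write step roughly looks like. The class, method, table, and column names other than bytes_type_column are placeholders I made up for this question, and the commented-out lines are the other variants I tried:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    import com.google.api.services.bigquery.model.TableRow;

    public class WriteBytesColumn {
        // Writes one BigQuery TableRow into a Postgres table whose third column is bytea.
        static void writeRow(Connection connection, TableRow row) throws SQLException {
            try (PreparedStatement insertQuery = connection.prepareStatement(
                    "INSERT INTO target_table (id, name, bytes_type_column) VALUES (?, ?, ?)")) {
                insertQuery.setLong(1, ((Number) row.get("id")).longValue());
                insertQuery.setString(2, (String) row.get("name"));

                // Attempt 1: does not compile, because row.get(...) returns Object, not byte[]:
                // insertQuery.setBytes(3, row.get("bytes_type_column"));

                // Attempt 2: compiles and runs, but the bytes stored in Postgres no longer
                // match the original BigQuery value:
                insertQuery.setBytes(3, row.get("bytes_type_column").toString().getBytes());

                // Attempt 3: also appeared to change the content of the column:
                // insertQuery.setBytes(3, (byte[]) row.get("bytes_type_column"));

                insertQuery.executeUpdate();
            }
        }
    }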
I have almost no experience with Java. Can someone guide me on how I can dump the BQ Bytes column value into Postgres exactly as I read it, without changing anything in it? Thanks.
If it's helpful for anyone: the BQ Bytes column is actually a pickled Python object, which I want to dump into Postgres and then unpickle after reading it in a Python application. If it cannot be unpickled, it means it wasn't dumped as it is.
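To make "as it is" concrete, this is the kind of round-trip check I would expect to pass on the JDBC side once the write is correct (again, the table and column names are placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.Arrays;

    public class RoundTripCheck {
        // Reads the bytea value back for a given id and compares it with the bytes
        // we intended to write; should return true if nothing was altered.
        static boolean bytesSurvived(Connection connection, long id, byte[] originalBytes) throws SQLException {
            try (PreparedStatement select = connection.prepareStatement(
                    "SELECT bytes_type_column FROM target_table WHERE id = ?")) {
                select.setLong(1, id);
                try (ResultSet rs = select.executeQuery()) {
                    return rs.next() && Arrays.equals(originalBytes, rs.getBytes(1));
                }
            }
        }
    }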