1

I've got base64 encoded, gzipped json in a Snowflake binary column that I'm trying to unpack.

With the query:

select
    base64_encode(my_binary_data) as my_base64_string
from my_table

I get a base64 encoded string, which I can cut n' paste into a bash script and pipe through:

echo $my_base64_string | base64 -d | gunzip

and get valid results. My question is, can I accomplish this through a query? I've tried some variations with decompress_string(), but I can't figure it out. Any suggestions?

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
NoobsterNoob
  • 125
  • 1
  • 2
  • 7
  • Doesn't `BASE64_DECODE_STRING` do what you want? – endo64 Oct 11 '21 at 20:52
  • Thanks @endo64. That produces an ```Invalid UTF8 detected while decoding``` error on strings that work in bash. Obviously I'm using bash as a sanity check here, but there could be a few things I'm overlooking. – NoobsterNoob Oct 11 '21 at 21:02
  • For what it's worth, I've also been able to unpack the json with a similar process in Python. So as far as `snowflake-sql` I can't tell if I don't know what I'm doing or it's not possible. – NoobsterNoob Oct 11 '21 at 21:44
  • The Snowflake uncompress function has the same available compression/decompression methods as the compress function. In the documentation for compress, it does not list gunzip: https://docs.snowflake.com/en/developer-guide/udf/javascript/udf-javascript-tabular-functions.html. You'd have to use something external or a Java UDF with a gunzip library in the JAR file. – Greg Pavlik Oct 11 '21 at 22:41
  • Thanks Greg, I'll look into your suggestion. – NoobsterNoob Oct 11 '21 at 22:44

2 Answers2

1

This works:

select udf_unzip('H4sIAH4TZWEAA8tIzcnJ5wIAIDA6NgYAAAA=');

-- hello

if you define a Java UDF like this:

create or replace function udf_unzip(b64 string)
returns string
language java
handler='MyClass.unzip'
as
$$
import java.util.zip.GZIPInputStream;
import java.util.Base64;
import java.io.*;

    class MyClass {
        public static String unzip(String b64) throws Exception {
            byte[] bytes = Base64.getDecoder().decode(b64);
            GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes));
            InputStreamReader reader = new InputStreamReader(gis);
            BufferedReader in = new BufferedReader(reader);
            return in.readLine();
        }
    }
$$;

Note that this sample UDF only returns the first line from the encoded string. You'll need to iterate over the BufferedReader to get the whole string:

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • Note that the maximum size for a Java UDF output row is 16 MB according to [this](https://docs.snowflake.com/en/developer-guide/udf/java/udf-java-introduction.html) so if your entire uncompressed string size exceeds 16MB, you can't iterate over the BufferedReader to get the whole string. – Rafay Aug 11 '22 at 17:06
0

You can use a Python UDF as follows:

create or replace function python_udf_unzip(s string)
returns string
language python
runtime_version = '3.8'
handler = 'udf_unzip'
as
$$
def udf_unzip(x):
    import gzip
    import base64
     
    x = gzip.decompress(base64.b64decode(x)).decode('utf-8')
     
    return x
$$;

And you can call it as:

select python_udf_unzip('H4sIAH4TZWEAA8tIzcnJ5wIAIDA6NgYAAAA=')

And you get:

hello
George Pipis
  • 1,452
  • 16
  • 12