
We're using the Google Cloud Storage Client Library for App Engine, with simply GcsFileOptions.Builder.contentEncoding("gzip") at file creation time. We get the following problem when reading the file:

com.google.appengine.tools.cloudstorage.NonRetriableException: java.lang.RuntimeException: com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1@1c07d21: Unexpected cause of ExecutionException
    at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:87)
    at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:129)
    at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:123)
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl.read(SimpleGcsInputChannelImpl.java:81)
...


Caused by: java.lang.RuntimeException: com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1@1c07d21: Unexpected cause of ExecutionException
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:101)
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:81)
    at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:75)
    ... 56 more
Caused by: java.lang.IllegalStateException: com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2@1d8c25d: got 46483 > wanted 19823
    at com.google.common.base.Preconditions.checkState(Preconditions.java:177)
    at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:418)
    at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:398)
    at com.google.appengine.api.utils.FutureWrapper.wrapAndCache(FutureWrapper.java:53)
    at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:90)
    at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:86)
    ... 58 more

What else needs to be done to read files with "gzip" content encoding in App Engine? (Fetching the Cloud Storage URL with curl from the client side works fine for both compressed and uncompressed files.)

This is the code that works for an uncompressed object:

byte[] blobContent = new byte[0];

try
{
    GcsFileMetadata metaData = gcsService.getMetadata(fileName);
    int fileSize = (int) metaData.getLength();
    final int chunkSize = BlobstoreService.MAX_BLOB_FETCH_SIZE;

    LOG.info("content encoding: " + metaData.getOptions().getContentEncoding()); // "gzip" here
    LOG.info("input size " + fileSize); // the size is obviously the compressed size!

    for (long offset = 0; offset < fileSize;)
    {
        if (offset != 0)
        {
            LOG.info("Handling extra size for " + filePath + " at " + offset);
        }

        final int size = (int) Math.min(chunkSize, fileSize - offset); // read at most the remaining bytes

        ByteBuffer result = ByteBuffer.allocate(size);
        GcsInputChannel readChannel = gcsService.openReadChannel(fileName, offset);
        try
        {
            readChannel.read(result); // <<<< here the exception was thrown
        }
        finally
        {
            ......

The file is now compressed at write time by:

GcsFilename filename = new GcsFilename(bucketName, filePath);
GcsFileOptions.Builder builder = new GcsFileOptions.Builder().mimeType(image_type);

builder = builder.contentEncoding("gzip");

GcsOutputChannel writeChannel = gcsService.createOrReplace(filename, builder.build());

ByteArrayOutputStream byteStream = new ByteArrayOutputStream(blob_content.length);
try
{
    GZIPOutputStream zipStream = new GZIPOutputStream(byteStream);
    try
    {
        zipStream.write(blob_content);
    }
    finally
    {
        zipStream.close();
    }
}
finally
{
    byteStream.close();
}

byte[] compressedData = byteStream.toByteArray();
writeChannel.write(ByteBuffer.wrap(compressedData));

blob_content is compressed from 46483 bytes to 19823 bytes, which matches the two numbers in the exception message.


I think this is a bug in Google's code:

https://code.google.com/p/appengine-gcs-client/source/browse/trunk/java/src/main/java/com/google/appengine/tools/cloudstorage/oauth/OauthRawGcsService.java, L418:

 Preconditions.checkState(content.length <= want, "%s: got %s > wanted %s", this, content.length, want);

The HTTPResponse has already decoded (un-gzipped) the blob, so the precondition is wrong here.
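
To illustrate with the sizes from this question (a hypothetical reproduction of the check, not the library's actual code path): want comes from the compressed object size, while the response body arrives already decompressed:

    import com.google.common.base.Preconditions;

    public class SizeCheckDemo {
        public static void main(String[] args) {
            int want = 19823;                  // compressed size stored in GCS
            byte[] content = new byte[46483];  // body length after transparent gunzip
            // Fails exactly like the stack trace above:
            Preconditions.checkState(content.length <= want,
                    "%s: got %s > wanted %s", "demo", content.length, want);
        }
    }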

Tom Fishman
  • Could you post some more information what you're actually doing and maybe some code that causes the problem? – markovuksanovic Jan 02 '14 at 04:42
  • In the compression part, which one do you think does the actual compression? Can you provide a bit more info/code how the file is being compressed. I think it might not be compressed. The setContentEncoding specifies compression method if object is compressed. – markovuksanovic Jan 02 '14 at 06:03
  • also can you specify file size of compressed and uncompressed file? Do the numbers in the exception have to do with those sizes? – markovuksanovic Jan 02 '14 at 06:18
  • What does LOG.info("input size " + fileSize); this line actually log? What is the actual value? – markovuksanovic Jan 02 '14 at 08:34
  • Can you please post a more complete snippet of the part where compression is done? Also, can you confirm that there is no compression happening manually? I can't explain how the compressed file size got calculated. – markovuksanovic Jan 02 '14 at 10:12
  • You might want to take a look at [this answer](http://stackoverflow.com/a/13772805/624900) I wrote for another question – jterrace Jan 02 '14 at 16:51
  • @tom-fishman I've updated the answer. It looks like the content encoding might be incorrectly set. – markovuksanovic Jan 03 '14 at 00:45
  • I am having the very same issue. – EliuX May 05 '16 at 03:52

4 Answers


If I understand correctly, you have to set the mimeType:

GcsFileOptions options = new GcsFileOptions.Builder().mimeType("text/html").build();

Google Cloud Storage does not compress or decompress objects: https://developers.google.com/storage/docs/reference-headers?csw=1#contentencoding

I hope that's what you want to do.

jacek2v

Looking at your code it seems like there is a mismatch between what is stored and what is read. The documentation specifies that compression is not done for you (https://developers.google.com/storage/docs/reference-headers?csw=1#contentencoding). You will need to do the actual compression manually.
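
For reference, a compact sketch of that manual compression, equivalent to what the question already does (blob_content taken from the question's code):

    // Gzip the payload yourself; GCS won't do it for you.
    ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
    try (GZIPOutputStream zipStream = new GZIPOutputStream(byteStream)) {
        zipStream.write(blob_content);
    }
    byte[] compressedData = byteStream.toByteArray();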

Also, if you look at the implementation of the class that throws the exception (https://code.google.com/p/appengine-gcs-client/source/browse/trunk/java/src/main/java/com/google/appengine/tools/cloudstorage/oauth/OauthRawGcsService.java?r=81&spec=svn134) you will notice that you get the original contents back, while you're actually expecting compressed content. Check the method readObjectAsync in the above-mentioned class.

It looks like the content persisted might not be gzipped, or the content length is not set properly. You should verify the length of the compressed stream just before writing it into the channel, and verify that the content length is set correctly when making the HTTP request. It would be useful to see the actual HTTP request headers and make sure that the Content-Length header matches the actual content length in the HTTP response.

Also, it looks like contentEncoding could be set incorrectly. Try using .contentEncoding("Content-Encoding: gzip") as used in this TCK test. Still, the best thing to do is inspect the HTTP request and response; you can use Wireshark to do that easily.

Also, you need to make sure that the GcsOutputChannel is closed, as that's when the file is finalized.
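
A minimal sketch of those checks, reusing the names from the question's code:

    byte[] compressedData = byteStream.toByteArray();
    LOG.info("writing " + compressedData.length + " gzipped bytes"); // should be 19823 here
    writeChannel.write(ByteBuffer.wrap(compressedData));
    writeChannel.close(); // the object is only finalized when the channel is closed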

Hope this puts you on the right track. To gzip your contents you can use Java's GZIPOutputStream (and GZIPInputStream to decompress when reading).
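
For the read side, a sketch of gunzipping bytes pulled back from GCS, assuming compressedData holds the raw gzip bytes:

    byte[] uncompressed;
    try (GZIPInputStream zipIn = new GZIPInputStream(new ByteArrayInputStream(compressedData));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        byte[] buf = new byte[8192];
        for (int n; (n = zipIn.read(buf)) != -1; ) {
            out.write(buf, 0, n);  // copy until the end of the gzip stream
        }
        uncompressed = out.toByteArray();
    }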

markovuksanovic
  • Google's lib gave up before it reached a point where we could use GZIPInputStream – Tom Fishman Jan 02 '14 at 14:05
  • "Content-Encoding: gzip" corrupted the direct download "http://storage.googleapis.com/*" – Tom Fishman Jan 03 '14 at 06:44
  • @tom-fishman Can you explain? If you revert to "gzip", can you download the file? Also, do you get a compressed or uncompressed file? – markovuksanovic Jan 03 '14 at 06:47
  • @tom-fishman It would be great if you could trace the HTTP requests when storing and reading the content. You can use Wireshark for that. Can you check the Content-Length header as well as the actual length of the content in the HTTP body (for both read and save)? – markovuksanovic Jan 03 '14 at 06:51

I'm seeing the same issue, easily reproducible by uploading a file with "gsutil cp -Z" and then trying to open it with the following:

ByteArrayOutputStream output = new ByteArrayOutputStream();
try (GcsInputChannel readChannel = svc.openReadChannel(filename, 0)) {
  try (InputStream input = Channels.newInputStream(readChannel))
  {
    IOUtils.copy(input, output);
  }
}

This causes an exception like this:

java.lang.IllegalStateException:
....oauth.OauthRawGcsService$2@1883798: got 64303 > wanted 4096
at ....Preconditions.checkState(Preconditions.java:199)
at ....oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:519)
at ....oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:499)

The only workaround I've found is to read the entire file into memory using readChannel.read:

int fileSize = 64303;
ByteBuffer result = ByteBuffer.allocate(fileSize);
try (GcsInputChannel readChannel = gcs.openReadChannel(new GcsFilename("mybucket", "mygzippedfile.xml"), 0)) {
  readChannel.read(result);
}

Unfortunately, this only works if the size of the ByteBuffer is greater than or equal to the uncompressed size of the file, which is not possible to get via the API.
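
For what it's worth, the metadata only exposes the stored (compressed) length, so there is nothing to size the buffer from (a sketch using the names from the snippet above):

    GcsFileMetadata meta = gcs.getMetadata(new GcsFilename("mybucket", "mygzippedfile.xml"));
    long compressedSize = meta.getLength(); // compressed size in GCS, not the size you need here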

I've also posted my comment to an issue registered with Google: https://code.google.com/p/googleappengine/issues/detail?id=10445


This is my function for reading compressed gzip files:

public byte[] getUpdate(String fileName) throws IOException
{
    GcsFilename fileNameObj = new GcsFilename(defaultBucketName, fileName);
    try (GcsInputChannel readChannel = gcsService.openReadChannel(fileNameObj, 0))
    {
        maxSizeBuffer.clear();            // reuse the shared buffer
        readChannel.read(maxSizeBuffer);
    }
    // Note: array() returns the whole backing array (MAX_BLOB_FETCH_SIZE bytes),
    // not just the bytes that were actually read.
    byte[] result = maxSizeBuffer.array();
    return result;
}

The key point is that you cannot use the size of the saved file, because Google Storage serves the content back at its original (uncompressed) size; the library then compares the size you expected against the real size, and they differ:

Preconditions.checkState(content.length <= want, "%s: got %s > wanted %s", this, content.length, want);

So I solved it by allocating the largest possible amount for these files, using BlobstoreService.MAX_BLOB_FETCH_SIZE. maxSizeBuffer is allocated only once, outside the function:

ByteBuffer maxSizeBuffer = ByteBuffer.allocate(BlobstoreService.MAX_BLOB_FETCH_SIZE);

And maxSizeBuffer.clear() resets the buffer so it can be reused for the next read.
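
As a side note, maxSizeBuffer.array() returns the entire MAX_BLOB_FETCH_SIZE backing array, including the unused tail beyond the bytes actually read. A variant of the function above (my sketch, not part of the original answer) that trims the result to the real length:

    public byte[] getUpdateTrimmed(String fileName) throws IOException
    {
        GcsFilename fileNameObj = new GcsFilename(defaultBucketName, fileName);
        maxSizeBuffer.clear();
        try (GcsInputChannel readChannel = gcsService.openReadChannel(fileNameObj, 0))
        {
            readChannel.read(maxSizeBuffer);
        }
        maxSizeBuffer.flip();                               // limit = bytes read, position = 0
        byte[] result = new byte[maxSizeBuffer.remaining()];
        maxSizeBuffer.get(result);                          // copy only what was read
        return result;
    }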

EliuX