
I am pretty new to Event Hubs and am stuck on this problem.

We are streaming XML data to Event Hubs. If an XML document is larger than 1 MB, we compress it, because Event Hubs cannot accept events larger than 1 MB.

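import gzip
from azure.eventhub import EventData

MAX_BYTES = 1000000
## events is a list of XML strings; eh_client is an EventHubProducerClient
for event in events:
    # gzip.compress() needs bytes, and len() of the encoded bytes is the real payload size
    # (sys.getsizeof() reports the in-memory size of the Python str object instead)
    payload = event.encode('utf-8')
    if len(payload) >= MAX_BYTES:
        payload = gzip.compress(payload)
    event_data_batch = eh_client.create_batch(max_size_in_bytes=MAX_BYTES)
    event_data_batch.add(EventData(payload))
    eh_client.send_batch(event_data_batch)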
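To check the size test and the compression step locally, without an Event Hubs connection, the payload preparation from the sender loop can be pulled into a plain function. This is only a sketch: `prepare_payload` is a hypothetical helper name, and `MAX_BYTES` mirrors the 1,000,000 threshold used in the loop. The XML is encoded to UTF-8 before anything else, since `gzip.compress()` only accepts bytes.

```python
import gzip

MAX_BYTES = 1000000  # same threshold as in the sender loop


def prepare_payload(xml: str, max_bytes: int = MAX_BYTES) -> bytes:
    """Return the bytes to wrap in EventData (hypothetical helper).

    len() of the UTF-8 bytes is the size of the event body; sys.getsizeof()
    would report the in-memory size of the str object, which is not the same.
    """
    payload = xml.encode("utf-8")
    if len(payload) >= max_bytes:
        payload = gzip.compress(payload)
    return payload


small = prepare_payload("<root>hello</root>")
large = prepare_payload("<root>" + "x" * 2_000_000 + "</root>")
print(small)                    # b'<root>hello</root>'
print(large[:2])                # b'\x1f\x8b' (gzip magic number)
print(len(large) < MAX_BYTES)   # True for this highly repetitive input
```

Compression does not guarantee that the result fits: less repetitive XML may still be over the limit after gzip, in which case adding it to a batch created with `max_size_in_bytes=MAX_BYTES` should fail rather than send a truncated event.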

Up to this point everything works. The problem is on the receiver's end: I want to consume the data from Event Hubs, i.e. decompress it and get back the XML string I sent earlier.

I am using the snippet below, but it fails because `message.body` is a generator, which cannot be decompressed. I also tried the following:

  1. The `message` variable in the snippet below is an `EventData` object, so it cannot be passed to `gzip.decompress()` directly.
  2. `message.body_as_str(encoding='UTF-8')` also throws an error, because the data inside the event is compressed bytes, not a string.

    import gzip
    from azure.eventhub import EventData

    message: EventData
    for message in messages:
        # message is of type EventData
        try:
            message_body = message.body_as_str().encode('utf-8')
            # If we receive a compressed message, this fails and we go to the except block
        except:
            self.log.info('Failed to extract message body, checking if message is compressed')
            # fails here: message.body is a generator, not bytes
            message_body = gzip.decompress(message.body)

What can I do? How can I consume the compressed data from Event Hubs for further processing?
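One way to handle the receiving side, sketched under the assumption that `message.body` yields the payload as one or more `bytes` sections (which matches the generator described above): join the sections into a single `bytes` object, check for the gzip magic number `1f 8b` instead of relying on a failed string decode, and decompress only when it is present. `decode_event_body` is a hypothetical helper name; it takes whatever `message.body` returns, so the sketch runs without a live Event Hub.

```python
import gzip
from typing import Iterable, Union

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_event_body(body: Union[bytes, Iterable[bytes]]) -> str:
    """Turn an event body (bytes, or an iterable of bytes sections) into the XML string.

    Assumed call site on the receiver: decode_event_body(message.body).
    """
    # A generator of sections is joined into one bytes object; plain bytes are used as-is
    # (iterating over bytes would yield ints, hence the isinstance check).
    raw = body if isinstance(body, bytes) else b"".join(body)
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)  # the sender compressed this event
    return raw.decode("utf-8")


# Simulate both kinds of body the receiver can see, without Event Hubs
plain_body = (section for section in [b"<root>", b"hello</root>"])
compressed_body = iter([gzip.compress("<root>big</root>".encode("utf-8"))])
print(decode_event_body(plain_body))       # <root>hello</root>
print(decode_event_body(compressed_body))  # <root>big</root>
```

In the receive loop this would replace the `try`/bare `except` with `message_body = decode_event_body(message.body)`. The magic-number check cannot misfire on uncompressed payloads, because the byte `0x1f` is not a legal character in XML, and it avoids a bare `except` silently swallowing unrelated errors.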

  • You can refer to similar issues: [Azure Event Hubs data connection - Data format](https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-event-hub-overview#data-format), [Gzip automatic decompression fails on download with blobs larger than 32 MB](https://github.com/Azure/azure-sdk-for-python/issues/23714) and [EventHub GZip Compression](https://github.com/Azure/azure-sdk-for-java/issues/23997) – Ecstasy May 05 '22 at 09:47
  • Does this answer your question? [sending EventData to Azure Eventhub larger than 256k](https://stackoverflow.com/questions/38225516/sending-eventdata-to-azure-eventhub-larger-than-256k) – Ecstasy May 05 '22 at 09:50
  • If you're accessing the message body as a string, then the bytes that you wrote are already being decoded; encoding that string again isn't likely to get you what you want. We'd recommend reading the body as an array of bytes or encode it as a Base64 string when publishing the events. – Jesse Squire May 05 '22 at 13:18
  • This is exactly my question. How do I read it as an array of bytes? `EventData` only offers two ways of reading the body: as a string and as JSON. See the [Event Data Microsoft docs](https://learn.microsoft.com/en-us/python/api/azure-eventhub/azure.eventhub.common.eventdata?view=azure-python-previous#azure-eventhub-common-eventdata-body-as-json) – noobCoder50 May 05 '22 at 14:22
  • I believe you'd want to use the `body` property: https://learn.microsoft.com//python/api/azure-eventhub/azure.eventhub.common.eventdata?view=azure-python-previous#azure-eventhub-common-eventdata-body, which returns an instance of the built-in `bytes` type. – Jesse Squire May 05 '22 at 23:01
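The other suggestion in the comments, encoding the compressed bytes as Base64 when publishing so that the body is always plain text, can be sketched as follows. The helper names are hypothetical, and this variant assumes every event is compressed, so the receiver needs no detection logic. The sender would wrap the returned string in `EventData`, and the receiver would pass `message.body_as_str()` to the decoder. Base64 inflates the payload by about a third, and that overhead counts against the 1 MB limit.

```python
import base64
import gzip


def encode_for_event(xml: str) -> str:
    """Hypothetical sender helper: gzip the XML, then Base64 it into ASCII text."""
    compressed = gzip.compress(xml.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode_from_event(text: str) -> str:
    """Hypothetical receiver helper: reverse encode_for_event()."""
    compressed = base64.b64decode(text)
    return gzip.decompress(compressed).decode("utf-8")


body_text = encode_for_event("<root>hello</root>")
print(body_text.isascii())            # True
print(decode_from_event(body_text))   # <root>hello</root>
```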
