To run real-time speech analytics with big data technologies, I'm starting with Kafka. First I convert a .wav file to bytes using the wavio API, then I send messages containing [data (a NumPy array), rate (integer), sampwidth (integer)] to Kafka. These messages are then read by a consumer that converts them back into a .wav file.

The problem: how can I send and receive [data, rate, sampwidth] to and from Kafka in a single message, so that each message represents one .wav file?

For the Producer:

    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    x = wav2bytes("bush_read")  # returns a tuple (data, rate, sampwidth)
    # here I'm sending 3 separate messages
    producer.send("TestTopic", key=b'data', value=b'%s' % (x[0]))  # data -> NumPy array
    producer.send("TestTopic", key=b'rate', value=b'%d' % (x[1]))  # rate -> int
    producer.send("TestTopic", key=b'sampwidth', value=b'%d' % (x[2]))  # sampwidth -> int
    send("TestTopic","bush_read")

For the consumer:

    for message in consumer:
        msg = message     # I want something like this
        file = bytes2wav("name", msg.data, msg.rate, msg.sampwidth)
John Smith

2 Answers

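You can send everything in a single message as JSON (or any other serialization format). Build an object like

    {'data': data, 'rate': rate, 'sampwidth': sampwidth}

serialize it in the producer, and deserialize it in the consumer. Note that a NumPy array is not JSON-serializable as-is, so convert data to a list or to base64-encoded bytes first.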
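
A minimal sketch of that idea, using only the standard library. The function names encode_wav_message and decode_wav_message, and the extra dtype and shape fields, are assumptions made for illustration; they are not part of wavio or kafka-python. The sample buffer is base64-encoded because JSON cannot carry raw bytes.

```python
import base64
import json


def encode_wav_message(raw, dtype, shape, rate, sampwidth):
    """Pack one .wav file into a single JSON-encoded message value.

    raw is the sample buffer as bytes (for a NumPy array: data.tobytes());
    dtype and shape describe how to rebuild the array on the consumer side.
    """
    payload = {
        "data": base64.b64encode(raw).decode("ascii"),  # JSON cannot hold raw bytes
        "dtype": dtype,
        "shape": list(shape),
        "rate": rate,
        "sampwidth": sampwidth,
    }
    return json.dumps(payload).encode("utf-8")


def decode_wav_message(value):
    """Inverse of encode_wav_message: message bytes -> dict with raw sample bytes."""
    payload = json.loads(value.decode("utf-8"))
    payload["data"] = base64.b64decode(payload["data"])
    return payload


# Producer side (assumes kafka-python and the wav2bytes helper from the question):
#   data, rate, sampwidth = wav2bytes("bush_read")
#   value = encode_wav_message(data.tobytes(), str(data.dtype), data.shape, rate, sampwidth)
#   producer.send("TestTopic", key=b"bush_read", value=value)
#
# Consumer side (assumes numpy imported as np and the bytes2wav helper):
#   for message in consumer:
#       msg = decode_wav_message(message.value)
#       data = np.frombuffer(msg["data"], dtype=msg["dtype"]).reshape(msg["shape"])
#       bytes2wav("name", data, msg["rate"], msg["sampwidth"])
```

With this layout each Kafka message carries one complete file, and the message key can hold the file name instead of a field name.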

Reznik

Just another thought!!

If the .wav file is quite large, it can put load on the broker and slow down the cluster. This can be avoided by publishing a reference message instead of the full file:

  1. Store the large file in external storage.
  2. Publish a reference message to the topic that points to the stored file's location.
  3. The consumer reads the reference and fetches the file from external storage.
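
The three steps above could be sketched roughly as follows. A local directory stands in for the external storage here, and the names store_and_make_reference and resolve_reference plus the "location" and "size" fields are hypothetical, chosen only for this example. In a real setup the storage would more likely be a shared filesystem or an object store.

```python
import json
import shutil
import uuid
from pathlib import Path


def store_and_make_reference(wav_path, storage_dir):
    """Steps 1 and 2: copy the file into storage and build a small reference message."""
    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    # Unique prefix so two files with the same name do not overwrite each other.
    stored = storage_dir / f"{uuid.uuid4().hex}_{Path(wav_path).name}"
    shutil.copyfile(wav_path, stored)
    reference = {"location": str(stored), "size": stored.stat().st_size}
    return json.dumps(reference).encode("utf-8")


def resolve_reference(value):
    """Step 3: read the reference message and load the file bytes from storage."""
    reference = json.loads(value.decode("utf-8"))
    return Path(reference["location"]).read_bytes()


# Producer side (assumes kafka-python):
#   producer.send("TestTopic", value=store_and_make_reference("bush_read.wav", "/shared/wavs"))
# Consumer side:
#   for message in consumer:
#       wav_bytes = resolve_reference(message.value)
```

Only the small JSON reference travels through the broker; the audio itself is read directly from storage by the consumer.
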
Nitin