
I have created a dummy model that returns the array input to it and deployed it on Google Cloud ML Engine so I can check how it decodes the audio sent in a request. I have been unable to send audio stored in a float array from an Android app to the model in a way that it is correctly decoded, though I have no problems when sending a request from Python.

I make the request as follows. The audio is recorded into

short[] inputBuffer = new short[RECORDING_LENGTH];

then converted to a float array:

float[] floatInputBuffer = new float[RECORDING_LENGTH];
for (int i = 0; i < RECORDING_LENGTH; ++i) {
    floatInputBuffer[i] = (float) inputBuffer[i];
}

The form Google Cloud expects for predictions (see the data encoding section) is:

{"instances": [{"b64": "X5ad6u"}, {"b64": "IA9j4nx"}]}

So I put the audio into a map that mimics this:

  public static String convertToBase64Bytes(float[] audio) {
    ByteBuffer byteBuffer = ByteBuffer.allocate(4 * audio.length);
    for (int i = 0; i < audio.length; i++) {
      float amplitude = audio[i];
      byteBuffer.putFloat(amplitude);
    }
    byte[] data = byteBuffer.array();
    String rtn = Base64.encodeToString(data, Base64.DEFAULT);
    return rtn;
  }

  String audioByteString = convertToBase64Bytes(floatInputBuffer);
  final ArrayList<HashMap<String, String>> requestList = new ArrayList<>();
  HashMap<String, String> singleRequest = new HashMap<>();
  singleRequest.put("b64", audioByteString);
  requestList.add(singleRequest);
  HashMap<String, ArrayList<HashMap<String, String>>> jsonRequest = new HashMap<>();
  jsonRequest.put("instances", requestList);
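Before sending, one way to see what convertToBase64Bytes actually produced is to decode the string on the receiving side with Python's struct module (a debugging sketch, not part of the app; the sample value is assumed):

```python
import base64
import struct

# Debugging sketch: decode the base64 string the app sends and unpack it
# under both byte orders to see which one recovers the original sample.
# "P4AAAA==" is the base64 of the four bytes that a default (big-endian)
# ByteBuffer writes for the float 1.0f.
encoded = "P4AAAA=="
raw = base64.b64decode(encoded)

(as_big,) = struct.unpack('>f', raw)     # read big-endian: 1.0
(as_little,) = struct.unpack('<f', raw)  # read little-endian: tiny denormal

print(as_big, as_little)
```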

I then call this function, which sends the request and returns the result:

public String sendRequest(HashMap<String, ArrayList<HashMap<String, String>>> jsonRequest) throws Exception {
    HttpContent content = new JsonHttpContent(new JacksonFactory(), jsonRequest);
    HttpRequest request = requestFactory.buildRequest(method.getHttpMethod(), url, content);
    return request.execute().parseAsString();
}

Inspecting the output from the model, the shape of the array is correct but the float values are not: they are usually very close to zero (on the order of 1e-26).

On the model side, the serving input function of the model (created using a custom TensorFlow Estimator) which processes the request is:

def serving_input_fn():
    feature_placeholders = {'b64': tf.placeholder(dtype=tf.string,
                                                  shape=[None],
                                                  name='source')}
    audio_samples = tf.decode_raw(feature_placeholders['b64'], tf.float32)
    inputs = {'inarray': audio_samples}
    return tf.estimator.export.ServingInputReceiver(inputs, feature_placeholders)

I think I am passing the encoded float array as a base64 string incorrectly. Google Cloud should automatically decode the base64 string because of the "b64" key, and this worked correctly when sending requests from Python.
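For comparison, here is a minimal sketch of what a working Python-side request body might look like (client details are assumed; only the {"instances": [{"b64": ...}]} layout comes from the docs above). struct's '<f' format packs floats little-endian, which matches decode_raw's default byte order and is one plausible reason requests made from Python decode correctly:

```python
import base64
import json
import struct

# Sketch of a Python-side request body (details assumed). '<f' packs floats
# little-endian, matching decode_raw's default, unlike Java's ByteBuffer,
# which starts out big-endian.
samples = [0.5, -0.25, 0.125]
raw = struct.pack('<%df' % len(samples), *samples)
body = json.dumps({
    'instances': [{'b64': base64.b64encode(raw).decode('utf-8')}]
})
print(body)
```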

Does anyone know how to send a float array to a model on Google Cloud from Android in a way that it will be correctly decoded?

Shahin Vakilinia
NickDGreg

1 Answer


It appears that this is a ByteOrder/endianness issue. From the ByteBuffer javadocs:

Primitive values are translated to (or from) sequences of bytes according to the buffer's current byte order, which may be retrieved and modified via the order methods. Specific byte orders are represented by instances of the ByteOrder class. The initial order of a byte buffer is always BIG_ENDIAN.

But TensorFlow's decode_raw defaults to little-endian:

little_endian: An optional bool. Defaults to True. Whether the input bytes are in little-endian order. Ignored for out_type values that are stored in a single byte like uint8.

The solution is to override one default or the other. Since Java's ByteBuffer always starts out big-endian, one option is to keep the big-endian bytes in your Android code and tell TensorFlow to expect them:

def serving_input_fn():
    feature_placeholders = {
        'audio_bytes': tf.placeholder(
            dtype=tf.string,
            shape=[None],
            name='source'
         )
    }
    audio_samples = tf.decode_raw(
        feature_placeholders['audio_bytes'],
        tf.float32,
        little_endian=False
    )
    return tf.estimator.export.ServingInputReceiver(
        feature_placeholders,
        feature_placeholders
    )

(I made a few other changes to the function, as noted in a separate SO post.)
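The near-zero values from the question can be reproduced directly from this mismatch: a float packed big-endian (ByteBuffer's default) but unpacked little-endian (decode_raw's default) comes back as a tiny denormal. A minimal sketch:

```python
import struct

# Reproduce the mismatch: write a float big-endian, read it back
# little-endian. The result is a tiny denormal, matching the near-zero
# values observed in the question.
value = 1000.0
big_endian_bytes = struct.pack('>f', value)
(misread,) = struct.unpack('<f', big_endian_bytes)
print(misread)  # a tiny denormal, on the order of 1e-41
```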

rhaertel80
  • Works! I had no idea endianness was a thing, thanks a lot. Though I don't understand why feature_placeholders is given twice to the ServingInputReceiver instead of audio_samples; I asked for clarification on that in the other post, so no need to explain it here too. – NickDGreg Mar 09 '18 at 10:10