I have created a dummy model that returns its input array and deployed it on Google Cloud ML Engine so that I can check how it decodes the audio sent in a request. I have been unable to send audio stored in a float array from an Android app to the model in a way that is decoded correctly, although I have no problems when sending the request from Python.
I make the request as follows. Audio is recorded into:
short[] inputBuffer = new short[RECORDING_LENGTH];
It is then converted to a float array:
float[] floatInputBuffer = new float[RECORDING_LENGTH];
for (int i = 0; i < RECORDING_LENGTH; ++i) {
    floatInputBuffer[i] = (float) inputBuffer[i];
}
The format Google Cloud expects for prediction requests is (see the data encoding section):
{"instances": [{"b64": "X5ad6u"}, {"b64": "IA9j4nx"}]}
So I put the audio into a map that mimics this:
public static String convertToBase64Bytes(float[] audio) {
    ByteBuffer byteBuffer = ByteBuffer.allocate(4 * audio.length);
    for (int i = 0; i < audio.length; i++) {
        float amplitude = audio[i];
        byteBuffer.putFloat(amplitude);
    }
    byte[] data = byteBuffer.array();
    String rtn = Base64.encodeToString(data, Base64.DEFAULT);
    return rtn;
}
String audioByteString = convertToBase64Bytes(floatInputBuffer);
final ArrayList<HashMap<String, String>> requestList = new ArrayList<>();
HashMap<String, String> singleRequest = new HashMap<>();
singleRequest.put("b64", audioByteString);
requestList.add(singleRequest);
HashMap<String, ArrayList<HashMap<String, String>>> jsonRequest = new HashMap<>();
jsonRequest.put("instances", requestList);
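Another detail that can be checked locally is whether the base64 string contains line breaks. On Android, `Base64.DEFAULT` inserts line terminators into its output, while `Base64.NO_WRAP` produces a single line. Whether the prediction service tolerates embedded line breaks is not established here, so this is only a diagnostic sketch. It assumes a plain JVM, where `java.util.Base64.getMimeEncoder()` stands in for a wrapping encoder and `getEncoder()` for a non-wrapping one; the class name is hypothetical.

```java
import java.util.Base64;

// Hypothetical diagnostic helper, not part of the original app.
final class Base64WrapCheck {
    // Wrapping encoder: MIME style, breaks lines at 76 characters.
    // Assumption: used here as a stand-in for a line-wrapping Android encoder.
    static String encodeWrapped(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // Non-wrapping encoder: the whole payload on a single line.
    static String encodeUnwrapped(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // True if the encoded string contains any CR or LF character.
    static boolean containsLineBreak(String encoded) {
        return encoded.indexOf('\n') >= 0 || encoded.indexOf('\r') >= 0;
    }
}
```

Running `containsLineBreak` on the string that goes into the `"b64"` field shows whether line breaks are being sent as part of the payload.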
I then call this function, which sends the request and returns the result:
public String sendRequest(HashMap<String, ArrayList<HashMap<String, String>>> jsonRequest) throws Exception {
    HttpContent content = new JsonHttpContent(new JacksonFactory(), jsonRequest);
    HttpRequest request = requestFactory.buildRequest(method.getHttpMethod(), url, content);
    return request.execute().parseAsString();
}
When I inspect the output from the model, the shape of the array is correct but the float values are not: they are usually very close to zero (on the order of 1e-26).
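Values that collapse to almost zero are what a byte-order mismatch can look like. `ByteBuffer` writes big-endian by default, while `tf.decode_raw` reads little-endian by default. The sketch below is a local check of that hypothesis, not a confirmed fix: it encodes floats in an explicit byte order and decodes them the way a little-endian reader would. The class and method names are hypothetical, and `java.util.Base64` stands in for `android.util.Base64` so that it runs on a plain JVM.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical helper for a local round-trip check, not part of the original app.
final class FloatBase64Check {
    // Write each float as 4 raw bytes in the given byte order, then base64-encode.
    static String encode(float[] audio, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(4 * audio.length).order(order);
        for (float amplitude : audio) {
            buffer.putFloat(amplitude);
        }
        // Assumption: java.util.Base64 replaces android.util.Base64 here.
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    // Decode the base64 payload as a little-endian reader would.
    static float[] decodeLittleEndian(String b64) {
        ByteBuffer buffer = ByteBuffer.wrap(Base64.getDecoder().decode(b64))
                .order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[buffer.remaining() / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = buffer.getFloat();
        }
        return out;
    }
}
```

If `decodeLittleEndian(encode(samples, ByteOrder.BIG_ENDIAN))` returns tiny values while the `LITTLE_ENDIAN` variant returns the original samples, the byte order of the encoder is a likely culprit.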
On the model side, the serving input function that processes the request (the model was created using a custom TensorFlow Estimator) is:
def serving_input_fn():
    feature_placeholders = {'b64': tf.placeholder(dtype=tf.string,
                                                  shape=[None],
                                                  name='source')}
    audio_samples = tf.decode_raw(feature_placeholders['b64'], tf.float32)
    inputs = {'inarray': audio_samples}
    return tf.estimator.export.ServingInputReceiver(inputs, feature_placeholders)
I think I am passing the encoded float array as a base64 string incorrectly: Google Cloud should automatically decode the base64 string because of the "b64" key, and this worked correctly when sending requests from Python.
Does anyone know how to send a float array to a model on Google Cloud from Android so that it is decoded correctly?