I want to play the output of Android's TextToSpeech.synthesizeToFile using AudioTrack, but I fear I'm passing the wrong parameters to AudioTrack's builders. I copied one of the files generated by TextToSpeech.synthesizeToFile off the device using adb and put it in this github file if you want to see TextToSpeech.synthesizeToFile's output for yourself. When I run play tempSoundFile8290688667049742717.wav on Linux, the file plays the text I wrote (hello world) and prints the following:

play WARN alsa: can't encode 0-bit Unknown or not applicable

tempSoundFile8290688667049742717.wav:

 File Size: 39.4k     Bit Rate: 353k
  Encoding: Signed PCM    
  Channels: 1 @ 16-bit   
Samplerate: 22050Hz      
Replaygain: off         
  Duration: 00:00:00.89  

In:100%  00:00:00.89 [00:00:00.00] Out:19.7k [!=====|=====!]        Clip:0    
Done.

Accordingly, I'm setting AudioTrack's parameters as follows:

private AudioDeviceInfo findAudioDevice(int deviceFlag, int deviceType) {
    AudioManager manager = (AudioManager) this.context.getSystemService(Context.AUDIO_SERVICE);
    AudioDeviceInfo[] adis = manager.getDevices(deviceFlag);
    for (AudioDeviceInfo adi : adis) {
        if (adi.getType() == deviceType) {
            return adi;
        }
    }
    return null;
}

AudioDeviceInfo mAudioOutputDevice = findAudioDevice(AudioManager.GET_DEVICES_OUTPUTS,
        AudioDeviceInfo.TYPE_BUS);

AudioAttributes.Builder audioAttributesBuilder = new AudioAttributes.Builder().
        setUsage(AudioAttributes.USAGE_VOICE_COMMUNICATION_SIGNALLING).
        setContentType(AudioAttributes.CONTENT_TYPE_SPEECH).
        setFlags(AudioAttributes.FLAG_AUDIBILITY_ENFORCED);
attributes = audioAttributesBuilder.build();

int minBufferSize = AudioTrack.getMinBufferSize(22050, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

AudioTrack.Builder atBuilder = new AudioTrack.Builder();
//builder.setAudioAttributes()
AudioFormat.Builder afBuilder = new AudioFormat.Builder();

afBuilder.setEncoding(AudioFormat.ENCODING_PCM_16BIT)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .setSampleRate(22050);

atBuilder.setAudioFormat(afBuilder.build())
        .setTransferMode(AudioTrack.MODE_STREAM)
        .setBufferSizeInBytes(minBufferSize)
        .setAudioAttributes(attributes);


at = atBuilder.build();
at.setPreferredDevice(mAudioOutputDevice);

File myFile = File.createTempFile("tempSoundFile", ".wav");
myFile.deleteOnExit();
myFile.setWritable(true);
myFile.setReadable(true);

and later play the file, using code from here:

/**
* Code taken from here:
* https://stackoverflow.com/questions/7372813/android-audiotrack-playing-wav-file-getting-only-white-noise
*/
private void playWav(){
    Log.d(TAG, "Playing speech to text wav file");
    String filepath = this.myFile.getAbsolutePath();

    int i = 0;
    int BUFFER_SIZE = 512;
    byte[] s = new byte[BUFFER_SIZE];
    try {
        Log.i(TAG, "file path is: " + filepath);
        FileInputStream fin = new FileInputStream(filepath);
        DataInputStream dis = new DataInputStream(fin);


        at.play();

        while((i = dis.read(s, 0, BUFFER_SIZE)) > -1){
            at.write(s, 0, i);
            Log.v(TAG, Arrays.toString(s));

        }
        at.stop();
        at.release();
        dis.close();
        fin.close();

    } catch (FileNotFoundException e) {
        // TODO
        e.printStackTrace();
    } catch (IOException e) {
        // TODO
        e.printStackTrace();
    }
}

Of course, calls to these are spread across different async calls as you can see in my code, but I've debugged all that with log statements and the debugger and don't see any issues. playWav() is getting hit when I expect, but not playing anything.

edit:

My primary motivation for using AudioTrack is to make TextToSpeech compatible with the Raspberry Pi voice kit's Android Things library. Using AudioTrack will allow me to play TextToSpeech output over I2S (or any speaker I choose).

edit 2, a deeper look:

According to this website, WAV files have a 44-byte header that specifies all of these parameters. In this header, at:

  • position 20, 2 bytes give the audio format (little endian) (1 for PCM; the 16 at position 16 is the size of the fmt chunk)
  • position 22, 2 bytes give the number of channels (1 for mono, 2 for stereo) (little endian)
  • position 24, 4 bytes give the sample rate (little endian)
  • and finally at position 34, 2 bytes give the bits per sample (little endian)

Here is a hex dump of the file mentioned above:

$ hd -n 44 tempSoundFile8290688667049742717.wav 
00000000  52 49 46 46 f8 99 00 00  57 41 56 45 66 6d 74 20  |RIFF....WAVEfmt |
00000010  10 00 00 00 01 00 01 00  22 56 00 00 44 ac 00 00  |........"V..D...|
00000020  02 00 10 00 64 61 74 61  d4 99 00 00              |....data....|
0000002c
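To double-check the header fields against the parameters passed to AudioTrack, the 44 bytes from the hex dump can be decoded directly. This is only a minimal sketch: the class name WavHeaderDemo and its helpers are made up for illustration, and it assumes the canonical 44-byte layout described above (files with extra chunks before "data" would need a real chunk parser).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: decodes the format fields of a canonical 44-byte PCM WAV header. */
final class WavHeaderDemo {
    final int audioFormat;   // offset 20, 2 bytes: 1 means PCM
    final int channels;      // offset 22, 2 bytes
    final int sampleRate;    // offset 24, 4 bytes
    final int bitsPerSample; // offset 34, 2 bytes
    final int dataSize;      // offset 40, 4 bytes: length of the PCM payload

    WavHeaderDemo(byte[] header) {
        if (header.length < 44) {
            throw new IllegalArgumentException("need at least 44 header bytes");
        }
        // All multi-byte WAV header fields are little endian.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        audioFormat = buf.getShort(20) & 0xFFFF;
        channels = buf.getShort(22) & 0xFFFF;
        sampleRate = buf.getInt(24);
        bitsPerSample = buf.getShort(34) & 0xFFFF;
        dataSize = buf.getInt(40);
    }

    /** Playback duration in seconds implied by the header fields. */
    double durationSeconds() {
        int bytesPerSecond = sampleRate * channels * bitsPerSample / 8;
        return (double) dataSize / bytesPerSecond;
    }

    /** The 44 header bytes copied from the hex dump above. */
    static byte[] dumpedHeader() {
        int[] h = {
            0x52, 0x49, 0x46, 0x46, 0xf8, 0x99, 0x00, 0x00,
            0x57, 0x41, 0x56, 0x45, 0x66, 0x6d, 0x74, 0x20,
            0x10, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00,
            0x22, 0x56, 0x00, 0x00, 0x44, 0xac, 0x00, 0x00,
            0x02, 0x00, 0x10, 0x00, 0x64, 0x61, 0x74, 0x61,
            0xd4, 0x99, 0x00, 0x00
        };
        byte[] out = new byte[h.length];
        for (int i = 0; i < h.length; i++) {
            out[i] = (byte) h[i];
        }
        return out;
    }

    public static void main(String[] args) {
        WavHeaderDemo w = new WavHeaderDemo(dumpedHeader());
        System.out.println("format=" + w.audioFormat + " channels=" + w.channels
                + " sampleRate=" + w.sampleRate + " bits=" + w.bitsPerSample
                + " dataBytes=" + w.dataSize);
    }
}
```

For this file the fields decode to PCM (format 1), 1 channel, 22050 Hz and 16 bits per sample, which agrees with what play reported and with the values given to AudioFormat.Builder.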
  • Looks like you're kinda blowing past the WAV header, but that doesn't explain the silence. I assume [`SoundPool`](https://developer.android.com/reference/android/media/SoundPool) or `MediaPlayer` won't work for your use case? Might want to test out your hardware setup by just writing out (repeatedly/infinitely) a square wave, instead of the results of `dis.read()`, e.g. `short[] wavecycle={-0x7FFF, -0x7FFF, -0x7FFF, -0x7FFF, 0x7FFF, 0x7FFF, 0x7FFF, 0x7FFF}`. Also it's likely `stop()` is being called before the sound is finished playing, b/c it's called as soon as the last write completes. – greeble31 Jan 16 '19 at 05:22
  • I need the `setPreferredDevice` capability. MediaPlayer [has it in API level 28](https://developer.android.com/reference/android/media/MediaPlayer.html#setPreferredDevice(android.media.AudioDeviceInfo)), but Android Things is level 27. I didn't see it in SoundPool. I know the hardware works, because the assistant portion of my code works (if you look at the link to my github). – mikeLundquist Jan 16 '19 at 13:28
  • My point about the WAV header is that you're not supposed to send that to `AudioTrack`, (as mentioned in the question you linked). `AudioTrack.write()` does not interpret header information; it regards everything you give it as PCM data. This will cause a very brief burst of noise at the beginning of playback. But that's just an aside, let me look at your github for a minute... – greeble31 Jan 16 '19 at 15:46
  • What happens if you try it without the `setAudioAttributes(attributes)` call? What you want to do is set it up as closely as possible to the `AudioTrack` you have that is working -- just a different sample rate. I'm not sure, but I think either `USAGE_VOICE_COMMUNICATION_SIGNALLING` or `CONTENT_TYPE_SPEECH` is causing a different device to be used. Besides, you don't need them anyway; a sound is a sound, whether it's voice or music. An `AudioTrack` doesn't interpret PCM data differently based on whether you configured it for voice or not. – greeble31 Jan 16 '19 at 15:59
  • I tried it without setting attributes; it's still not working. I added `mDac.setSdMode(Max98357A.SD_MODE_LEFT);` before calling playWav() and `mDac.setMode(Max98357A.SD_MODE_SHUTDOWN);` after. This made the microphone click, which is what the working code does, but it doesn't play the wav file. I also tried hooking up an oscilloscope to the mic leads, which shows no signal other than the click. When I log the data sent to .write, I can see it, but I don't think it's making it to the mic. I also tried adding a sleep after the write, to no effect. – mikeLundquist Jan 16 '19 at 16:36
  • Just tried this code myself on the emulator, and I can verify it works. With or without the `attributes`. Of course, I'm not using your specialized audio hardware. – greeble31 Jan 16 '19 at 17:16
  • I think the thing you need to focus on is that you already have a 16-bit mono `AudioTrack` that works. So, arguably, if you just found a way to get your WAV data into _that_ `AudioTrack`, it would be certain to play (albeit at the wrong sample rate). – greeble31 Jan 16 '19 at 17:18
  • You should probably check the return value from `write()` just to be safe. – greeble31 Jan 16 '19 at 17:21
  • One other thing -- and I'm just spitballing, here -- but I'm not confident that Max98357A chip can handle two different sample rates at the same time. You're not properly cleaning up your 16 kHz `AudioTrack`, and if you try to make a 22.05 kHz one at the same time, it could get confused. As a matter of fact, looking at this [datasheet](https://datasheets.maximintegrated.com/en/ds/MAX98357A-MAX98357B.pdf), it only supports 8/16/48/96 kHz! – greeble31 Jan 16 '19 at 17:44
  • I'm going to try switching to [a cloud TTS engine](https://cloud.google.com/text-to-speech/docs/reference/libraries#client-libraries-install-java) that can set the sample rate. If that works, I'll let you know and accept your answer if you post it. – mikeLundquist Jan 16 '19 at 18:00
  • Sure. Consider just changing the 22050 to 16000. It'll make your female voice sound like a dude, but otherwise it should still work. – greeble31 Jan 16 '19 at 18:08
  • something like [this](https://github.com/ashqal/android-libresample)? – mikeLundquist Jan 16 '19 at 18:33
  • Well, yes, that looks like it would correct the pitch... but I'm not sure you completely understood my last comment - I meant, literally just change the numbers in your source code, from 22050 to 16000 (2 places). – greeble31 Jan 16 '19 at 19:13
  • Ok, so I can play audio sampled at 22050 at 16000 and it will simply change the tone? – mikeLundquist Jan 16 '19 at 19:16
  • That's exactly it. – greeble31 Jan 16 '19 at 19:16
  • I figured out the problem, switching to `at.write(s, 0, BUFFER_SIZE, AudioTrack.WRITE_BLOCKING);` fixed it. – mikeLundquist Jan 16 '19 at 19:39
  • Hmm. Strange, but if it works, it works! – greeble31 Jan 16 '19 at 19:51
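
The suggestions in this thread (don't send the 44-byte header to `AudioTrack`, check the return value of `write()`, and keep writing until every byte has been accepted) can be sketched roughly as follows. This is not the asker's final code: `WavStreamer` and `PcmSink` are hypothetical names, with `PcmSink` standing in for `AudioTrack.write(byte[], int, int, int)` so that the loop can be run off-device, and the fixed 44-byte skip assumes the canonical header layout.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: streams the PCM payload of a WAV file to a sink. */
final class WavStreamer {
    /** Hypothetical stand-in for AudioTrack.write(byte[], int, int, int). */
    interface PcmSink {
        /** Returns the number of bytes accepted (may be fewer than length) or a negative error code. */
        int write(byte[] data, int offset, int length);
    }

    /** Assumes the canonical 44-byte header; extra chunks would need real parsing. */
    static final int WAV_HEADER_BYTES = 44;

    /** Skips the header, then pushes every payload byte to the sink. Returns the bytes streamed. */
    static long stream(InputStream in, PcmSink sink) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        // The header is metadata, not audio; readFully throws EOFException if the file is too short.
        dis.readFully(new byte[WAV_HEADER_BYTES]);

        byte[] buf = new byte[512];
        long total = 0;
        int n;
        while ((n = dis.read(buf, 0, buf.length)) > -1) {
            int offset = 0;
            // Write only the n bytes actually read, and retry on short writes.
            // A sink that keeps returning 0 would spin here; real code should back off or time out.
            while (offset < n) {
                int written = sink.write(buf, offset, n - offset);
                if (written < 0) {
                    throw new IOException("sink reported error code " + written);
                }
                offset += written;
            }
            total += n;
        }
        return total;
    }
}
```

On a device the sink might be something like `(d, o, l) -> at.write(d, o, l, AudioTrack.WRITE_BLOCKING)`, with `at.stop()` and `at.release()` called only after `stream` returns. Note that the loop passes the number of bytes actually read rather than the full buffer size, so a short final read does not replay stale bytes.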

0 Answers