
I encoded the WAV file as base64 (audioClipName.txt in Resources/Sounds).

HERE IS THE SOURCE WAVE FILE

Then I tried to decode it, make an AudioClip from it and play it like this:

public static void CreateAudioClip()
{
    string s = Resources.Load<TextAsset> ("Sounds/audioClipName").text;

    byte[] bytes = System.Convert.FromBase64String (s);
    float[] f = ConvertByteToFloat(bytes);

    AudioClip audioClip = AudioClip.Create("testSound", f.Length, 2, 44100, false, false);
    audioClip.SetData(f, 0);

    AudioSource audioSource = GameObject.FindObjectOfType<AudioSource> ();
    audioSource.PlayOneShot (audioClip);
}

private static float[] ConvertByteToFloat(byte[] array) 
{
    float[] floatArr = new float[array.Length / 4];

    for (int i = 0; i < floatArr.Length; i++) 
    {
        if (BitConverter.IsLittleEndian) 
            Array.Reverse(array, i * 4, 4);

        floatArr[i] = BitConverter.ToSingle(array, i * 4);
    }

    return floatArr;
}

Everything works fine, except the sound is just noise.

I found this question here on Stack Overflow, but the answer doesn't solve the problem.

Here are details about the wav file from Unity3D:

[Screenshot: the WAV file's import details shown in the Unity3D inspector]

Does anyone know what the problem is here?

EDIT

I wrote out two binary files, one just after decoding from base64 and a second after the final conversion, and compared them to the original binary WAV file:

[Screenshot: byte-level comparison of the original, just_decoded and final files]

As you can see, the file was encoded correctly, because just decoding it and writing it out like this:

string scat = Resources.Load<TextAsset> ("Sounds/test").text;

byte[] bcat = System.Convert.FromBase64String (scat);
System.IO.File.WriteAllBytes ("Assets/just_decoded.wav", bcat);

gave the same file. All the files have the same length.

But the final one is wrong, so the problem is somewhere in the conversion to the float array. But I don't understand what could be wrong.

EDIT:

Here is the code for writing down the final.wav:

string scat = Resources.Load<TextAsset> ("Sounds/test").text;

byte[] bcat = System.Convert.FromBase64String (scat);
float[] f = ConvertByteToFloat(bcat);

byte[] byteArray = new byte[f.Length * 4];
Buffer.BlockCopy(f, 0, byteArray, 0, byteArray.Length);

System.IO.File.WriteAllBytes ("Assets/final.wav", byteArray);
Jerry Switalski
  • Endian conversion is a pain. Can you bring in Jon Skeet's [miscutil](https://www.nuget.org/packages/JonSkeet.MiscUtil/) and try this conversion to see if things improve? `Enumerable.Range(0, array.Length/4).Select(i => EndianBitConverter.Big.ToSingle(array, i * 4)).ToArray()` – spender Feb 05 '16 at 16:41
  • @spender thanks. Unfortunately this is Unity3D and I cannot use LINQ; anyway, I could in Editor mode, but System.Linq.Enumerable.Range(0, array.Length / 4) returns an IEnumerable that doesn't have a Select method. I don't know why, as I am not very familiar with LINQ. – Jerry Switalski Feb 05 '16 at 17:11
  • I could use (System.Linq.Enumerable.Range(0, array.Length / 4) as System.Linq.Enumerable).Select(i => EndianBitConverter.Big.ToSingle(array, i * 4)).ToArray(); but I cannot find EndianBitConverter. – Jerry Switalski Feb 05 '16 at 17:14
  • @JoeBlow Hi there. I had many problems on mobile platforms with LINQ, Also I have read some features are not fully implemented on all platforms, like the sorting options on iOS. – Jerry Switalski Feb 05 '16 at 17:23
  • Interesting question here, good one – Fattie Feb 12 '16 at 22:30
  • So if you don't use base64 and open the wav file directly, it plays perfect? – ikwillem Feb 18 '16 at 12:12
  • @ikwillem Yes, it is a valid wave file. Also, if I import that audio file into Unity3D it works fine, but the problem is that I cannot keep it in binary form, so I need it encoded in a text file. – Jerry Switalski Feb 18 '16 at 12:44

4 Answers


The wave file you try to play (meow.wav) has the following properties:

  • PCM
  • 2 channels
  • 44100 Hz
  • signed 16-bit little-endian

Your main mistake is that you are interpreting the binary data as if it already represented floats; that is what BitConverter.ToSingle() does.

But what you need to do is create a signed 16-bit little-endian value (as specified in the wave file header) from each pair of bytes, cast it to a float and then normalize it. Each two bytes make one sample in the case of your file (16-bit!), not four bytes. The data is little-endian (s16le), so you would only have to swap the bytes if the host machine weren't.

This would be the corrected conversion function:

private static float[] ConvertByteToFloat(byte[] array) {
    float[] floatArr = new float[array.Length / 2];

    for (int i = 0; i < floatArr.Length; i++) {
        floatArr[i] = BitConverter.ToInt16(array, i * 2) / 32768f; // Int16 range -> [-1, 1)
    }

    return floatArr;
}

And you should skip over the header of your wave file (the real audio data starts at offset 44).

For a clean solution, you would have to interpret the wave header correctly and adapt your operations to what is specified there (or bail out if it contains unsupported parameters): the sample format (bits per sample and endianness), sample rate and number of channels must all be taken care of.
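Outside Unity, this byte-level interpretation can be sanity-checked with a short, self-contained Python sketch. Everything here (the function name, the canonical 44-byte-header assumption) is illustrative only, not part of any Unity API:

```python
import struct

def parse_wav_s16le(wav_bytes):
    """Minimal parser for a canonical PCM WAV: 44-byte header, 16-bit LE samples.

    A sketch only: it assumes the 'fmt ' chunk is 16 bytes long and that the
    'data' chunk follows immediately, which holds for simple files.
    """
    assert wav_bytes[:4] == b"RIFF" and wav_bytes[8:12] == b"WAVE"
    audio_format, channels = struct.unpack_from("<HH", wav_bytes, 20)
    (sample_rate,) = struct.unpack_from("<I", wav_bytes, 24)
    (bits_per_sample,) = struct.unpack_from("<H", wav_bytes, 34)
    assert audio_format == 1 and bits_per_sample == 16  # uncompressed PCM, s16le
    (data_size,) = struct.unpack_from("<I", wav_bytes, 40)
    pcm = wav_bytes[44:44 + data_size]
    # Two bytes per sample: read as signed 16-bit LE, then normalize to [-1, 1).
    samples = [s / 32768.0 for (s,) in struct.iter_unpack("<h", pcm)]
    return channels, sample_rate, samples

# Build a tiny 2-channel, 44.1 kHz file in memory and round-trip it:
fmt = struct.pack("<IHHIIHH", 16, 1, 2, 44100, 44100 * 4, 4, 16)
data = struct.pack("<4h", 0, 16384, -32768, 32767)
wav = (b"RIFF" + struct.pack("<I", 36 + len(data)) + b"WAVE"
       + b"fmt " + fmt + b"data" + struct.pack("<I", len(data)) + data)
channels, rate, samples = parse_wav_s16le(wav)
print(channels, rate, samples[:2])  # 2 44100 [0.0, 0.5]
```

Note how -32768 maps to exactly -1.0 while 32767 maps to just under 1.0, which is the same asymmetry as dividing by 32768 in the C# function above.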

Ctx
  • Many thanks for your help! Approved. Honestly, I was sure the Unity3D engine handles all this, but I tried removing the RIFF header and playing with 16 bits, and no luck. – Jerry Switalski Feb 17 '16 at 19:45
  • Can you elaborate why the number 32768.0? – Lolpez Jul 13 '21 at 22:03
  • @Lolpez A signed 16-bit value has a range from -32768 to 32767, so you get a number between -1.0 and 1.0 if you divide it by 32768. This is what is needed. – Ctx Jul 14 '21 at 08:15

According to the documentation here,

The samples should be floats ranging from -1.0f to 1.0f (exceeding these limits will lead to artifacts and undefined behaviour). The sample count is determined by the length of the float array. Use offsetSamples to write into a random position in the clip. If the length from the offset is longer than the clip length, the write will wrap around and write the remaining samples from the start of the clip.

it seems that you have exactly that effect. So I guess you will have to normalize the array before it can be processed.

As you are operating in Unity, I am not sure what functionality you can use, so I have provided a small extension method for float arrays:

/// <summary>
/// Normalizes the values within this array.
/// </summary>
/// <param name="data">The array which holds the values to be normalized.</param>
static void Normalize(this float[] data)
{
    float max = float.MinValue;

    // Find maximum
    for (int i = 0; i < data.Length; i++)
    {
        if (Math.Abs(data[i]) > max)
        {
            max = Math.Abs(data[i]);
        }
    }

    // Divide all by max
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = data[i] / max;
    }
}

Use this extension method before further processing the data like so:

byte[] bytes = System.Convert.FromBase64String (s);
float[] f = ConvertByteToFloat(bytes);

// Normalize the values before using them
f.Normalize();

AudioClip audioClip = AudioClip.Create("testSound", f.Length, 2, 44100, false, false);
audioClip.SetData(f, 0);
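For what it's worth, the arithmetic of this peak normalization can be illustrated outside Unity; the following Python sketch (function name hypothetical) mirrors the Normalize() extension above:

```python
def normalize_peak(samples):
    """Scale a sample list so the largest absolute value becomes 1.0.

    Same idea as the C# Normalize() extension: find the peak magnitude,
    then divide every sample by it (leaving an all-zero input untouched).
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    return [s / peak for s in samples]

print(normalize_peak([0.5, -2.0, 1.0]))  # [0.25, -1.0, 0.5]
```

The guard for an all-zero input avoids a division by zero that the C# version above would also hit on pure silence.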
Markus Safar
  • Thanks Markus, I will check it out and let you know. Cheers – Jerry Switalski Feb 11 '16 at 18:00
  • It didn't work out, but maybe because you are normalizing it only to 1f, not counting the -1f; I think we should use `if (Mathf.Abs(data[i]) > max)` EDIT: with the absolute value it is still noise :/ – Jerry Switalski Feb 11 '16 at 18:09
  • I guess you are right :-) I will update my answer. Was it the problem? – Markus Safar Feb 11 '16 at 18:15
  • @JerrySwitalski I see... have you tried writing the data you get into a file and play it with a usual player? Maybe there is something wrong in the process before like failure in encoding to base64 or something else... + Did you convert the whole data from the file? Because I guess the method in unity just expects "raw data" without any header data. – Markus Safar Feb 11 '16 at 18:17
  • I will try all you suggest, but on Monday, as I am on holidays now. So see you later, and thanks. BTW I think we should also assign the absolute value to max ;) – Jerry Switalski Feb 11 '16 at 19:20
  • @JerrySwitalski True ;-) Ok, just let me know how it turns out ;-) – Markus Safar Feb 11 '16 at 19:24
  • @JerrySwitalski: How did you create the `final.wav`? Can you post the code for that as well, please? – Markus Safar Feb 16 '16 at 10:31
  • I updated the question, take a look. It is basically converting the floats back to bytes and writing them out. – Jerry Switalski Feb 16 '16 at 12:39
  • @JerrySwitalski To be honest I have no idea. But: Some more questions arise: Is the dump of the file you posted a complete `vorbis` file including the header or just samples? Can you maybe paste the file somewhere? + In the upper scenario: I am not sure that you can simply try to interpret "some" bytes as floats and afterwards play them. In my opinion (if the file is a `vorbis` file) the bytes represent compressed data (according to the vorbis file format specification) which needs to be decompressed first, afterwards the resulting "raw data bytes" could be transformed into floats and played. – Markus Safar Feb 16 '16 at 14:35
  • Yes, it is vorbis, and I also tried to remove the RIFF header (44 bytes) from the byte array; the sound is better but still not even close to the original. – Jerry Switalski Feb 16 '16 at 15:11
  • @JerrySwitalski and have you tried using `Normalize` now? That should (in my opinion) fix any noise issues. – Markus Safar Feb 16 '16 at 15:13
  • Yes, I tried all the combinations now, and nothing works. But as I said, there was one result that sounded like audio rather than pure noise. Also, I will attach the wav file to the question in 5 minutes. – Jerry Switalski Feb 16 '16 at 16:46
using (var blockAlignedStream = new BlockAlignReductionStream(WaveFormatConversionStream.CreatePcmStream(new RawSourceWaveStream(ms, new WaveFormat(22050, 16, 1)))))
{
    var aggregator = new SampleAggregator(blockAlignedStream.ToSampleProvider());
    aggregator.NotificationCount = blockAlignedStream.WaveFormat.SampleRate / 50;
    using (var wo = new WaveOutEvent())
    {
        isPlaying = true;
        wo.Init(aggregator);
        wo.Play();

        while (wo.PlaybackState == PlaybackState.Playing) //&& !disconnected
        {
            yield return new WaitForEndOfFrame();
        }
        wo.Dispose();
        aggregator.Reset();
        aggregator = null;
    }
}

@fibriZo: the code example for my comment above. The streamed audio data is added to a MemoryStream (ms) and read in a coroutine by NAudio.

bomanden

An implementation of Ctx's solution:

PcmHeader

private readonly struct PcmHeader
{
    #region Public types & data

    public int    BitDepth         { get; }
    public int    AudioSampleSize  { get; }
    public int    AudioSampleCount { get; }
    public ushort Channels         { get; }
    public int    SampleRate       { get; }
    public int    AudioStartIndex  { get; }
    public int    ByteRate         { get; }
    public ushort BlockAlign       { get; }

    #endregion

    #region Constructors & Finalizer

    private PcmHeader(int bitDepth,
        int               audioSize,
        int               audioStartIndex,
        ushort            channels,
        int               sampleRate,
        int               byteRate,
        ushort            blockAlign)
    {
        BitDepth       = bitDepth;
        _negativeDepth = Mathf.Pow(2f, BitDepth - 1f);
        _positiveDepth = _negativeDepth - 1f;

        AudioSampleSize  = bitDepth / 8;
        AudioSampleCount = Mathf.FloorToInt(audioSize / (float)AudioSampleSize);
        AudioStartIndex  = audioStartIndex;

        Channels   = channels;
        SampleRate = sampleRate;
        ByteRate   = byteRate;
        BlockAlign = blockAlign;
    }

    #endregion

    #region Public Methods

    public static PcmHeader FromBytes(byte[] pcmBytes)
    {
        using var memoryStream = new MemoryStream(pcmBytes);
        return FromStream(memoryStream);
    }

    public static PcmHeader FromStream(Stream pcmStream)
    {
        pcmStream.Position = SizeIndex;
        using BinaryReader reader = new BinaryReader(pcmStream);

        int    headerSize      = reader.ReadInt32();  // 16
        ushort audioFormatCode = reader.ReadUInt16(); // 20

        string audioFormat = GetAudioFormatFromCode(audioFormatCode);
        if (audioFormatCode != 1 && audioFormatCode != 65534)
        {
            // Only PCM and WaveFormatExtensible (uncompressed) are supported.
            throw new ArgumentOutOfRangeException(nameof(pcmStream),
                                                  $"Detected format code '{audioFormatCode}' {audioFormat}, but only PCM and WaveFormatExtensible uncompressed formats are currently supported.");
        }

        ushort channelCount = reader.ReadUInt16(); // 22
        int    sampleRate   = reader.ReadInt32();  // 24
        int    byteRate     = reader.ReadInt32();  // 28
        ushort blockAlign   = reader.ReadUInt16(); // 32
        ushort bitDepth     = reader.ReadUInt16(); //34

        pcmStream.Position = SizeIndex + headerSize + 2 * sizeof(int); // Header end index
        int audioSize = reader.ReadInt32();                            // Audio size index

        return new PcmHeader(bitDepth, audioSize, (int)pcmStream.Position, channelCount, sampleRate, byteRate, blockAlign); // audio start index
    }

    public float NormalizeSample(float rawSample)
    {
        float sampleDepth = rawSample < 0 ? _negativeDepth : _positiveDepth;
        return rawSample / sampleDepth;
    }

    #endregion

    #region Private Methods

    private static string GetAudioFormatFromCode(ushort code)
    {
        switch (code)
        {
            case 1:     return "PCM";
            case 2:     return "ADPCM";
            case 3:     return "IEEE";
            case 7:     return "µ-law";
            case 65534: return "WaveFormatExtensible";
            default:    throw new ArgumentOutOfRangeException(nameof(code), code, "Unknown wav code format.");
        }
    }

    #endregion

    #region Private types & Data

    private const int SizeIndex = 16;

    private readonly float _positiveDepth;
    private readonly float _negativeDepth;

    #endregion
}

PcmData

private readonly struct PcmData
{
    #region Public types & data

    public float[] Value      { get; }
    public int     Length     { get; }
    public int     Channels   { get; }
    public int     SampleRate { get; }

    #endregion

    #region Constructors & Finalizer

    private PcmData(float[] value, int channels, int sampleRate)
    {
        Value      = value;
        Length     = value.Length;
        Channels   = channels;
        SampleRate = sampleRate;
    }

    #endregion

    #region Public Methods

    public static PcmData FromBytes(byte[] bytes)
    {
        if (bytes == null)
        {
            throw new ArgumentNullException(nameof(bytes));
        }

        PcmHeader pcmHeader = PcmHeader.FromBytes(bytes);
        if (pcmHeader.BitDepth != 16 && pcmHeader.BitDepth != 32 && pcmHeader.BitDepth != 8)
        {
            throw new ArgumentOutOfRangeException(nameof(pcmHeader.BitDepth), pcmHeader.BitDepth, "Supported values are: 8, 16, 32");
        }

        float[] samples = new float[pcmHeader.AudioSampleCount];
        for (int i = 0; i < samples.Length; ++i)
        {
            int   byteIndex = pcmHeader.AudioStartIndex + i * pcmHeader.AudioSampleSize;
            float rawSample;
            switch (pcmHeader.BitDepth)
            {
                case 8:
                    rawSample = bytes[byteIndex];
                    break;

                case 16:
                    rawSample = BitConverter.ToInt16(bytes, byteIndex);
                    break;

                case 32:
                    rawSample = BitConverter.ToInt32(bytes, byteIndex);
                    break;

                default: throw new ArgumentOutOfRangeException(nameof(pcmHeader.BitDepth), pcmHeader.BitDepth, "Supported values are: 8, 16, 32");
            }

            samples[i] = pcmHeader.NormalizeSample(rawSample); // normalize sample between [-1f, 1f]
        }

        return new PcmData(samples, pcmHeader.Channels, pcmHeader.SampleRate);
    }

    #endregion
}

Usage

public static AudioClip FromPcmBytes(byte[] bytes, string clipName = "pcm")
{
    clipName.ThrowIfNullOrWhitespace(nameof(clipName));
    var pcmData   = PcmData.FromBytes(bytes);
    var audioClip = AudioClip.Create(clipName, pcmData.Length, pcmData.Channels, pcmData.SampleRate, false);
    audioClip.SetData(pcmData.Value, 0);
    return audioClip;
}

Note that AudioClip.Create provides an overload with Read and SetPosition callbacks in case you need to work with a source Stream instead of a chunk of bytes.
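As a side note, the asymmetric scaling in NormalizeSample above (negatives divided by 2^(bit depth - 1), positives by one less) maps both ends of the signed integer range exactly onto -1.0 and +1.0. A quick, purely illustrative Python check of that arithmetic:

```python
def normalize_sample(raw, bit_depth):
    """Same rule as PcmHeader.NormalizeSample above: pick the divisor by
    sign so the full signed integer range maps exactly onto [-1.0, 1.0]."""
    negative_depth = 2 ** (bit_depth - 1)  # e.g. 32768 for 16-bit
    positive_depth = negative_depth - 1    # e.g. 32767 for 16-bit
    return raw / (negative_depth if raw < 0 else positive_depth)

print(normalize_sample(-32768, 16), normalize_sample(32767, 16))  # -1.0 1.0
```

The trade-off versus dividing everything by 32768 is a tiny nonlinearity at the origin in exchange for hitting both endpoints exactly.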

fibriZo raZiel
  • Hi, would I be able to use your code above for streaming data? I'm receiving an audio stream from IBM Watson TTS; at the moment it's working great with NAudio, but if I wanted it to be compatible with the different lipsync plugins like Salsa or Oculus lipsync, my guess is an AudioClip would be better. Any examples with streamed audio data? – bomanden Sep 25 '22 at 11:52
  • As you can see, this code loads the full file as AudioClip from RAM, instead of hard disk. Therefore, you won't be able to use it as it is with Stream or partially loaded (streamed) data. – fibriZo raZiel Dec 05 '22 at 15:41