
I have a .wav file that contains two types of sounds: long and short. I need to encode them as bits and write them to a binary file.

I got the code from this SO answer: https://stackoverflow.com/a/53309191/2588339 and using it I get this plot for my input wav file:

[Image: frequencies]

As you can see, the first plot has narrower and wider parts, which correspond to the shorter and longer sounds in my file.

My question is: how can I encode each of the sounds as a bit? For example, each long sound in the file would represent a 1 and each short sound a 0.

EDIT: The two types of sound differ both in duration and in frequency. The longer sound has a lower frequency and the shorter sound has a higher frequency. You can find a sample of the file here: https://vocaroo.com/i/s0A1weOF3I3f

Daniel Bejan
  • What is there between two sounds? Is it all zeros? If so, you can measure how long the signal stays different from `0`, and then you would know whether it is a 'long' or a 'short' sound. – AcaNg Jun 27 '19 at 13:58
  • Moreover, the signal can temporarily be `0` while it's sounding, but it can't be `0` for too long. So just measure how long it stays at `0` and check whether that exceeds some epsilon value to determine whether it has stopped sounding. – AcaNg Jun 27 '19 at 14:05
  • @Tiendung I added an edit explaining the difference between the 2 sounds – Daniel Bejan Jun 27 '19 at 17:09

1 Answer


Measuring the loudness of each frequency by taking the FFT of the signal is the more "scientific" way to do it, but the image of the raw signal suggests you can get away with something much simpler.
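For comparison, the FFT route could look roughly like the sketch below. It is only an illustration: `dominant_frequency` is a hypothetical helper, and the sample rate and the 300 Hz / 600 Hz tones are made-up stand-ins, not values measured from your file.

```python
import numpy as np

def dominant_frequency(segment, sample_rate):
    """Return the frequency (Hz) of the strongest FFT bin in a 1-D segment."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore the DC component
    return freqs[np.argmax(spectrum)]

# Synthetic stand-ins for the two tones (assumed values, not from the real file).
sample_rate = 8000
t = np.arange(800) / sample_rate           # 0.1 s worth of samples
low_tone = np.sin(2 * np.pi * 300 * t)     # pretend "long" sound
high_tone = np.sin(2 * np.pi * 600 * t)    # pretend "short" sound

low_hz = dominant_frequency(low_tone, sample_rate)
high_hz = dominant_frequency(high_tone, sample_rate)
```

You would call this on each detected pulse and compare the result against a cutoff between the two tone frequencies.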

If you take a sliding window at least as wide as one period of the sound's primary frequency (~300 Hz) and find the maximum value within that window, it should be fairly easy to apply a threshold to determine whether the tone is playing at a given moment. Here's a quick article I found on rolling window functions.

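import numpy as np

def rolling_window(a, window):
    # Strided view of `a`: the last axis holds every length-`window` slice (no copy is made).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# sample_rate, primary_freq, wav_data and threshold are assumed to be defined already.
window_size = int(np.ceil(sample_rate / primary_freq)) # minimum size window (must be an int). could be larger.
rolling_max = np.max(rolling_window(wav_data, window_size), -1)
threshold_max = rolling_max > threshold # maybe about 1000ish based on your graph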
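To try the idea without the real file, here is a minimal sketch on a synthetic signal. Everything in it is an assumption for illustration: the sample rate, the 300 Hz tone, the pulse lengths and the 0.5 threshold. It also swaps in `sliding_window_view` (available in NumPy 1.20 and later) in place of the hand-rolled `as_strided` helper, and takes the absolute value of the samples first.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed values, chosen only for this demo.
sample_rate = 8000
primary_freq = 300
threshold = 0.5

def tone(n_samples):
    """Unit-amplitude sine at primary_freq lasting n_samples samples."""
    t = np.arange(n_samples) / sample_rate
    return np.sin(2 * np.pi * primary_freq * t)

# silence, short pulse, silence, long pulse, silence
silence = np.zeros(400)
wav_data = np.concatenate([silence, tone(800), silence, tone(2400), silence])

# Window covering at least one full period of the tone.
window_size = int(np.ceil(sample_rate / primary_freq))

# Rolling maximum of the absolute signal, then threshold to an on/off mask.
rolling_max = sliding_window_view(np.abs(wav_data), window_size).max(axis=-1)
threshold_max = rolling_max > threshold

# Count False -> True transitions, i.e. the number of detected pulses.
rising_edges = np.flatnonzero(np.diff(threshold_max.astype(np.int8)) == 1)
n_pulses = len(rising_edges)
```

Because the window spans a whole period, the mask stays `True` through the zero crossings inside each pulse and only drops during the real gaps.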

Then simply determine the length of the runs of True in threshold_max. Again, I'll lean on the community: this answer shows a concise way to get the run lengths of an array (or other iterable).

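import itertools

def runs_of_ones(bits):
  # groupby yields consecutive runs of equal values; summing a run of True values gives its length.
  for bit, group in itertools.groupby(bits):
    if bit: yield sum(group)

run_lengths = list(runs_of_ones(threshold_max))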

The values in run_lengths should now be the length of each "on" pulse of sound, in number of samples. It should now be relatively straightforward for you to test whether each value is long or short and write the result to a file.
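That last step might look something like the sketch below. It is a guess at one reasonable approach rather than a definitive one: `runs_to_bits` is a hypothetical helper, the run lengths are invented, the long/short cutoff is simply the midpoint between the shortest and longest run, and `decoded.bin` is a placeholder file name.

```python
import os
import tempfile
import numpy as np

def runs_to_bits(run_lengths):
    """Label each run 1 (long) or 0 (short).

    Assumption: the cutoff is the midpoint between the shortest and
    the longest run, which only works if both kinds are present.
    """
    runs = np.asarray(run_lengths)
    cutoff = (runs.min() + runs.max()) / 2
    return (runs > cutoff).astype(np.uint8)

# Invented run lengths (in samples) standing in for the real ones.
run_lengths = [2400, 800, 820, 2390, 2410, 790, 810, 2405]

bits = runs_to_bits(run_lengths)   # long -> 1, short -> 0
packed = np.packbits(bits)         # 8 bits per byte, most significant bit first

# Placeholder output path in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "decoded.bin")
with open(out_path, "wb") as f:
    f.write(packed.tobytes())
```

Note that `np.packbits` pads the final byte with zero bits when the number of pulses is not a multiple of 8, and the bit order within each byte (first pulse as most significant bit) is itself an assumption about how the data was encoded.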

Aaron
  • Regarding the different frequencies: the only modification to make would be to ensure that the window is at least as long as the period of the lower of the two frequencies. – Aaron Jun 27 '19 at 17:15
  • Although at first I didn't get what you were trying to say, I studied the subject a bit more and got my analyser up and running. I decoded the wav file by writing a logic `1` or a logic `0` depending on the length of each sound and then writing all the bits to a binary file. I ended up getting a `PNG` file with the password for the next level – Daniel Bejan Jul 01 '19 at 11:11