If you want a general solution to detecting Morse code, you are going to have to take a look at what it looks like as a waveform (tom10's link to this question should help here if you can install numpy
and matplotlib
; if not, you can use the stdlib's csv
module to export a file that you can use in your favorite spreadsheet program); work out how you as a human can distinguish dots, dashes, and spaces; turn that into an algorithm (a series of steps that even a literal-minded moron can follow); then turn that algorithm into code. Or you may be able to find a library that's already done this for you.
But for your specific case, you only need to detect exact copies of the contents of dot.wav
and dash.wav
within your larger file. (At least assuming you're not using any lossy compression, which usually you aren't in .wav files.) So, this is really just a substring search.
Think about how you'd detect the strings 'dot'
and 'dash'
within a string like 'dash dash dash dash dash dot dash dot dot dot dot dot '
. For such a simple problem, you could use a stupid brute-force algorithm, and it would be fine:
def find(haystack, needle, start):
for i in range(start, len(haystack)):
if haystack[i:i+len(needle)] == needle:
return i
return len(haystack)
def decode_morse(morse):
i = 0
while i < len(morse):
next_dot = find(morse, 'dot', i)
next_dash = find(morse, 'dash', i)
if next_dot < next_dash:
if next_dot < len(morse):
yield '.'
i = next_dot
else:
if next_dash < len(morse):
yield '-'
i = next_dash
Now, if you're searching a list of numbers instead of a string, how does this have to change? Barely at all; you can slice a list, compare two lists, etc. just like you can with strings.
The only real problem you'll run into is that you don't have the whole list in memory at once, just 20 frames at a time. What happens if a dot
starts in frame 19 and ends in frame 20? If your files aren't too big, this is easy to solve: just read all the frames into memory in one giant list, then search the whole thing. But otherwise, you have to do some buffering.
For example (ignoring error handling and dealing with the end of the file properly, and dealing only with dashes for simplicity—of course you have to do both of those properly in your real code):
buf = []
while True:
while len(buf) < 2*len(dash):
buf.extend(waveFile.readFrames(20))
next_dash = find(buf, dot)
if next_dash < len(buf):
yield '.'
buf = buf[next_dash:]
else:
buf = buf[-len(dash):]
We're making sure we always have at least two dash lengths in our buffer. And we always keep the leftover after the first dot or dash (if one was found) or a full dash length (if not) in the buffer, and add the next buffer to that. That's actually overkill; think it through and think through out exactly what you need to make sure we never miss a dash that falls between two buffers. But the point is, as long as you get that right, you can't miss any dots or dashes.