
I know that it is possible to convert audio to a representative image. Does anyone know if the opposite is possible? Can we convert the representative image back to audio? If it's possible, please tell me how.

I looked for ways to do this but did not find any.

Edit: my main goal is to generate new/random music using a DCGAN. My idea was to take audio, convert it to an image of its frequency graph, run the DCGAN on it, and then convert the result back to audio.

I don't know which tool to use or how exactly to do this. If someone can help me, that would be nice.

ido B
  • it looks like a question for [DataScience](https://datascience.stackexchange.com/) – furas Mar 25 '22 at 00:27
  • Is using spectrograms the best way? – ido B Mar 25 '22 at 09:31
  • I think you do not just want _any_ function turning any image to audio, but the inverse of the function you used for converting the audio to an image. For that, we would first need to know which function you used for the conversion from audio to image. – Jonathan Scholbach Mar 25 '22 at 13:25

1 Answer


there are many ways to do this ... the approach I used is to iterate across each pixel in the input image and assign each pixel, in order, a unique frequency ... the range of frequencies can be arbitrary; let's vary it across part of the human audible range, from 200 to 8,000 Hertz ... divide this audio frequency range by the number of pixels, which gives you a frequency increment value ... give the first pixel 200 Hertz, and as you iterate across the remaining pixels give each one a frequency by adding this frequency increment to the previous pixel's frequency

while you perform the above iteration across all pixels, determine the light intensity value of the current pixel and use it to derive a value normalized from zero to one, which will be the amplification factor for that pixel's frequency
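here is a rough Python sketch of those two steps (my own implementation was in golang, so the file name, the 64x64 downscale to keep the oscillator count manageable, and the NumPy/Pillow calls below are illustrative choices, not my original code):

```python
import numpy as np
from PIL import Image

# load the image as 8-bit grayscale and downscale it so we end up with a
# manageable number of oscillators (one per pixel)
img = Image.open("input.png").convert("L").resize((64, 64))
intensity = np.asarray(img, dtype=np.float64)

flat = intensity.flatten()                    # naive left-to-right, top-to-bottom scan
n_pixels = flat.size

f_low, f_high = 200.0, 8000.0                 # the frequency range chosen above
freqs = np.linspace(f_low, f_high, n_pixels)  # one unique frequency per pixel
amps = flat / 255.0                           # light intensity normalized to 0..1
```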

now you have a new array where each element records a light intensity value and a frequency ... walk across this array and create an oscillator that outputs a sine wave at an amplitude driven by the amplification factor and at the frequency of the current array element ... then combine all of these oscillator outputs and normalize them into a single aggregate audio signal
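continuing the sketch above, the oscillator bank can be as simple as one sine per pixel summed into a single buffer; the sample rate, the two-second duration and the SciPy WAV writer are again arbitrary choices:

```python
import numpy as np
from scipy.io import wavfile

# `freqs` and `amps` come from the mapping sketch above
sample_rate = 44100
duration = 2.0                                # seconds of output audio
t = np.arange(int(sample_rate * duration)) / sample_rate

# sum one sine oscillator per pixel: amplitude from light intensity,
# frequency from the pixel's position in the scan order
audio = np.zeros_like(t)
for f, a in zip(freqs, amps):
    audio += a * np.sin(2.0 * np.pi * f * t)

audio /= np.max(np.abs(audio))                # normalize the aggregate signal
wavfile.write("image_as_audio.wav", sample_rate, (audio * 32767).astype(np.int16))
```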

this aggregate synthesized output audio is the time-domain representation of the input image, which is your frequency-domain starting point

the beautiful thing is that this output audio is effectively the inverse Fourier transform of the image ... anyone fluent in Fourier transforms will predict what comes next: this audio can then be sent into an FFT call, which will output a new image that, if you implement all of this correctly, will more or less match your original input image
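that round trip can be sanity-checked in a few lines; this continues the sketch above, and the recovery is only approximate because the pixel frequencies will not fall exactly on FFT bin centers:

```python
import numpy as np
from PIL import Image

# `audio`, `sample_rate`, `freqs` and `intensity` come from the sketches above
spectrum = np.abs(np.fft.rfft(audio))
bin_freqs = np.fft.rfftfreq(audio.size, d=1.0 / sample_rate)

# read the magnitude at the FFT bin nearest to each pixel's assigned frequency
bins = np.clip(np.searchsorted(bin_freqs, freqs), 0, spectrum.size - 1)
recovered = spectrum[bins]
recovered = (recovered / recovered.max() * 255.0).reshape(intensity.shape)

Image.fromarray(recovered.astype(np.uint8)).save("recovered.png")
```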

I used golang, not Python; however, this challenge is language agnostic ... good luck and have fun

there are several refinements to this ... a naive way to parse the input image is to simply zig-zag left to right, top to bottom, which will work; however, if you use a Hilbert curve to determine which pixel comes next, your output audio will be better suited to human listening, especially when and if you change the resolution of the original input image ... ignore this embellishment until you have the basic version working
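if you do want to try it, here is a self-contained Hilbert curve mapping, a straight Python port of the classic distance-to-(x, y) algorithm; the order of 6 matches the 64x64 image used in the sketches above:

```python
import numpy as np

def hilbert_d_to_xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    s = 1
    t = d
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# replace the row-major scan from the first sketch with a Hilbert-ordered scan
# (`intensity` is the 64x64 array loaded there)
order = 6                                 # 2**6 = 64
coords = [hilbert_d_to_xy(order, d) for d in range(64 * 64)]
flat = np.array([intensity[y, x] for x, y in coords])
```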

far more valuable than the code which implements this is the voyage of discovery endured in writing the code ... here is the video which inspired me to embark on this voyage: https://www.youtube.com/watch?v=3s7h2MHQtxc (Hilbert's Curve: Is infinite math useful?)

here is a sample input photo: *(image)*

here is the output photo after converting the above image into audio and then back into an image: *(image)*

once you get this up and running and are able to toggle from the frequency domain into the time domain and back again, you are free to choose whether you start from audio or from an image

Scott Stensland
  • can you share your code please? – ido B Mar 25 '22 at 09:29
  • I think they do not just want _any_ function turning any image to audio, but the inverse of the function they used for converting the audio to an image. – Jonathan Scholbach Mar 25 '22 at 13:24
  • Scott Stensland, I see that we lose a lot of information this way. Jonathan.scholbach, which function should I use for minimum information loss? – ido B Mar 25 '22 at 15:33