1

I am working on a project that encrypts .wav files with the RSA algorithm. To read the file properly, I need to remove the file header, and I need the sound data as a numpy array. I have searched the web but did not understand what the file header is or how to remove it in Python 3. Looking forward to suggestions. Thank you.

Sandipan
  • One option would be to look up the specification, open the file in binary mode, and `.seek()` to the point where the data begins. EDIT: Looks like the offset needed would be `44` – Vulpex Mar 29 '19 at 15:13

1 Answer

5
# Open the file in binary mode and split it at byte 44,
# the header length of a canonical PCM WAV file.
with open("a2002011001-e02.wav", "rb") as f:
    binaryHeader = f.read(44)  # header bytes
    binarySound = f.read()     # everything after the header

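This should be what you're looking for. It reads the first 44 bytes into the binaryHeader variable and the rest of the file, the sound data, into the binarySound variable. The 44-byte offset holds for a canonical PCM WAV file; a file with extra chunks before the sound data has a longer header.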
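If the header might not be exactly 44 bytes, one option is to walk the RIFF chunks and look for the `data` chunk instead of hard-coding the offset. The following is a minimal sketch, not a full WAV parser: `find_data_chunk` is a hypothetical helper name, and the demo builds a tiny mono file in memory with the standard-library `wave` module rather than reading a real file.

```python
import io
import struct
import wave


def find_data_chunk(raw):
    """Return (offset, size) of the 'data' chunk payload in RIFF/WAVE bytes."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # the first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(raw):
        # Each chunk: 4-byte ID, then a little-endian 32-bit payload size.
        chunk_id, chunk_size = struct.unpack_from("<4sI", raw, pos)
        if chunk_id == b"data":
            return pos + 8, chunk_size
        # Skip the payload; chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'data' chunk found")


# Demo on an in-memory file (assumption: 16-bit mono PCM, two samples).
demo = io.BytesIO()
with wave.open(demo, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x01\x00\x02\x00")

raw = demo.getvalue()
offset, size = find_data_chunk(raw)
binaryHeader = raw[:offset]
binarySound = raw[offset:offset + size]
```

For a file written by `wave` the offset comes out as 44, matching the fixed value used above; for files with additional chunks it will be larger.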

To get your music file back, save the two parts to files (header.bin and data.bin here) and concatenate them again:

# Concatenate the saved header and sound data.
with open("header.bin", "rb") as h, open("data.bin", "rb") as d:
    song = h.read() + d.read()

with open("new.wav", "wb") as f:
    f.write(song)

EDIT: The OP needs the data as a numpy array, so here is a version using numpy.frombuffer:

import numpy

with open("a2002011001-e02.wav", "rb") as f:
    # First 44 bytes: header of a canonical PCM WAV file.
    binaryHeader = numpy.frombuffer(f.read(44), dtype=numpy.uint8)
    # Remainder: sound data, viewed as raw bytes (one uint8 per byte).
    binarySound = numpy.frombuffer(f.read(), dtype=numpy.uint8)

with open("header.bin", "wb") as f:
    f.write(binaryHeader.tobytes())

with open("data.bin", "wb") as f:
    f.write(binarySound.tobytes())

# Rebuild the file by concatenating the two parts.
with open("header.bin", "rb") as h, open("data.bin", "rb") as d:
    song = h.read() + d.read()

with open("new.wav", "wb") as f:
    # song is already a bytes object, so write it directly.
    f.write(song)
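The uint8 view gives one array element per byte, not one per sample. If integer samples are needed, with one column per channel for stereo files, a possible sketch uses the standard-library `wave` module to skip the header and numpy to reshape the frames. It assumes 16-bit PCM; `read_wav_samples` and `write_wav_samples` are hypothetical helper names, not part of any library.

```python
import wave

import numpy as np


def read_wav_samples(path):
    """Return (samples, params); samples has shape (n_frames, n_channels)."""
    with wave.open(path, "rb") as w:
        params = w.getparams()
        frames = w.readframes(params.nframes)
    if params.sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    # WAV stores 16-bit samples as little-endian signed integers,
    # interleaved by channel: L0 R0 L1 R1 ...
    samples = np.frombuffer(frames, dtype="<i2")
    return samples.reshape(-1, params.nchannels), params


def write_wav_samples(path, samples, params):
    """Write a (n_frames, n_channels) int16 array back out as a WAV file."""
    with wave.open(path, "wb") as w:
        w.setparams(params)
        w.writeframes(np.ascontiguousarray(samples, dtype="<i2").tobytes())
```

A mono file yields an array of shape (n_frames, 1) and a stereo file (n_frames, 2), so `samples[:, 0]` is the left channel. Both helpers accept either a file name or an open binary file object, since `wave.open` does.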
Vulpex
  • Yeah, but I need that sound data in an integer numpy array – Sandipan Mar 29 '19 at 15:35
  • @Sandipan you didn't specify that in the question. `numpy.frombuffer` seems to be the solution. – Vulpex Mar 29 '19 at 15:40
  • Thank you for this – Sandipan Mar 29 '19 at 15:42
  • @Sandipan I updated the answer to include a version with a numpy array. I think this is what you're looking for. If not, please let me know so I can keep the answer as accurate as possible – Vulpex Mar 29 '19 at 16:10
  • It works for a mono file, since it generates a 1-D array, but for a stereo file a 2-D array is needed and this method does not work. Can you help me with this? – Sandipan Mar 31 '19 at 05:39