I'm trying to extract a directory name from a .PLM file using Python 2.7 on Windows 10. A .PLM file is a proprietary format used by Panasonic voice recorders; it stores the name of the directory for the voice recordings.
For example, if I save a voice recording in the folder "HelloÆØÅ", the recorder creates a folder called "SV_VC001" and a file called "SD_VOICE.PLM" which, among a bunch of other data, stores the string "HelloÆØÅ".
Now, I'm a Dane, so I use the characters Æ, Ø and Å, which aren't part of ASCII, so I have to decode this binary data into Unicode.
So far I know that the directory name starts at byte 56 and is terminated by a NUL byte (0x00). For example, one recording is stored in a directory called "2-3-15 Årstids kredsløbet michael", which has the hex values:
322d 332d 3135 20c5 7274 6964 7320 6b72
6564 736c f862 6574 206d 6963 6861 656c
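To make the layout above reproducible without the recorder, here is a minimal sketch that builds a synthetic test file from those hex values. The 56 filler bytes, the trailing bytes and the helper name `make_test_plm` are assumptions for illustration only; a real SD_VOICE.PLM holds other data in those places.

```python
import binascii
import os
import tempfile

NAME_OFFSET = 56  # the name starts at byte 56, per the description above

# The 32 bytes quoted above, exactly as listed.
NAME_BYTES = binascii.unhexlify(
    "322d332d313520c57274696473206b72"
    "6564736cf8626574206d69636861656c"
)


def make_test_plm(path):
    """Write a fake .PLM file: filler header, name, NUL terminator, filler."""
    with open(path, "wb") as f:
        f.write(b"\xff" * NAME_OFFSET)  # hypothetical header content
        f.write(NAME_BYTES)
        f.write(b"\x00")                # terminator described above
        f.write(b"\xff" * 16)           # hypothetical trailing data


test_path = os.path.join(tempfile.mkdtemp(), "SD_VOICE.PLM")
make_test_plm(test_path)
```

Any function that reads the name can then be pointed at `test_path` instead of a file from the recorder.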
This is the code I'm using thus far:
# Finds the filename in the .PLM-file
def FindFileName(File):
    # Opens the file and points to byte 56, where the file name starts
    f = open(File,'rb')
    f.seek(56)
    Name = ""
    byte = f.read(1) # Reads the first byte after byte 56
    while byte != "\x00": # Runs the loop, until a NUL-character is found (00 is NUL in hex)
        Name += str(byte) # Appends the current byte to the string Name
        byte = f.read(1) # reads the next byte
    f.close()
    return Name
This works, provided the directory name only uses ASCII characters (so no 'æ', 'ø' or 'å').
However, any non-ASCII characters in the name come out as some other character. With the directory "2-3-15 Årstids kredsløbet michael", this program outputs "2-3-15 ┼rtids kredsl°bet michael".
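The mangled output looks consistent with an encoding mismatch rather than with damaged data. Here is a minimal sketch of that idea, assuming the recorder stores the name in a single-byte Western encoding such as Latin-1 and that the console displays the raw bytes using code page 850. Both are guesses; the format is proprietary, so neither is confirmed.

```python
import binascii

# The bytes of the example directory name, as quoted in the hex dump.
raw = binascii.unhexlify(
    "322d332d313520c57274696473206b72"
    "6564736cf8626574206d69636861656c"
)

as_latin1 = raw.decode("latin-1")  # assumed storage encoding
as_cp850 = raw.decode("cp850")     # assumed console code page

# Compare the two interpretations of the non-ASCII bytes (offsets 7 and 20).
for offset in (7, 20):
    print("%d %r %r" % (offset, as_latin1[offset], as_cp850[offset]))
```

Under these assumptions, byte 0xC5 is Å in Latin-1 but ┼ in code page 850, and 0xF8 is ø versus °, which matches the output above.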
Do you have any suggestions?
Thank you very much in advance.
EDIT
After adding the suggestions from Mark Ransom, the code is as follows. I also tried, clumsily, to handle the three edge cases I found: question marks are changed to spaces, and the bytes \xc5 and \xd8 (Å and Ø) are changed to å and ø, respectively.
def FindFileName(File):
    # Opens the file and points to byte 56, where the file name starts
    f = open(File,'rb')
    f.seek(56)
    Name = ""
    byte = f.read(1) # Reads the first byte after byte 56
    while byte and (byte != "\x00"): # Runs the loop, until a NUL-character is found (00 is NUL in hex)
        # Since there are problems with "?" in directory names, we change those to spaces
        if byte == "?":
            Name += " "
        elif byte == "\xc5":
            Name += "å"
        elif byte == "\xd8":
            Name += "ø"
        else:
            Name += byte
        byte = f.read(1) # reads the next byte
    f.close()
    return Name.decode('mbcs')
This produces the following error for names containing uppercase Æ, Ø and Å:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: u'C:\\Users\\UserName\\Desktop\\TestDir\\Mapper\\13-10*14 ESSOTERISK \xc5NDSSTR\xd8MNIN'
The string should be "13-10*14 ESSOTERISK ÅNDSSTRØMNIN", but Å and Ø (hex c5 and d8) are throwing errors.
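One thing worth checking: Windows does not allow `*` (nor `\ / : ? " < > |`) in file or directory names, so the `*` in "13-10*14" may be what triggers Error 123, rather than Å and Ø. Below is a rough sketch that reads the name, decodes it and replaces those characters. It assumes the name is Latin-1 encoded; `find_directory_name`, `make_windows_safe`, `MAX_NAME_BYTES` and the synthetic test file are hypothetical names and data for illustration.

```python
import io
import os
import tempfile

NAME_OFFSET = 56                  # from the description of the file layout
ASSUMED_ENCODING = "latin-1"      # assumption, not confirmed for .PLM files
MAX_NAME_BYTES = 256              # hypothetical upper bound on the name length
WINDOWS_RESERVED = u'\\/:*?"<>|'  # printable characters Windows rejects in names


def find_directory_name(path):
    """Read the NUL-terminated name at NAME_OFFSET and decode it."""
    with io.open(path, "rb") as f:
        f.seek(NAME_OFFSET)
        chunk = f.read(MAX_NAME_BYTES)
    raw_name = chunk.split(b"\x00", 1)[0]  # keep bytes before the first NUL
    return raw_name.decode(ASSUMED_ENCODING)


def make_windows_safe(name, replacement=u" "):
    """Replace characters that Windows does not allow in names."""
    return u"".join(replacement if ch in WINDOWS_RESERVED else ch
                    for ch in name)


# Synthetic test file: filler header, the failing name, NUL, filler.
demo_path = os.path.join(tempfile.mkdtemp(), "SD_VOICE.PLM")
with io.open(demo_path, "wb") as f:
    f.write(b"\xff" * NAME_OFFSET)
    f.write(b"13-10*14 ESSOTERISK \xc5NDSSTR\xd8MNIN\x00")
    f.write(b"\xff" * 16)

decoded = find_directory_name(demo_path)
safe = make_windows_safe(decoded)
print(repr(safe))
```

Decoding with an explicit codec instead of 'mbcs' keeps the result independent of the machine's locale, and it removes the need to special-case \xc5 and \xd8 by hand.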