PyAudio 'utf8' error when listing devices

Question

When using PyAudio (the PortAudio binding) built with ASIO+DirectSound support, this code:

import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    print p.get_device_info_by_index(i)

... produces this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

How can we solve this problem?

The problem may come from "pyaudio.py", line 990, because of an unsuccessful UTF-8 decoding:

    return {'index' : index,
            'structVersion' : device_info.structVersion,
            'name' : device_info.name,

The answer given in "Special characters in audio devices name : Pyaudio" ("don't use PyAudio") is not satisfactory.

Traceback

...
{'defaultSampleRate': 44100.0, 'defaultLowOutputLatency': 0.0, 'defaultLowInputLatency': 0.12, 'maxInputChannels': 2L, 'structVersion': 2L, 'hostApi': 1L, 'index': 8, 'defaultHighOutputLatency': 0.0, 'maxOutputChannels': 0L, 'name': u'Microphone interne (Conexant 20672 SmartAudio HD)', 'defaultHighInputLatency': 0.24}
Traceback (most recent call last):
  File "D:\test\test.py", line 5, in <module>
    print p.get_device_info_by_index(i)
  File "C:\ProgramData\Anaconda\lib\site-packages\pyaudio.py", line 977, in get_device_info_by_index
    pa.get_device_info(device_index)
  File "C:\ProgramData\Anaconda\lib\site-packages\pyaudio.py", line 990, in _make_device_info_dictionary
    'name' : device_info.name,
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
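
The traceback shows the exception escaping from get_device_info_by_index(), so one device with a badly encoded name aborts the whole listing. As a stopgap (it does not fix the decoding itself), the loop can catch the error and carry on. The sketch below is only an illustration: the helper name list_devices_safely is hypothetical, and it relies solely on the two PyAudio methods already used in the question.

```python
def list_devices_safely(pa):
    """Return (infos, failed_indices) for every device that `pa` reports.

    `pa` is any object offering get_device_count() and
    get_device_info_by_index(i), for example a pyaudio.PyAudio() instance.
    This helper is hypothetical; it is not part of PyAudio.
    """
    infos, failed = [], []
    for i in range(pa.get_device_count()):
        try:
            infos.append(pa.get_device_info_by_index(i))
        except UnicodeDecodeError:
            # The name of device i is not valid UTF-8: record it and move on.
            failed.append(i)
    return infos, failed


# Intended use (requires PyAudio to be installed):
#     import pyaudio
#     infos, failed = list_devices_safely(pyaudio.PyAudio())
```

The devices listed in `failed` still exist; only their info dictionary could not be built, so this helps with enumeration but not with reading the offending name.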

The traceback doesn't match your code, but the error does look internal to pyaudio. Have you tried filing a bug report with them? — Wooble, Feb 05 '14 at 13:23
Sorry, I pasted a traceback coming from another file where I had the same problem; now I replaced with the Traceback from the code that I gave here in my question. — Basj, Feb 05 '14 at 13:26
Yes the error is internal to PyAudio, the goal is here to find where the bug is. — Basj, Feb 05 '14 at 13:28
Two things: 1. I cannot reproduce this error on Linux, but of course I also do not have the specific sound card (ASIO). 2. The code in the line of the trace does not contain any hint on why a decoding is attempted. It is only building a dict there. That's strange. — Alfe, Feb 05 '14 at 15:16
`device_info.name` is a property. `PyUnicode_FromString(PaDeviceInfo.name)` is called when accessing the property. `PaDeviceInfo.name` is supposed to be UTF8 but apparently in this case it is not. — cgohlke, Feb 05 '14 at 17:36
Some patches were posted to the PortAudio mailing list last year which fix some UTF8 device name issues on Windows. (search for "Unicode problems with Windows-build.") They are in the queue to be merged. — Ross Bencina, Feb 05 '14 at 18:51
@RossBencina do you have an idea of how I can solve this for PyAudio as well? I'm a bit new to this (I'm still unable to compile PortAudio+PyAudio with ASIO+DS at the same time, there are too many steps, and I got too many things to troubleshoot ;) ) — Basj, Feb 05 '14 at 21:37
`UnicodeDecodeError` is sadly quite common in Python. Following answer may apply to your specific case: http://stackoverflow.com/a/19706723/2419207 — iljau, Feb 14 '14 at 01:36
Also I'd suggest to tag your post with [`python-2.x`](https://stackoverflow.com/tags/python-2.x) or [`python3.x`](https://stackoverflow.com/tags/python-3.x) depending on what version you are using. As handling encodings has changed a lot in Python 3: http://docs.python.org/3/howto/pyporting.html#str-unicode and http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit — iljau, Feb 14 '14 at 01:42

score 5 · Answer 1 · answered Feb 08 '14 at 06:40

The error 'invalid continuation byte' makes me think that the text is corrupt for that particular index.

If you're able to modify the pyaudio.py file (or get it to return just the raw name), you could try handling the UTF-8 decoding yourself with 'Unicode Dammit', which takes a best guess at what the encoding is. Here's a link to their tutorial: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unicode-dammit

I think the code would look just like the tutorial:

from bs4 import UnicodeDammit

dammit = UnicodeDammit(audiodevicename)
print(dammit.unicode_markup) ## Wéird Device Name!
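
If pulling in bs4 is not wanted, the same "try UTF-8, then guess" idea can be sketched with the standard library alone. This assumes the raw bytes of the name are available, which stock PyAudio does not expose (hence the need to modify it, as noted above). The helper name decode_device_name and the fallback code pages are assumptions chosen for a Western European Windows setup, not something taken from PyAudio.

```python
def decode_device_name(raw, fallbacks=("cp1252", "latin-1")):
    """Decode raw device-name bytes, trying UTF-8 before legacy code pages.

    Hypothetical helper; the fallback encodings are an assumption and
    should match the ANSI code page of the machine in question.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD instead.
    return raw.decode("utf-8", errors="replace")


# Made-up example name containing the 0xe9 byte from the error message.
name = decode_device_name(b"P\xe9riph\xe9rique audio")
```

Trying UTF-8 first matters: valid UTF-8 also decodes without error as cp1252 or latin-1, but yields mojibake, so the strict codec has to get the first chance.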

Hell I have no idea if that answer is going to help OP, but +1 for mentioning this library I had no idea about. Nice! — Fenikso, Feb 09 '14 at 19:05

score 1 · Answer 2 · answered Feb 14 '14 at 13:25

I've forked PyAudio and modified the code in https://github.com/joelewis/PyAudio/blob/master/src/_portaudiomodule.c to use

PyUnicode_DecodeFSDefault

instead of

PyUnicode_FromString

which might solve the Unicode issue. See if you find it helpful.

fork: https://github.com/joelewis/PyAudio/

score 1 · Accepted Answer · answered Feb 18 '14 at 08:44

The only solution found to work is:

- apply Tobias Erichsen's patch to PortAudio (as mentioned in @RossBencina's comment), which can be found here: https://www.assembla.com/spaces/portaudio/support/tickets/224-patch-for-windows-directsound-and-wmme-utf-8-device-names#/activity/ticket
- rebuild the whole thing

Many thanks to @cgohlke for having built new ready-to-use installers: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio

score 0 · Answer 4 · answered Sep 08 '16 at 15:14

I think the clue here is

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

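For whatever reason, something returned by get_device_info_by_index() (probably the name field) contains the byte 0xe9. When a string of bytes is interpreted as UTF-8, 0xe9 is a lead byte that opens a three-byte sequence, so the decoder expects two continuation bytes (each in the range 0x80 to 0xbf) to follow it. "Invalid continuation byte" means that the byte following the 0xe9 is not one of those. In Latin-1 and Windows-1252, a lone 0xe9 is "é", which suggests the name is encoded in a legacy Windows code page rather than UTF-8. For comparison, here is a legitimate UTF-8 character built from 0xe9. E.g.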

http://hexutf8.com/?q=e981a8

uses 0xe9 with some valid continuation bytes.
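
The behaviour can be checked directly in Python 3. In this small sketch the bytes b"P\xe9riph" are a made-up name fragment, picked only because it puts 0xe9 at position 1 as in the error message; the real device name is not known from the question.

```python
# Made-up fragment: "P", then the byte 0xe9, then plain ASCII letters.
raw = b"P\xe9riph"

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xe9 opens a three-byte sequence, but "r" (0x72) is not a continuation byte.
    error = exc

# The same lead byte followed by two valid continuation bytes decodes fine
# and yields a single character.
valid = b"\xe9\x81\xa8".decode("utf-8")

# Read as Latin-1, every byte is one character, so 0xe9 becomes "é".
legacy = raw.decode("latin-1")

print(error.start, error.reason)
print(len(valid), hex(ord(valid)))
```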
