
I want to stream audio from my microphone with Python (on Linux). I used the pyalsaaudio module, but I got stuck.

My code so far:

import alsaaudio

CHAN = 1
RATE = 44400
PERIOD = RATE * 1

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE,
                    alsaaudio.PCM_NORMAL,
                    channels = CHAN,
                    rate = RATE,
                    format = alsaaudio.PCM_FORMAT_MPEG,
                    periodsize = PERIOD
                    )
wf = open('stream.mpeg', 'wb')
l,data = inp.read()
wf.write(data)
wf.close()

This does not throw any errors, but I can't open the output file:

$ ffplay stream.mpeg 
ffplay version 4.2.7-0ubuntu0.1 Copyright (c) 2003-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  WARNING: library configuration mismatch
  avcodec     configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libaribb24 --enable-liblensfun --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
stream.mpeg: Invalid data found when processing input    0B f=0/0

The documentation says:

Format: PCM_FORMAT_MPEG; Description: MPEG encoded audio?

I really don't know what that question mark at the end is about.

Lima

1 Answer

ALSA enumerates all its formats and has a number reserved for SND_PCM_FORMAT_MPEG. pyalsaaudio copied all enumerations, dropping the SND_ prefix. The PCM_FORMAT_MPEG format thus refers to SND_PCM_FORMAT_MPEG in the ALSA library.
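
To make the naming rule concrete, here is a minimal sketch. The helper to_pyalsaaudio_name and the short list of names are my own illustration (not part of pyalsaaudio), and the check against the installed module is optional:

```python
# A hand-picked sample of ALSA C enum names, not the full list.
ALSA_NAMES = [
    "SND_PCM_FORMAT_S16_LE",
    "SND_PCM_FORMAT_FLOAT_LE",
    "SND_PCM_FORMAT_MPEG",
]

def to_pyalsaaudio_name(alsa_name):
    """Hypothetical helper: drop the SND_ prefix to get the pyalsaaudio constant name."""
    prefix = "SND_"
    return alsa_name[len(prefix):] if alsa_name.startswith(prefix) else alsa_name

try:
    import alsaaudio  # optional; only used to check which names exist
except ImportError:
    alsaaudio = None

for alsa_name in ALSA_NAMES:
    py_name = to_pyalsaaudio_name(alsa_name)
    if alsaaudio is None:
        status = "not checked (pyalsaaudio not installed)"
    else:
        status = "present" if hasattr(alsaaudio, py_name) else "missing"
    print(f"{alsa_name} -> alsaaudio.{py_name}: {status}")
```

That a constant exists in the module only means the number is defined. It says nothing about whether a capture device will accept that format.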

If we search the ALSA source code (https://github.com/search?q=org%3Aalsa-project+SND_PCM_FORMAT_MPEG&type=code), we find two hits. One is the definition; the other plays a role in the coupling to OSS (of which I know nothing), and there seems to be no use of it outside alsa-oss. I guess that whoever put the question mark in the pyalsaaudio documentation observed the same thing. The ALSA documentation and code are hard to read, so I speculate that, rather than sorting this out completely, this corner case was marked as needing attention from anyone interested.

That being said, it seems your intention is to write sound captured from a microphone to an mp3 file. There is no need to have the input device produce MPEG. The code below, which uses alsaaudio and pydub, reads from the microphone and writes to an mp3 file.

import alsaaudio
import numpy as np
import struct
import pydub 
import time

conversion_dicts = {
        alsaaudio.PCM_FORMAT_S16_LE: {'dtype': np.int16, 'endianness': '<', 'formatchar': 'h', 'bytewidth': 2},
}

def get_conversion_string(audioformat, noofsamples):
    conversion_dict = conversion_dicts[audioformat]
    conversion_string = f"{conversion_dict['endianness']}{noofsamples}{conversion_dict['formatchar']}"
    return conversion_string

device = 'default'
fs = 44100
periodsize=512

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 
    channels=1, rate=fs, format=alsaaudio.PCM_FORMAT_S16_LE, 
    periodsize=periodsize, device=device)

print(inp.info())

with open("test.mp3", 'wb') as mp3file:
    
    
    dtype = np.int16 

    loops_with_data = int(np.ceil(5 * fs/periodsize))  # number of periods in about 5 seconds of audio
    first_time = True

    while loops_with_data > 0:
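        # Read data from device
        l, rawdata = inp.read()

        if l > 0:
            # Unpack only when frames were returned: in non-blocking mode
            # l can be 0 (no data yet) or negative (e.g. after an overrun)
            conversion_string = get_conversion_string(alsaaudio.PCM_FORMAT_S16_LE, l)
            data = np.array(struct.unpack(conversion_string, rawdata), dtype=dtype)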
            print(f"\r{loops_with_data:4} {l=}", end='')
            if first_time:
                # Create an empty song
                song = pydub.AudioSegment(b'', frame_rate=fs, sample_width=2, channels=1)
                
                # Clear the audio buffer
                inp.drop()
                first_time = False
            else:
                #smaller delay otherwise, still longer than one period length
                song += pydub.AudioSegment(data.tobytes(), frame_rate=fs, sample_width=2, channels=1)
            
            time.sleep(.1)
            loops_with_data-=1
        else:
            print(".", end='')
    
    song.export(mp3file, format="mp3", bitrate="320k")
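
The point that the device only needs to deliver plain PCM can be checked without a microphone or an encoder. The sketch below is an illustration under assumptions: fake_capture is a made-up stand-in for inp.read() that returns a sine tone as S16_LE bytes, and the standard-library wave module wraps those bytes in a WAV container in place of the MP3 export.

```python
import io
import math
import struct
import wave

FS = 44100         # sample rate, matching fs in the code above
CHANNELS = 1
SAMPLE_WIDTH = 2   # bytes per sample for signed 16-bit little-endian
PERIODSIZE = 512

def fake_capture(start, nframes, freq=440.0):
    """Stand-in for inp.read(): returns (frame count, raw S16_LE bytes)."""
    samples = [
        int(12000 * math.sin(2 * math.pi * freq * (start + n) / FS))
        for n in range(nframes)
    ]
    return nframes, struct.pack(f"<{nframes}h", *samples)

def write_wav(target, chunks):
    """Wrap raw S16_LE chunks in a WAV container (path or file-like target)."""
    with wave.open(target, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(FS)
        for chunk in chunks:
            wf.writeframes(chunk)

# Ten periods of synthetic audio, written to an in-memory buffer
chunks = [fake_capture(i * PERIODSIZE, PERIODSIZE)[1] for i in range(10)]
buffer = io.BytesIO()
write_wav(buffer, chunks)
wav_bytes = buffer.getvalue()
print(len(wav_bytes), "bytes of WAV data")
```

Passing a filename such as "test.wav" instead of the BytesIO buffer gives a file that can be played to verify the capture before adding the MP3 step.
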
  • Thank you, but I want to stream it on a socket, can you please help to get the output mp3 as chunks? – Lima Oct 28 '22 at 09:08
  • Alas, I am not an expert on streaming. I can see where the changes are needed: open("test.mp3", 'wb') should be replaced with opening a stream socket, song += pydub.AudioSegment(data.tobytes(), frame_rate=fs, sample_width=2, channels=1) should be replaced with a send or write call to your socket. You could replace the file object with a BytesIO object, and then send the whole object over a regular socket. But then you would create a file for each block. Anyway as I see it that is a new question not related to PCM_FORMAT_MPEG. – Ronald van Elburg Oct 28 '22 at 09:20
  • Yes, I just need to concatenate the song.export from that loop. I resolved. – Lima Oct 28 '22 at 10:15
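
For readers who land here with the same follow-up question, the approach discussed in the comments can be sketched as below. This is an illustration, not the asker's actual solution: send_chunks, passthrough and mp3_encode are hypothetical names, the demo uses a local socket pair and silent dummy chunks instead of a microphone, and mp3_encode assumes pydub with a working ffmpeg.

```python
import io
import socket

FS = 44100  # sample rate used in the answer

def passthrough(raw_pcm):
    """Default encoder for the demo: send the raw PCM bytes unchanged."""
    return raw_pcm

def mp3_encode(raw_pcm):
    """Encode one chunk to MP3 (assumes pydub and ffmpeg are installed)."""
    import pydub
    segment = pydub.AudioSegment(raw_pcm, frame_rate=FS, sample_width=2, channels=1)
    buf = io.BytesIO()
    segment.export(buf, format="mp3")
    return buf.getvalue()

def send_chunks(sock, chunks, encode=passthrough):
    """Encode each chunk and write it to the socket; returns bytes sent."""
    total = 0
    for raw in chunks:
        payload = encode(raw)
        sock.sendall(payload)
        total += len(payload)
    return total

# Demo over a local socket pair with four chunks of silence (512 frames each)
sender, receiver = socket.socketpair()
chunks = [bytes(1024) for _ in range(4)]
sent = send_chunks(sender, chunks)
sender.close()

received = b""
while True:
    part = receiver.recv(4096)
    if not part:
        break
    received += part
receiver.close()
print(f"sent {sent} bytes, received {len(received)} bytes")
```

In the capture loop, each rawdata block (or a few of them joined together) would take the place of the dummy chunks, with encode=mp3_encode. Every exported chunk is then a separate MP3 stream, so the joins may be audible; larger chunks should reduce how often that happens.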