Finally, I found moviepy (https://pypi.python.org/pypi/moviepy), a lightweight wrapper around ffmpeg that provides an interface for quickly obtaining video and audio frames at the same timestamps. An example is below:
from moviepy.editor import VideoFileClip  # moviepy 1.x; in moviepy 2.x use: from moviepy import VideoFileClip
video = VideoFileClip('your video filename')
audio = video.audio  # None if the file has no audio track
duration = video.duration  # in seconds, float; normally equal to audio.duration
# note: video.fps != audio.fps (video frame rate vs. audio sample rate)
step = 0.1  # sampling interval in seconds (100 ms)
# walk through the clip, fetching the audio and video frames at the same timestamp
for i in range(int(duration / step)):
    t = i * step
    if t > audio.duration or t > video.duration:
        break
    audio_frame = audio.get_frame(t)  # numpy array with the mono/stereo sample values at time t
    video_frame = video.get_frame(t)  # numpy array with the RGB/gray image at time t
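
Because video.fps and audio.fps differ, the same timestamp lands on very different positions in the two streams. Here is a minimal sketch of that mapping in plain Python, with no moviepy required. The helper names sample_times and frame_index and the rates 24 and 44100 are my own illustrative assumptions, and truncating t * fps is only one simple convention, not necessarily what moviepy does internally.

```python
def sample_times(duration, step):
    """Yield timestamps 0, step, 2*step, ... that are strictly below duration."""
    i = 0
    while i * step < duration:
        yield i * step
        i += 1


def frame_index(t, fps):
    """Index of the frame (or audio sample) that covers time t, by truncation."""
    return int(t * fps)


# Hypothetical rates: 24 video frames per second, 44100 audio samples per second.
video_fps, audio_fps = 24, 44100
for t in sample_times(2.0, 0.5):
    print(t, frame_index(t, video_fps), frame_index(t, audio_fps))
```

Computing each timestamp as i * step, rather than adding step repeatedly, keeps floating-point error from accumulating over long clips.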
Besides extracting audio/video frames, moviepy provides a wide range of functionality for modifying audio and video clips.