
TargetDataLine is, so far, the easiest way I have found to capture microphone input in Java. I want to encode the captured audio together with a video of the screen (in a screen-recorder application) so that the user can create a tutorial, slidecast, etc.
I use Xuggler to encode the video.
Xuggler does have a tutorial on encoding audio with video, but it takes its audio from a file. In my case, the audio is live.



To encode the video I use com.xuggle.mediatool.IMediaWriter. The IMediaWriter object lets me add a video stream and has an
encodeAudio(int streamIndex, short[] samples, long timeStamp, TimeUnit timeUnit)
method. I could use that if I could get the samples from the TargetDataLine as short[], but TargetDataLine.read() fills a byte[] instead.
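The only workaround I can think of (this is my own guess, not something from the TargetDataLine or Xuggler docs) is to reassemble pairs of bytes into 16-bit samples manually, matching the byte order of my AudioFormat:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// My guess at a conversion: turn the byte[] read from the TargetDataLine
// into the short[] that encodeAudio expects. Assumes 16-bit signed PCM.
public static short[] toShortArray(byte[] bytes, int validBytes, boolean bigEndian) {
    short[] samples = new short[validBytes / 2]; // two bytes per 16-bit sample
    ByteBuffer.wrap(bytes, 0, validBytes)
              .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN)
              .asShortBuffer()
              .get(samples);
    return samples;
}
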
So my two questions are:

How can I encode the live audio with video?

How do I maintain the proper timing of the audio packets so that they are encoded at the proper time?

References:
1. JavaDoc for TargetDataLine: http://docs.oracle.com/javase/1.4.2/docs/api/javax/sound/sampled/TargetDataLine.html
2. Xuggler Documentation: http://build.xuggle.com/view/Stable/job/xuggler_jdk5_stable/javadoc/java/api/index.html



Update

My code for capturing video

public void run() {
    final IRational FRAME_RATE = IRational.make(frameRate, 1);
    final IMediaWriter writer = ToolFactory.makeWriter(completeFileName);
    writer.addVideoStream(0, 0, FRAME_RATE, recordingArea.width, recordingArea.height);
    long startTime = System.nanoTime();

    while (keepCapturing) {
        image = bot.createScreenCapture(recordingArea);
        PointerInfo pointerInfo = MouseInfo.getPointerInfo();
        Point globalPosition = pointerInfo.getLocation();

        int relativeX = globalPosition.x - recordingArea.x;
        int relativeY = globalPosition.y - recordingArea.y;

        BufferedImage bgr = convertToType(image, BufferedImage.TYPE_3BYTE_BGR);
        if (cursor != null) {
            // paint the mouse cursor onto the captured frame
            bgr.getGraphics().drawImage(((ImageIcon) cursor).getImage(), relativeX, relativeY, null);
        }
        try {
            writer.encodeVideo(0, bgr, System.nanoTime() - startTime, TimeUnit.NANOSECONDS);
        } catch (Exception e) {
            writer.close();
            JOptionPane.showMessageDialog(null,
                    "Recording will stop abruptly because " +
                    "an error has occurred", "Error", JOptionPane.ERROR_MESSAGE, null);
            return; // stop capturing; the writer is already closed
        }

        try {
            sleep(sleepTime);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    writer.close();
}
An SO User

1 Answer


I answered most of that recently under this question: Xuggler encoding and muxing

Code sample:

writer.addVideoStream(videoStreamIndex, 0, videoCodec, width, height);
writer.addAudioStream(audioStreamIndex, 0, audioCodec, channelCount, sampleRate);

while (... have more data ...)
{
    BufferedImage videoFrame = ...;
    long videoFrameTime = ...; // this is the time to display this frame
    writer.encodeVideo(videoStreamIndex, videoFrame, videoFrameTime, DEFAULT_TIME_UNIT);

    short[] audioSamples = ...; // the size of this array should be number of samples * channelCount
    long audioSamplesTime = ...; // this is the time to play back this bit of audio
    writer.encodeAudio(audioStreamIndex, audioSamples, audioSamplesTime, DEFAULT_TIME_UNIT);
}

In the case of TargetDataLine, getMicrosecondPosition() will tell you the time you need for audioSamplesTime. It appears to count from the moment the TargetDataLine was opened. You then need a video timestamp referenced to the same clock, which depends on the video device and/or how you capture video. The absolute values do not matter as long as both streams use the same clock. You could subtract the initial value (at the start of the stream) from both your video and your audio times so that the two timelines start together; that alignment is only approximate, but probably close enough in practice.
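For example, the audio side could look roughly like this (a sketch, not tested; the AudioFormat values, buffer size, and flow control are my assumptions, and writer/audioStreamIndex are the ones from the sample above):

import javax.sound.sampled.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

// Open the microphone line with an explicit format (assumed here:
// 16-bit signed PCM, stereo, big-endian, 44.1 kHz).
// LineUnavailableException handling omitted for brevity.
AudioFormat format = new AudioFormat(44100.0f, 16, 2, true, true);
TargetDataLine line = AudioSystem.getTargetDataLine(format);
line.open(format);
line.start();

byte[] buffer = new byte[4096];
while (keepCapturing) {
    int read = line.read(buffer, 0, buffer.length);        // blocks until bytes are available
    short[] samples = new short[read / 2];
    ByteBuffer.wrap(buffer, 0, read)
              .order(ByteOrder.BIG_ENDIAN)                 // must match the AudioFormat above
              .asShortBuffer()
              .get(samples);
    long audioSamplesTime = line.getMicrosecondPosition(); // microseconds since the line was opened
    writer.encodeAudio(audioStreamIndex, samples, audioSamplesTime, TimeUnit.MICROSECONDS);
}
line.stop();
line.close();
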

You need to call encodeVideo and encodeAudio in strictly increasing order of time; you may have to buffer some audio and some video to make sure you can do that. More details here.
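One way to get that strict ordering with two live sources (a sketch of the general idea only; AudioChunk and VideoFrame are hypothetical holder classes pairing captured data with a microsecond timestamp) is to buffer chunks from both capture threads in queues and always hand the writer whichever pending chunk is older:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

// Capture threads enqueue; this loop dequeues in timestamp order so the
// writer always sees non-decreasing times on both streams.
Queue<VideoFrame> videoQueue = new ConcurrentLinkedQueue<>();
Queue<AudioChunk> audioQueue = new ConcurrentLinkedQueue<>();

while (!videoQueue.isEmpty() && !audioQueue.isEmpty()) {
    if (videoQueue.peek().timestamp <= audioQueue.peek().timestamp) {
        VideoFrame f = videoQueue.poll();
        writer.encodeVideo(videoStreamIndex, f.image, f.timestamp, TimeUnit.MICROSECONDS);
    } else {
        AudioChunk a = audioQueue.poll();
        writer.encodeAudio(audioStreamIndex, a.samples, a.timestamp, TimeUnit.MICROSECONDS);
    }
}
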

Alex I
  • I have edited the question so you can see the code. The frame rate, in my case, can vary between 10 and 25, so how do I dynamically maintain the proper timing? And P.S.: is my code doing things correctly, e.g. encoding at 10 fps? – An SO User Dec 25 '12 at 16:38
  • @LittleChild: you can take the video timestamp to be the time at which you get a frame from `createScreenCapture`. "Dynamically maintain proper timing" - just subtract the first value (at the start of stream) from both audio and video timestamp, and then feed audio/video to the media writer in order of increasing timestamps. – Alex I Dec 25 '12 at 17:44
  • @LittleChild: I can see a couple of problems: you *do* specify a framerate for a variable-framerate stream, but *don't* specify a codec. The other way around would be better :) Try calling `addVideoStream(int inputIndex, int streamId, ICodec.ID codecId, int width, int height);` instead (see the sketch after these comments). – Alex I Dec 25 '12 at 17:46
  • Hey, I got my head around `TargetDataLine` but here is a new issue. Please address that as you addressed this. Here: http://stackoverflow.com/questions/14033778/bigendian-littleendian-confusion – An SO User Dec 25 '12 at 20:30
  • P.S. You can use any codecs you want, but I'd recommend `ICodec.ID.CODEC_ID_H264` for video (or `CODEC_ID_MPEG2VIDEO` during testing) and `ICodec.ID.CODEC_ID_AAC` (or `CODEC_ID_MP3`) for audio. Make sure your output container can mux those too: .mp4 or .mkv or .ts should be fine. – Alex I Dec 26 '12 at 00:00
  • Hey please check out the question in the link. – An SO User Dec 26 '12 at 06:57
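
Putting the advice from these comments together, the stream setup might look like this (a sketch only; the output file name, stream indices, channel count, and sample rate are illustrative assumptions):

import com.xuggle.mediatool.IMediaWriter;
import com.xuggle.mediatool.ToolFactory;
import com.xuggle.xuggler.ICodec;

// Specify codecs explicitly instead of a fixed frame rate, per the
// comments above. .mp4 can mux H.264 + AAC; .mkv or .ts would also work.
IMediaWriter writer = ToolFactory.makeWriter("recording.mp4");
writer.addVideoStream(0, 0, ICodec.ID.CODEC_ID_H264, recordingArea.width, recordingArea.height);
writer.addAudioStream(1, 0, ICodec.ID.CODEC_ID_AAC, 2, 44100); // stereo, 44.1 kHz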