
I'm trying to write a small program that reacts when the user is speaking, for example by making a circle get bigger.

I'm using the code below to access the microphone, but how do I make it react only when the user is speaking, e.g. when the recorded volume is above some threshold?

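    import java.io.ByteArrayOutputStream;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.DataLine;
    import javax.sound.sampled.LineUnavailableException;
    import javax.sound.sampled.TargetDataLine;

    public class MicCapture {
        public static void main(String[] args) {
            // 16 kHz, 16-bit, mono, signed, big-endian PCM.
            AudioFormat format = new AudioFormat(16000, 16, 1, true, true);
            DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

            if (!AudioSystem.isLineSupported(info)) {
                System.out.println("Line is not supported");
                System.exit(-1);
            }

            TargetDataLine line = null;
            try {
                line = (TargetDataLine) AudioSystem.getLine(info);
                line.open(format);
            } catch (LineUnavailableException e) {
                System.out.println("Failed to get line");
                System.exit(-1);
            }

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int numBytesRead;
            // read() requires a whole number of sample frames, so round down.
            int chunkSize = line.getBufferSize() / 5;
            chunkSize -= chunkSize % format.getFrameSize();
            byte[] data = new byte[chunkSize];

            // Begin audio capture.
            line.start();

            // Read a fixed number of chunks (100), then stop.
            for (int i = 1; i <= 100; i++) {
                // Read the next chunk of data from the TargetDataLine.
                numBytesRead = line.read(data, 0, data.length);
                // Save this chunk of data.
                out.write(data, 0, numBytesRead);
                System.out.println(i);
            }

            // Release the microphone.
            line.stop();
            line.close();
        }
    }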
Andrew Thompson
Pita
  • possible duplicate of [Detect silence when recording](http://stackoverflow.com/questions/5800649/detect-silence-when-recording) – Andrew Thompson Jul 15 '13 at 03:10

1 Answer


Inside the last while loop, you are collecting sound data in a byte buffer called "data". What you need to do is take those bytes and assemble them into usable DSP values (samples). The code for doing so depends on the format. The most common format is 16-bit, stereo, little-endian. In that case you assemble pairs of bytes into values, where the first byte holds the lower bits and the second byte holds the higher bits. Note that the format in your code, new AudioFormat(16000, 16, 1, true, true), is 16-bit, mono, signed and big-endian, so for you the first byte of each pair holds the higher bits. There are several posts on this subject with the details of how to handle this.
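As a rough illustration of the byte assembly described above, here is a minimal sketch for 16-bit signed PCM. The class name `PcmDecoder` and the method `decode16Bit` are made up for this example and are not part of the Java Sound API. The code assumes a single channel; with stereo data the decoded samples would alternate between left and right.

```java
/** Hypothetical helper (not part of javax.sound.sampled). */
final class PcmDecoder {

    /**
     * Decodes the first numBytes bytes of data as 16-bit signed PCM samples.
     * A trailing odd byte, if any, is ignored.
     */
    static short[] decode16Bit(byte[] data, int numBytes, boolean bigEndian) {
        int sampleCount = numBytes / 2;
        short[] samples = new short[sampleCount];
        for (int s = 0; s < sampleCount; s++) {
            int first = data[2 * s];
            int second = data[2 * s + 1];
            // Big-endian: the first byte holds the high bits. Little-endian: the second does.
            int high = bigEndian ? first : second;
            int low = bigEndian ? second : first;
            // Keep the sign of the high byte, mask the low byte to avoid sign extension.
            samples[s] = (short) ((high << 8) | (low & 0xFF));
        }
        return samples;
    }
}
```

In the question's loop this would be called as `PcmDecoder.decode16Bit(data, numBytesRead, true)`, since that format is big-endian.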

The values will range from -32768 to 32767 (the range of a short). It is hard to say where you will want your threshold to be, because perceived volume depends not only on the absolute value of the samples (larger is louder) but also on how much time the signal spends at the larger values. It is possible for a "quiet" sound to have transients with very large values. Also, the numbers don't correspond directly to decibels: decibels are logarithmic, so a conversion formula (20 times the base-10 logarithm of an amplitude ratio) is needed.
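One common way to account for "time spent at the larger values" is to take the root mean square (RMS) of a chunk of samples, and one common decibel conversion is dBFS (decibels relative to full scale). The sketch below shows both. The class `LevelMeter` and its method names are invented for this example, and using 32768 as the full-scale reference is an assumption that fits 16-bit samples.

```java
/** Hypothetical helper for measuring the loudness of one chunk of 16-bit samples. */
final class LevelMeter {

    /** Root mean square of the samples; 0.0 for an empty chunk. */
    static double rms(short[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (short v : samples) {
            sumSquares += (double) v * v;
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    /**
     * Converts an RMS level to dBFS, taking 32768 as full scale.
     * Silence (level 0) maps to negative infinity.
     */
    static double toDbfs(double rmsLevel) {
        if (rmsLevel <= 0.0) {
            return Double.NEGATIVE_INFINITY;
        }
        return 20.0 * Math.log10(rmsLevel / 32768.0);
    }
}
```

A threshold can then be expressed either on the raw RMS value or in dBFS. The right number depends on the microphone and the room, so expect to tune it by experiment.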

So, there are a couple of issues to deal with, but if you just get into the while loop and decode "data", you might be able to get something quick and dirty that works "well enough".

Phil Freihofner
  • So the byte stream read from the line is made up of 2-byte values, and I have to find a threshold for how loud is loud enough and compare against that number? – Pita Jul 16 '13 at 06:43
  • Yes and no. (1) It's probably 2 bytes for the left and 2 bytes for the right channel if you have stereo. (2) You will probably want to do some sort of rolling average so that you aren't reacting to every transient that goes over the limit. – Phil Freihofner Jul 16 '13 at 20:28
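The rolling average suggested in the last comment could look something like the sketch below. This is one possible reading of that suggestion, not the commenter's own code. The class `SpeechGate`, its window size and its threshold are all hypothetical. It averages the loudness of the last few chunks, counting chunks not yet seen as silence, so a single loud transient does not trip the threshold by itself.

```java
/** Hypothetical gate that smooths per-chunk loudness with a moving average. */
final class SpeechGate {
    private final double[] window; // most recent chunk levels, used as a ring buffer
    private final double threshold;
    private int next = 0;          // index of the oldest entry, overwritten next
    private double sum = 0.0;      // running sum of the values in the window

    SpeechGate(int windowChunks, double threshold) {
        if (windowChunks <= 0) {
            throw new IllegalArgumentException("windowChunks must be positive");
        }
        this.window = new double[windowChunks];
        this.threshold = threshold;
    }

    /**
     * Records the loudness of the newest chunk and reports whether the
     * average over the whole window is above the threshold.
     */
    boolean update(double chunkLevel) {
        sum -= window[next];       // drop the oldest level (0 until the window fills)
        window[next] = chunkLevel;
        sum += chunkLevel;
        next = (next + 1) % window.length;
        return sum / window.length > threshold;
    }
}
```

Inside the capture loop you would compute one loudness value per chunk (for example the RMS of the decoded samples), pass it to `update`, and grow the circle while it returns true. A longer window reacts more slowly but ignores more transients.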