4

The smallest unit of a digital image is a pixel. What is the smallest unit of digital sound? What can be considered the "pixel" of sound? And how can we use Java to manipulate it?

user3177843
  • This is definitely too broad a topic; sound is a continuous wave. You can manipulate it mathematically to [discretize](http://en.wikipedia.org/wiki/Discretization) it, but then you're moving from the time domain to the frequency domain. You can take whole college courses about this. – turbo Jan 09 '14 at 14:15
  • I agree. For instance, I cannot find any pixels in my compressed avi files at all. – Martin James Jan 09 '14 at 14:16
  • Very broadly speaking, the smallest unit of an audio file is one "[hertz](http://en.wikipedia.org/wiki/Hertz)" (or sample); there are 44,100 "Hertz" per second in CD-quality audio. You may want to refer to these [Java Sound](http://www.jsresources.org/examples/) API examples for more. – Elliott Frisch Jan 09 '14 at 14:17
  • Does that mean it takes 44KB to store 1 second of music that is CD Quality? Wait, how many hertz or samples can the speaker produce? – user3177843 Jan 09 '14 at 14:23
  • @user3177843 No, a hertz would be some form of a sine wave; you would need to sample it to digitize it and store it. The size in bits of one second of sound depends on the sampling rate, bit depth, the number of channels, and also whether you use a lossless or lossy codec to store the audio. Raw audio (CD quality) would be (assuming 2 channels, 16 bit depth, 44.1k sample rate) 176.4 kB/s – turbo Jan 09 '14 at 14:51
  • @turbo ok. understood. Is a channel like a speaker? What is a channel? – user3177843 Jan 09 '14 at 15:05
  • Call it a "frame," not a "hertz." Hertz is like mph or kph, it's a rate of speed. The smallest part of digital audio data is a frame, which is (I think) a measurement of the amplitude of a signal for 1/44100th of a second (given that particular sample rate of 44.1k). – Kevin Panko Jan 09 '14 at 16:10
  • Hertz (Hz) is a unit of frequency; 1 Hz is "one per second". It's not another name for a sample. When someone says CD quality is 44100 Hertz, they mean there are 44100 samples per second per channel -- "sampled at 44100 Hz" would be the more precise way of saying that. – keshlam Jan 09 '14 at 16:43
  • Note that the theoretical maximum frequency a digital recording can capture is half the sampling frequency. Beyond that, you get "aliasing" effects -- the same reason a wheel in a movie or video may seem to turn backward if it's going faster than the frame rate -- and it doesn't sound like you intended it to. So the 44.1kHz recording rate can record frequencies up to 22.05 kHz... which should cover the normal human hearing range quite happily. PC audio often uses lower rates and/or compression to reduce the amount of data needed while producing "good enough" sound. – keshlam Jan 09 '14 at 16:50

3 Answers

8

The smallest unit of sound is known as a frame. For 8-bit mono it will be a single byte. For 16-bit stereo it will be 4 bytes (2 bytes per sample × 2 channels).
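As a rough illustration, Java Sound's `AudioFormat` class can report the frame size for a given format. A minimal sketch (the class name `FrameSize` and the exact format parameters are just for illustration):

```java
import javax.sound.sampled.AudioFormat;

public class FrameSize {
    public static void main(String[] args) {
        // CD-quality audio: 44,100 Hz sample rate, 16-bit samples, 2 channels,
        // signed PCM, little-endian
        AudioFormat cd = new AudioFormat(44100f, 16, 2, true, false);
        System.out.println("Frame size: " + cd.getFrameSize() + " bytes");
        // Bytes per second = frame size * frame rate
        System.out.println("Bytes/sec: " + (int) (cd.getFrameSize() * cd.getFrameRate()));
    }
}
```

For the CD-quality format above this prints a frame size of 4 bytes and 176,400 bytes per second.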

How can we use Java to manipulate it?

That depends on what you want to do with it. You will need to be a lot more specific to get reasonable answers.

Some possible operations are:

  • Volume change
  • Pan
  • Speed or slow the play rate, with or without..
  • Pitch shift
  • Spectrum analysis..

.. how many hertz or samples can the speaker produce?

That depends largely on the speaker. Speakers vary widely in the range of frequencies they can reproduce, usually following a kind of 'bell curve' with no absolute upper or lower limits.

Does that mean it takes 44KB to store 1 second of music that is CD Quality?

Each frame of CD quality sound contains 4 bytes, given it is stereo, 16 bit. Multiply 4 bytes by the 44,100 frames per second to get 176,400 bytes (about 176.4 kB) per second.

What's the difference between mono and stereo?

Mono has one channel, stereo has two.

What I want to do is manipulate individual units of sound and also - to create a custom musical instrument/synth.

It is not so hard to generate a simple sinusoidal sound in code. See Beeper for an example.
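To give a flavor of what that looks like, here is a minimal sketch along those lines (the class name `Beep` and the specific frequency and amplitude are illustrative; playback is skipped gracefully when no audio device is available):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.SourceDataLine;

public class Beep {
    // Fill a buffer with one second of an 8-bit mono sine wave.
    static byte[] sineWave(double freq, float sampleRate, int amplitude) {
        byte[] buf = new byte[(int) sampleRate];
        for (int i = 0; i < buf.length; i++) {
            double angle = 2.0 * Math.PI * freq * i / sampleRate;
            buf[i] = (byte) (Math.sin(angle) * amplitude);
        }
        return buf;
    }

    public static void main(String[] args) {
        float sampleRate = 44100f;
        // A4 = 440 Hz; amplitude kept under 127 to avoid clipping 8-bit samples
        byte[] buf = sineWave(440.0, sampleRate, 100);
        System.out.println("Generated " + buf.length + " frames");
        try {
            AudioFormat format = new AudioFormat(sampleRate, 8, 1, true, false);
            SourceDataLine line = AudioSystem.getSourceDataLine(format);
            line.open(format);
            line.start();
            line.write(buf, 0, buf.length);
            line.drain();
            line.close();
        } catch (Exception e) {
            System.out.println("No audio output available: " + e.getMessage());
        }
    }
}
```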

A lot of other effects can be created by playing around with the ADSR (Attack, Decay, Sustain, Release) envelope of a sound. For example, applying the ADSR envelope of a guitar note to a piano tone, will make it sound uncannily like a guitar, and vice versa.
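A very rough sketch of the idea, using a hypothetical `applyAdsr` helper with simple linear ramps (real envelopes are usually exponential, so treat this as illustrative only):

```java
public class Adsr {
    // Apply a linear ADSR envelope to a buffer of samples in the range -1..1.
    // attack, decay and release are lengths in samples; sustainLevel is 0..1.
    static double[] applyAdsr(double[] samples, int attack, int decay,
                              double sustainLevel, int release) {
        double[] out = new double[samples.length];
        int sustainEnd = samples.length - release;
        for (int i = 0; i < samples.length; i++) {
            double env;
            if (i < attack) {
                env = (double) i / attack;                      // ramp 0 -> 1
            } else if (i < attack + decay) {
                double t = (double) (i - attack) / decay;       // ramp 1 -> sustain
                env = 1.0 - t * (1.0 - sustainLevel);
            } else if (i < sustainEnd) {
                env = sustainLevel;                             // hold
            } else {
                double t = (double) (i - sustainEnd) / release; // ramp sustain -> 0
                env = sustainLevel * (1.0 - t);
            }
            out[i] = samples[i] * env;
        }
        return out;
    }

    public static void main(String[] args) {
        // Shape a flat signal so the envelope itself is visible in the output.
        double[] flat = new double[100];
        java.util.Arrays.fill(flat, 1.0);
        double[] shaped = applyAdsr(flat, 10, 10, 0.5, 20);
        System.out.println(shaped[0] + " " + shaped[5] + " " + shaped[50]);
    }
}
```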

What is channel? Is it like speaker - Left speaker is one channel and right speaker is another?

Pretty much. Mono sounds like rubbish (IMO), while stereo can make the different instruments sound like they are coming from different positions, just like if the band were sitting right in front of you.

5.1 channel sound is a little more complicated; usually1 it 'cheats' simply by:

  • Putting the left channel through the left speaker(s).
  • Putting the right channel through the right speaker(s).
  • Mixing them both equally and putting that through the center speaker.
  • Filtering for just the low frequency sound and putting that through the single woofer or bass speaker. The human ear cannot easily tell where low frequency sounds are coming from, so that is acceptable. The woofer can be placed anywhere in the room, and still sound just the same.

  1. To be honest, I do not know of any sound format that actually stores 5 or 6 channels for the sound; I think it is all separated out (for the woofer) or mixed together (for the center speaker) in hardware at run-time. Java Sound will only deal with one or two channels directly, in any case.
Andrew Thompson
  • Thanks. What's the difference between mono and stereo? What I want to do is manipulate individual units of sound and also - to create a custom musical instrument/synth. – user3177843 Jan 09 '14 at 14:45
  • Thanks again. What is channel? Is it like speaker - Left speaker is one channel and right speaker is another? And I will try to search about ADSR envelope. – user3177843 Jan 09 '14 at 15:01
  • @user3177843 Yes, stereo right would be one channel and stereo left the other – turbo Jan 09 '14 at 15:06
  • @turbo I see. Thanks to all of you for your help. I am tired, so I will leave it here. – user3177843 Jan 09 '14 at 15:09
  • OK - see the latest, *latest* edit for you more recent comments.. When you get up. :) – Andrew Thompson Jan 09 '14 at 15:12
  • @AndrewThompson I could be wrong, but I am pretty sure movies typically make use of more than 2 channels; not too (at all?) common in music though, and not really relevant here. Not sure what audio format that would be, but something to be looked up easily. – turbo Jan 09 '14 at 15:46
  • @turbo Intriguing.. perhaps I was incorrect on that aspect. As an aside. Your *"You can take whole college courses about this."* comment had me nodding in agreement. This is a very complex subject. – Andrew Thompson Jan 09 '14 at 15:48
2

The smallest unit of digital sound is a sample -- the signal level at a particular point in time. [But see addendum below.]

To use Java to manipulate it: If you have to ask this question, you probably want to go looking for libraries someone else has written.

But if you want to know in general what's involved: read in the sound file. If it is in a compressed format (such as MP3), unpack it. That will give you a very long array/vector of samples. You can cut and paste sections of that to edit the recording, or scale it to make it softer or louder (beware of "clipping", which results when you try to exceed the maximum volume). More complicated manipulations are possible, but that's a full course in digital signal processing which I'm not going to try to give here -- a web search for that phrase, especially in conjunction with sound, audio, or music, should turn up more information.
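For instance, the scaling step with a guard against clipping might look roughly like this (the `applyGain` helper is hypothetical, and assumes 16-bit samples already unpacked into shorts):

```java
public class Gain {
    // Scale 16-bit PCM samples by a gain factor, clamping to avoid clipping.
    static short[] applyGain(short[] samples, double gain) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int scaled = (int) Math.round(samples[i] * gain);
            // Clamp to the 16-bit range instead of letting the value wrap around
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        }
        return out;
    }

    public static void main(String[] args) {
        short[] samples = {1000, -2000, 30000};
        short[] louder = applyGain(samples, 2.0);
        // The last sample would overflow to 60000, so it clips at 32767
        System.out.println(louder[0] + " " + louder[1] + " " + louder[2]);
    }
}
```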

You can also generate your own audio by producing the samples programmatically. A signal which varies sinusoidally from sample to sample produces a pure tone. Other repeating shapes add overtones of various kinds. Varying the frequency of the repetition changes the pitch. Adding several signals together (while watching out for clipping) mixes them into a single signal. And so on.
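A minimal sketch of the mixing step (the `mix` helper is illustrative; averaging is the crudest way to stay within range, at the cost of halving the overall volume):

```java
public class Mix {
    // Mix two signals by summing corresponding samples, averaging so the
    // result can never exceed the 16-bit range (i.e. never clips).
    static short[] mix(short[] a, short[] b) {
        int n = Math.min(a.length, b.length);
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            int sum = a[i] + b[i]; // promoted to int, so the sum itself cannot overflow
            out[i] = (short) (sum / 2);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] a = {1000, 20000, -30000};
        short[] b = {3000, 20000, -10000};
        short[] m = mix(a, b);
        System.out.println(m[0] + " " + m[1] + " " + m[2]);
    }
}
```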

Note that MIDI is not "digital sound" -- it's a digital score. It describes what notes should be played when, but it's up to the synth to turn that into sound.

ADDENDUM: I haven't heard the term "frame" before (see Andrew's answer), but I'll believe it. I think of samples because I'm thinking at the hardware layer, but distinguishing that from sample meaning an audio clip is a Good Thing so I'd bet frame is indeed more correct/current.

keshlam
  • I picked up the term 'frame' mostly from dealing with [`Clip`](http://docs.oracle.com/javase/7/docs/api/javax/sound/sampled/Clip.html) which has at least 3 method names mentioning it. Perhaps that is the Java Sound team's very 'local' term for a sample. But I do understand what you mean by sample, they are the same thing. I'd have up-voted your answer (and down-voted the other one) if I had any votes remaining for the day. Your answer contains some good information. – Andrew Thompson Jan 09 '14 at 14:42
  • Thanks. So MIDI is like a note-wise composition, and the synth is like the instrument? If so, how can I create a custom synth? – user3177843 Jan 09 '14 at 14:42
  • Custom synth: Read and interpret the control info (eg a MIDI file) so you know what notes you're being asked to start and stop when and what modifiers to apply to them. Use that information to "draw" the appropriate waveform at the appropriate time in the sound array. You may need to mix (sum) it with other notes that overlap it. Play the resulting audio. Note that doing this fast enough to keep up with realtime playback is Not Easy; I'm describing it as a batch process where you're building the audio first and playing it later to keep things simple. – keshlam Jan 09 '14 at 14:47
  • Note that Java is **not** the language I'd choose when writing a realtime soft-synth. Among other things, garbage collect time is likely to mess up the output. – keshlam Jan 09 '14 at 14:48
  • Oh, and read carefully the advice in this answer in regard to MIDI. I'll leave that to @keshlam, since they seem to know a lot about it. I'm more experienced at dealing with sampled sound. – Andrew Thompson Jan 09 '14 at 14:55
  • @keshlam Java Sound also has support for MIDI based instrument or [sound banks](http://docs.oracle.com/javase/7/docs/api/javax/sound/midi/Soundbank.html), which is a good thing, given the default sound bank is so.. (how can I say this subtly, without swear words?) sub-standard. ;) – Andrew Thompson Jan 09 '14 at 14:59
  • Sample is to audio as pixel is to images. With PCM a frame refers to a single sample for each channel but since some formats don't have samples (lossy/transform) 'frame' is more generalized. – Radiodef Jan 09 '14 at 15:02
  • Thanks for all your expertise. Both of you. – user3177843 Jan 09 '14 at 15:03
  • @Radiodef Thanks. But what is a channel? 1 channel goes to one speaker? – user3177843 Jan 09 '14 at 15:07
  • @user3177843 That's basically it. You have a left channel that goes to the left speaker, a right channel that goes to the right speaker and when it sounds like something is in the middle, it's because it's the same sound playing in both channels. (The middle is called the "phantom center".) – Radiodef Jan 09 '14 at 15:09
  • @Radiodef *"With PCM a frame refers to a single sample for each channel but since some formats don't have samples (lossy/transform) 'frame' is more generalized."* That actually has a ring of truth to it. I am used to dealing with sampled sound at a higher level as provided by Java Sound, which is why I tend to refer to them as frames.. Thanks for clarifying the difference. E.G. with stereo, a single frame would consist of two samples. – Andrew Thompson Jan 09 '14 at 15:39
  • I'm learning too here. Thanks, everyone. I understand principles, and I'm a half-trained _user_ of some fairly high-end sound software, but I haven't had occasion to work with it at the byte level yet and certainly not in Java. – keshlam Jan 09 '14 at 16:39
0

In Java you'd typically work with AudioInputStream instances (which you get from classes defined by the Java Sound API). Those are read byte-wise for playback. I have never done manipulation myself, but as far as I know, this is mostly done through Java Sound's Mixer class.

The tutorial below should have all the info you're looking for: http://docs.oracle.com/javase/tutorial/sound/playing.html
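As a small sketch of reading an `AudioInputStream` byte-wise (an in-memory buffer stands in for a real file so the example is self-contained; with a real file you would use `AudioSystem.getAudioInputStream(new File("clip.wav"))`):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class ReadStream {
    public static void main(String[] args) throws IOException {
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);
        byte[] pcm = new byte[4 * 44100]; // one second of silent CD-quality audio
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(pcm), format,
                pcm.length / format.getFrameSize()); // length is in frames

        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n; // this is where you could inspect or modify the bytes
        }
        System.out.println("Read " + total + " bytes");
    }
}
```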