Garnish Music Production School | Los Angeles

What is Sound?

Part 1: What is Sound?

You’ve no doubt played around with WAV or AIFF files on your computer, but these are pre-recorded sounds that have already been converted into a digital format, all nicely prepared for you to use in your tracks. In order to understand how this all works we’ll first need to look at how sound is created and recorded.

When someone sings they produce vibrations in the air, due to the compression and rarefaction of air molecules (changes in air pressure) as the sound radiates out in all directions from the source (i.e. the singer). The same process applies to acoustic instruments in general, the difference being in how the sound is generated: by plucking a string (guitar), blowing a reed (saxophone), or hitting a skin (conga), for example.

A microphone can be used to ‘capture’ this radiated sound and to turn it into electrical energy in a process known as transducing (converting one form of energy into another).

(Figure 1 The transduction process)


We principally hear sound using our ears, but we can also feel it through our bodies. As discussed, sound is essentially just variations in air pressure, and our ears can detect everything from the minute variations of the annoying whine of a mosquito through to vast volumes of air being moved by massive subwoofers!

We can describe sound using 4 attributes:

  • Frequency
  • Loudness
  • Frequency Content (Timbre)
  • Duration

In the next section we will look at how this can be represented visually.

Part 2: Visualising Sound

In a DAW such as Logic, the sound that is being heard via the speakers can be represented visually as a waveform. This waveform represents information across two axes: in figure 2 below, the y (vertical) axis represents voltage and the x (horizontal) axis represents time.

(Figure 2. A simple electrical signal represented as a waveform)


From an audio point of view it is more normal to represent the y axis (voltage) as amplitude or volume; as you will see later, there is a direct relationship between voltage and amplitude.

(Figure 3. A waveform representation of a snare drum in Logic Pro X)


A Sound can be described by its frequency and this is measured in Hertz (Hz) – cycles per second. A cycle is actually a complete motion of the sound wave through 360 degrees as it flows from the zero crossing point (that’s the horizontal line, exactly in the middle of the waveform vertically – see above) up to its positive value (90°), down through the zero crossing (180°) to its negative value (270°) and then back up again to zero (360°).

The position within this cycle is known as the phase of the signal, and the combination of phases of different signals can have a massive impact on the overall sound.
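To make the idea of phase a little more concrete, here is a minimal Python sketch that prints the value of a pure sine wave at the phase angles mentioned above. The function name `sine_value` and the peak value of 1.0 are just assumptions for illustration.

```python
import math

def sine_value(phase_degrees, peak=1.0):
    """Instantaneous value of a sine wave at a given phase angle in degrees."""
    return peak * math.sin(math.radians(phase_degrees))

# Walk through one full cycle in quarter-cycle steps.
for degrees in (0, 90, 180, 270, 360):
    print(f"{degrees:3d} degrees -> {sine_value(degrees):+.2f}")
```

The wave sits at the zero crossing at 0°, 180° and 360°, reaches its positive peak at 90° and its negative peak at 270°.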

A good way to think about phase is in the movement of a loudspeaker, with the zero crossing point (see figure 2) represented by the speaker at rest.

(Figure 5: The outward speaker movement)


Inside the speaker is a void with two magnets: one that is fixed at the back and one that is floating nearby in the speaker cone. Did you know that magnets have two different poles, and that they can attract or repel each other? This is the principle used in the speaker to create physical movement in the cone. The fixed magnet (a) at the back has its polarity reversed because of the AC (Alternating Current) that is coming from the Amplifier. When it is positive it repels the other magnet (b), forcing the cone outwards and pushing air out. When the polarity of the fixed magnet (a) shifts, the floating magnet (b) is attracted, pulling the cone back in. This process repeats again and again, many times per second, to move a column of air and create the sound that we hear.

(Figure 6: The inward speaker movement)


Just like a real instrument creates sound through vibrations, the movement of the cone inwards and outwards creates a movement of air that our ears translate into sound.

As discussed earlier, the height of the waveform indicates the amplitude of the signal and this can be measured in different ways – Figure 7 shows some of them:

(Figure 7: Graph of a waveform showing different ways of measuring amplitude)


Peak: The measurement between the quietest point (zero crossing) and the loudest point of the waveform

Peak-to-Peak: The measurement between the top of the positive and bottom of the negative signal levels of the waveform

RMS (Root Mean Square): This is a method used to give a more accurate model of how our human ears hear things and is an average level. You’ll also find this as a mode that you can select when working with a compressor – typically the mode used for vocals and less rhythmic sounds.
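The three measurements above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the signal is a plain list of sample values, and the helper name `amplitude_measurements` is made up for this example.

```python
import math

def amplitude_measurements(samples):
    """Return (peak, peak_to_peak, rms) for a list of sample values."""
    peak = max(abs(s) for s in samples)            # zero line to the largest excursion
    peak_to_peak = max(samples) - min(samples)     # top of positive to bottom of negative
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))  # root of the mean of the squares
    return peak, peak_to_peak, rms

# One cycle of a sine wave with a peak of 1.0, stored as 1000 samples.
cycle = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
peak, peak_to_peak, rms = amplitude_measurements(cycle)
print(peak, peak_to_peak, round(rms, 4))
```

For a pure sine wave the RMS level works out at roughly 0.707 of the peak level, which is why RMS readings always sit below peak readings.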


Part 3: Digitising Sound

Once we have the sound in an electrical form, the next stage is to convert it into something that the computer can actually work with: binary. The device that makes this conversion is the audio interface, and more specifically it is the Analogue-to-Digital converter (ADC) within the interface that actually does the work.

The ADC slices the incoming audio into tiny, equally spaced chunks (known as samples) and takes a voltage reading at each of these points. As you’ll see from the next diagram, the size of these slices and also the amount of detail we have to represent the peak-to-peak voltage is important. Otherwise we’ll end up with a rough sounding approximation of the original sound.

(Figure 8: A digitally ‘sampled’ version of the analogue waveform – notice the jagged steps)


The number of times the audio is sampled per second is known as the Sample Rate. So why is the sample rate of CDs 44.1 kHz? It’s all about the Nyquist theorem, a simple sum used to get your sample rate correct.

It works like this: if you want to sample something accurately you need to set your sample rate to at least twice the highest frequency to be recorded. Human beings can hear up to 20 kHz, so we’ll need to double that: 40 kHz. ‘Wait a minute; CDs are 44.1 kHz – where’s that extra 4.1 kHz come from?’ I hear you ask… Well there’s a reason for that. If any frequencies greater than one half the Sample Rate are allowed to go through the conversion process (say an extremely high-pitched tone at 40 kHz), unwanted nasty frequencies known as Alias Frequencies could come through and be heard as distortion.

To get rid of these a Low Pass Filter is placed before the whole A/D conversion; this removes all frequencies above its cutoff value. Now real world filters always have a slope (they never cut straight at the cutoff value), something you’ll learn about later in the course when we look at filters in more detail. It’s this slope that makes the extra headroom necessary: the filter needs room to roll off between the top of the audible range (20 kHz) and half the sample rate (22.05 kHz).

(Figure 9: Low Pass Filter applied to remove the risk of unwanted distortion)
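To see why that 40 kHz tone is a problem, here is a rough Python sketch of where a pure tone would end up if it were sampled with no filter in front of the converter. The function name `alias_frequency` is an assumption for this example, and it only models a single ideal tone.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling with no anti-alias filter."""
    nyquist = sample_rate_hz / 2
    folded = tone_hz % sample_rate_hz   # sampling cannot tell tones apart that differ by the sample rate
    # Anything above half the sample rate 'folds back' below it.
    return sample_rate_hz - folded if folded > nyquist else folded

print(alias_frequency(10_000, 44_100))  # below half the sample rate: unchanged
print(alias_frequency(40_000, 44_100))  # above half the sample rate: folds down into the audible range
```

With a 44.1 kHz sample rate, the inaudible 40 kHz tone folds down to 4.1 kHz, right in the range where our ears are most sensitive.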

So what about the Bit Depth? Well this is basically the number of steps that are available to represent the volume taken by the A/D converter at each sample period. Have a look at the data table below and you’ll see just how much detail that gives you.
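The number of steps follows directly from the bit depth, since each extra bit doubles the number of values available. The sketch below assumes simple linear PCM; the function names are made up for illustration, and the dynamic range figure is the theoretical ratio between the full scale and a single step, not a measurement of any real converter.

```python
import math

def quantisation_steps(bit_depth):
    """Number of distinct levels available at a given bit depth."""
    return 2 ** bit_depth

def theoretical_dynamic_range_db(bit_depth):
    """Ratio between the full scale and one step, expressed in dB (about 6 dB per bit)."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    print(bits, quantisation_steps(bits), round(theoretical_dynamic_range_db(bits), 1))
```

Going from 16 bits to 24 bits takes you from 65,536 steps to over 16 million.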


Now the Human Ear actually is very good at noticing the subtle differences between loud and soft in the real world so a low bit depth is not going to be accurate at all. If you are making music where delicate volume changes might occur, say acoustic singer songwriter material, smooth Jazz or even Classical it makes sense to increase it to the most your soundcard can handle, much more so than increasing the Sample Rate.

Part 4: Frequency and Pitch

As shown previously, frequency is a measurement of the number of times something oscillates every second and it is measured in Hertz (Hz). The whole purpose of recording sound is to capture what occurs within the range of human hearing.

The maximum range that human beings can hear is 20 Hz right up to 20 kHz, but age and other factors (DJs, I’m looking at you!) reduce this range over time.

Here’s a rough guide to the frequency ranges of certain sounds:

Behind every musical pitch is a frequency

There is a direct relationship between frequency as described so far and the more musical term pitch.

Each pitch in a musical scale has a fixed frequency (see below) associated with it and there is an even greater significance associated to pitches that are octaves (C1, C2, C3 for example).

Now this is an interesting one: the entire musical scale has a frequency associated with each note. The musical note C1 is actually a frequency of 32.7032Hz and the musical note C2 is a frequency of 65.4064Hz. Notice anything special about the relationship between those two numbers? Those of you who are good with numbers will notice that the number is actually doubled as you go up the Octave (12 semitones on the keyboard).

This is the interesting relationship between Octaves and Frequencies – if you want to go up in pitch by an Octave double the frequency and if you want to go down an Octave you halve it – mad eh? This relationship can be quite handy to think about when you are working with EQ, as often a tweak at one frequency will benefit from another at an Octave below or above.
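The octave relationship can be written as a simple formula. The sketch below assumes standard 12-tone equal temperament tuned to A4 = 440 Hz at MIDI note 69, and the naming convention where middle C (MIDI note 60) is called C4, so that C1 is MIDI note 24. Some software labels octaves differently, so treat the note names as an assumption.

```python
def midi_to_frequency(midi_note, a4_hz=440.0):
    """Frequency in Hz of a MIDI note number in 12-tone equal temperament."""
    # Each semitone multiplies the frequency by 2 ** (1 / 12); twelve of them double it.
    return a4_hz * 2 ** ((midi_note - 69) / 12)

c1 = midi_to_frequency(24)  # C1 under the 'middle C = C4' convention
c2 = midi_to_frequency(36)  # one octave (12 semitones) higher
print(round(c1, 4), round(c2, 4), round(c2 / c1, 4))
```

This reproduces the 32.7032Hz and 65.4064Hz figures quoted above, and the ratio between them is 2.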

Here’s a chart of some note pitches and their associated Frequencies, also their MIDI number on the keyboard to give you an idea:

(Calculated with the Online Pitch to Frequency converter)

Traditional musical instruments have a fixed range of pitches that they cover, some wider than others, and it’s worth knowing the subtle differences between each if you are going to create music that actually sounds realistic. These days if you buy a classical library on a sampler you will find that, if it has been programmed well, you won’t be able to play notes outside of this range, a handy thing for those of us who are non-musicians. The bulk of these notes occur in the first few thousand Hertz, leaving a whole lot of additional Hertz above for non-musical character. You’ll learn what these upper frequency areas are for as you develop your ears and your understanding of what they are hearing.

Interestingly pitch is not the whole story, whilst it may tell you what note you are hearing we need more information before we can decide whether that note is a piano or a guitar, for example. This additional information is known as the timbre of the sound and to explain this we need to discuss what makes up the tonal quality of a sound.


Part 5: Timbre

The timbre of a sound is a way of describing its character; that is what makes a piano, synth or guitar sound different from each other.

Musical sounds are made up from the interaction of different frequencies known as harmonics over time. The harmonic that defines the pitch of a sound is known as the fundamental or 1st harmonic, and from there, the frequency of the harmonics increases up the frequency range as defined by the harmonic series. Figure 10 shows the harmonics present in a guitar patch taken from the Logic Pro X library; note the distinct peaks of each of the harmonics present in the sound.

(Figure 10. Spectrum analysis of the frequency content of a guitar)


The harmonic series shows that the harmonics in a musical sound are mathematically related in a very specific way. Each harmonic is a whole number multiple of the fundamental frequency (see figure 11) and these harmonics are also known as overtones.

These whole number multiples are the overtones that are nice, the ones that actually sound musical. If you take a look at the numbers it makes sense too, given that we’ve established that doubling the frequency takes you up an octave. If you were to take a look at things on a bigger scale you’d see that the second, fourth, eighth, sixteenth and thirty second harmonic overtones are the octave jumps. Interesting too that the third, sixth, twelfth and twenty-fourth harmonic overtones are perfect fifths.

Major thirds are created with the fifth, tenth and twentieth harmonic overtones, mad eh? It’s amazing how much of a link between numbers and sound exists.
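You can check these relationships yourself with a little arithmetic. The sketch below lists the first eight harmonics of a fundamental and how many semitones each one sits above it. The 110 Hz fundamental is just an example value, and the function names are assumptions for illustration.

```python
import math

def harmonic_frequency(fundamental_hz, harmonic_number):
    """Each harmonic is a whole number multiple of the fundamental."""
    return fundamental_hz * harmonic_number

def semitones_above_fundamental(harmonic_number):
    """Distance from the fundamental in semitones (12 per octave)."""
    return 12 * math.log2(harmonic_number)

fundamental = 110.0  # example value only
for n in range(1, 9):
    print(n, harmonic_frequency(fundamental, n), round(semitones_above_fundamental(n), 2))
```

The 2nd, 4th and 8th harmonics land exactly 12, 24 and 36 semitones up (octaves). The 3rd lands about 19.02 semitones up, which is an octave plus a perfect fifth, and the 5th lands about 27.86 semitones up, two octaves plus a major third that is slightly flatter than the one on an equal-tempered keyboard.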

A stringed instrument such as the guitar is a perfect example of overtones that occur naturally and deliberately because of the physical construction of the instrument. When a string is plucked it will vibrate back and forth many hundreds of times before losing energy, fading down in volume as it does.

The movement of the string is extremely complex; it’s not just a pure back and forth action across a single plane. This creates its own series of additional vibrations, each contributing its own harmonic overtone to a certain degree.

(Figure 13)


Figure 13 shows how these overtones relate to each other when a string is plucked. The discovery of this is generally credited to Pythagoras, who developed this idea of frequency ratios over 2500 years ago!

If the harmonic overtones are the ‘musical’ ones then what about frequencies that are not whole number multiples? Any frequency that falls outside of this series is known as inharmonic. Inharmonic frequencies are generally found in sounds that are atonal, that is, not considered musical (drums, percussion, bells, breaking glass, explosions: the list goes on!)

To help you understand the timbre a little more it’s worth us understanding how it can change over time on real acoustic instruments.

Every natural sound changes timbre, sometimes drastically, across its duration. Some overtones might be heard at the beginning of the sound and increase or decrease in strength whilst you listen. As the sound progresses other overtones may grow and appear, even on one note of an instrument. When more than one note is heard things can get really complicated with interharmonic relationships coming into play. These are very difficult to simulate on synthesisers so it’s fair to say that you can never re-create a real acoustic instrument with 100% accuracy on a synth.


Part 6: Loudness

The technical word that is used to describe loudness or volume is amplitude. This term is used to describe the height of a waveform above and below the center line (silence). Amplitude will often change during the duration of the sound.

Let’s take a look at a sound like a snare drum. The drummer hits down on the snare with a drumstick and there is a fast initial burst of volume which rapidly dies away.

A saxophone takes a longer time to get to its maximum volume once the player has blown into the mouthpiece. Once it’s up there it stays that way until the person stops blowing (with fluctuations of course according to how strongly they can play the sustained note) and then the level quickly dies away.

An electric piano has a different characteristic. Once the player pushes the keys down the volume quickly rises to the maximum and then falls down over a relatively long period known as the decay.

In sound design we can shape the amplitude of a sound by using a volume/amplifier envelope. There is a selection of parameters available that allow us to adjust the volume of the sound over time. Experienced sound designers will be able to see the waveforms above and have an idea of what those settings would need to be in order to copy their volume ‘shape’.


The Decibel

The loudness of sound is measured in dB (Decibels), and when boosting or attenuating (reducing) a signal using software or sound equipment, gain is commonly measured in dB.

Different types of decibels are used to describe power, intensity or pressure in a variety of areas such as acoustics, electronics and other sciences. This is because the decibel is a unit of a scale, which is relative to a particular reference level. The reference used is indicated through the use of a suffix after the dB abbreviation. The most common type of decibel used in acoustics is dB SPL, but other fields have different references and therefore use different suffixes. For instance, in electronics that reference can be a measurement in volts or watts (dBu or dBm).

The decibel is a logarithmic unit; the logarithm is a mathematical function that is used to reduce massive numeric values into smaller, more manageable numbers. A logarithmic scale, such as the decibel scale, is essentially a simplified map used to represent the chosen reference values (whether they be sound pressure, volts, watts, etc.).

Exponential Values

If we hear a sound and ‘feel’ that it’s twice as loud, its intensity is actually more likely to be ten times what it was before. If you were to hear a sound at 10dB and then another at 20dB, the second sound will have ten times the intensity (acoustic power) of the first and roughly three times the sound pressure, not twice.


The difference in amplitude across a sound is known as its dynamic range. Take for example a complete movie on Blu-ray; this soundtrack will have moments of quiet often followed by extreme high volume (especially in horror movies). A movie soundtrack is therefore known to have a high dynamic range.


Some acoustic instruments will have a larger dynamic range than others. A snare drum will have a bigger dynamic range than a clarinet and an acoustic piano more than an electric piano such as a Rhodes. If you want to synthesise an instrument that emulates the sound of an acoustic one, you will want to mimic the volume characteristic that is associated with it.

What’s really important for us to understand is that, because each decibel refers to a value which is increasing exponentially up the scale, moving from 0dB to 1dB will represent a much smaller change in sound pressure than moving from 20dB to 21dB, even though the decibel value has increased by the same amount!

To help give you some context, here are some tips:

  • A 1dB change in level is hardly noticeable when a sound is isolated.
  • Doubling of the Volume (perceived loudness) is roughly equal to a 10dB increase and half the volume is equal to a 10dB reduction.
  • Doubling the Sound Pressure (Voltage) corresponds to a measured level change of 6dB.
  • Doubling of Acoustic Power (Sound intensity) corresponds to a level change of 3dB.
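The 6dB and 3dB figures in these tips come straight from the decibel formulas. Here is a minimal sketch, assuming the standard definitions: 20 times the base-10 logarithm of a pressure or voltage ratio, and 10 times the base-10 logarithm of a power or intensity ratio. The function names are made up for this example.

```python
import math

def db_from_pressure_ratio(ratio):
    """Level change in dB for a ratio of sound pressures or voltages."""
    return 20 * math.log10(ratio)

def db_from_power_ratio(ratio):
    """Level change in dB for a ratio of acoustic powers or intensities."""
    return 10 * math.log10(ratio)

print(round(db_from_pressure_ratio(2), 2))  # doubling the sound pressure
print(round(db_from_power_ratio(2), 2))     # doubling the acoustic power
print(round(db_from_power_ratio(10), 2))    # ten times the acoustic power
```

Doubling the pressure gives about 6.02dB, doubling the power gives about 3.01dB, and ten times the power gives exactly 10dB, which is the change we tend to hear as ‘twice as loud’.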

    Volume change = pitch change

    In real world instruments the volume can change according to the pitch being played. Usually higher pitches bring less dynamic range than lower pitches. This is often because of the physical construction of the instrument; violins have longer or thicker strings for the lower pitches. These will vibrate for longer periods of time and with stronger vibrations than shorter, thinner strings.

    ‘Perceived Loudness’

    Because of the way our ears work, some sounds will appear louder to us than others. For example, sounds with plenty of energy around 4,000 cycles per second (4kHz) are the ones that we hear best. These mid-range frequencies will always feel louder even though technically they may be the same loudness as other frequencies. For more information about this check out the Wikipedia entry for Equal-loudness contours.


Part 6: Sonic Illusions

Let’s now take a look at some interesting situations where sounds combine to create unique effects or sonic ‘illusions’.


If you are into trance or another style where a big thick lead sound is popular you’ll have heard the end result of beating. This is where two sounds are slightly different in frequency and playing at the same time. What happens is that the sound moves, in a way that feels like repetitive volume adjustments. The two separate movements slow down as the two notes approach the same pitch and stop when the pitches match. This creates a third frequency in our ears, thickening the tone further.
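The speed of that repetitive volume movement is easy to work out: it is simply the difference between the two frequencies. A tiny sketch, with a made-up function name and example frequencies:

```python
def beat_frequency(freq_a_hz, freq_b_hz):
    """Number of volume 'beats' per second heard when two close tones play together."""
    return abs(freq_a_hz - freq_b_hz)

print(beat_frequency(440.0, 442.0))  # two oscillators detuned by 2 Hz
print(beat_frequency(440.0, 440.0))  # perfectly in tune: the beating stops
```

Two oscillators at 440 Hz and 442 Hz beat twice per second; as you bring them closer together the beating slows, and at the same pitch it disappears.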

Combination Tones

If two sounds are different by more than 50Hz they create combination tones. These are additional ‘ghost’ pitches that our ears perceive and can be mathematically calculated by both the sum and difference between the two tones. The discovery of some of these phenomena is credited to the violinist Giuseppe Tartini so they are sometimes called Tartini tones.

(Sum and Difference calculation)
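As a rough sketch of the sum and difference sums described above, here is a small Python helper. The function name and the example frequencies (400 Hz and 500 Hz) are assumptions chosen purely for illustration.

```python
def combination_tones(freq_a_hz, freq_b_hz):
    """Sum and difference frequencies for two tones sounding together."""
    return {
        "sum": freq_a_hz + freq_b_hz,
        "difference": abs(freq_a_hz - freq_b_hz),
    }

print(combination_tones(400.0, 500.0))
```

For tones at 400 Hz and 500 Hz, the sum tone would be at 900 Hz and the difference tone at 100 Hz.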

These concepts come under the general subject matter of psychoacoustics. If you want to learn more about this you can start at the Wikipedia page and go on a voyage of discovery.

Bring The Noise

Noise is an incredibly useful and versatile tool for sound design. It can be heard all over the place in synthesised sounds, from FX through to drum hits.

(White Noise)

White noise has energy across the whole spectrum, while pink and brown noise both have more energy at the bottom end, with brown noise weighted more heavily towards the lows than pink. Filtered white noise is often used to create sound effects in sci-fi movies, for example to simulate the sound of a spaceship engine.

You cannot specify a frequency when generating noise, as the frequencies are random. This means that it doesn’t matter which key you press on a synthesiser when generating white noise: the sound will always be the same.
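One simple way to picture this is that a white noise generator just produces random sample values. The sketch below is an assumption about how you might do that in Python, not how any particular synthesiser works; notice that the hypothetical `white_noise` function has no pitch or key parameter at all.

```python
import random

def white_noise(num_samples, seed=None):
    """Return a list of random sample values between -1.0 and 1.0."""
    rng = random.Random(seed)  # a seed makes the 'random' sequence repeatable for testing
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]

# One second of noise at a 44.1 kHz sample rate.
noise = white_noise(44_100, seed=1)
print(len(noise), min(noise), max(noise))
```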


Part 7: Synthesised Waveforms

There is a vast array of synthesisers that can create a diverse range of sounds using different waveforms, and the choice can be bewildering. A good place to start is with the basic waveforms that make up the building blocks of analogue or subtractive synthesis.

These waveforms are essentially the sound of electricity being played through speakers and it is essential to know the sound of each of these by heart.

The Sine Wave is the simplest form of waveform we can create. It’s the fundamental and the fundamental alone, nothing above or below the frequency it is created at. It has no overtones.

(The Sine Wave)


The Sawtooth contains both odd and even harmonics. It’s made up of the fundamental, half as much of the second harmonic, a third as much of the third harmonic and so on. The result is a bright and buzzy waveform.
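That recipe can be tried out directly by adding sine waves together (additive synthesis). The sketch below is a minimal illustration: the function name `sawtooth_sample` is made up, the phase is expressed as a fraction of one cycle, and the result is not scaled to any particular peak level.

```python
import math

def sawtooth_sample(phase, num_harmonics=20):
    """Approximate one sample of a sawtooth by summing harmonics.

    phase is the position within the cycle, from 0.0 to 1.0.
    Harmonic n is added at 1/n of the level of the fundamental.
    """
    return sum(
        math.sin(2 * math.pi * n * phase) / n
        for n in range(1, num_harmonics + 1)
    )

# Print a few points across one cycle; the values ramp steadily downwards.
for step in range(1, 8):
    phase = step / 8
    print(phase, round(sawtooth_sample(phase, 200), 3))
```

The more harmonics you add, the closer the shape gets to a straight ramp with a sharp edge, and the brighter and buzzier it would sound.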

(The Sawtooth Waveform)


A Pulse Wave can be adjusted to create a variety of Timbres. If it is set so that it looks like the waveform below, spending equal time at its high and low levels, it is known as a Square Wave. In Analog this is achieved by setting the Pulse Width to 100%. The Square Wave contains only the odd numbered harmonics in decreasing amounts. So that’s the fundamental (the first harmonic), then the third, the fifth, the seventh, ninth, eleventh etc. This brings a ‘hollow’ texture to the sound (but it can still sound great for bass).

(Pulse Wave set at 100% to create a Square Wave)


(Pulse Wave)
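To see why the pulse width changes the timbre, the sketch below uses the standard Fourier series of an ideal pulse wave to estimate the relative strength of each harmonic. Here `duty_cycle` is the fraction of each cycle the wave spends at its high level, so 0.5 means a square wave; this is an assumption for the example and is not necessarily how the Pulse Width control on a given synth is scaled.

```python
import math

def pulse_harmonic_amplitude(harmonic_number, duty_cycle):
    """Relative strength of one harmonic in an ideal pulse wave."""
    n = harmonic_number
    return abs(2 / (n * math.pi) * math.sin(n * math.pi * duty_cycle))

for duty in (0.5, 0.25):
    levels = [round(pulse_harmonic_amplitude(n, duty), 3) for n in range(1, 9)]
    print(duty, levels)
```

At a duty cycle of 0.5 the even harmonics vanish, leaving only the odd ones described above; narrowing the pulse brings the even harmonics back in and changes the character of the sound.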


As you’ve already seen, White Noise is random frequency content anywhere from 20Hz to 20kHz at an equal amplitude and is a versatile choice for sound design. It can be used to recreate wind, build special effects and also drum sounds. It has an abundance of low and high frequency energy and is often partnered with filters for tone shaping.


Part 8: Volume Envelope

So we’ve looked at a selection of waveforms that offer a broad range of possible tones to use as building blocks in sound design. Let’s now explore the additional possibilities on offer when we add the ability to control the volume of our sounds over the duration of the note using an amplifier/volume envelope.

As you know, the volume of the sound across its duration is important in defining the sound. In synthesisers and samplers we have the ability to gain control using an envelope.

To understand how an envelope works, we need to go back to how we perceive the volume of sound. For example the sound of a piano has a strong emphasis at the start of the sound but then doesn’t sustain for very long, whereas the sound of a car engine passing by will gradually sound louder and louder as the car approaches and then slowly disappear in volume.

The keyboard triggers the envelope generator: when you press a key, it sends a ‘trigger’ message that starts the generator’s process of creating an envelope. The two messages “note on” and “note off” are the triggers the envelope uses to shape the sound over time. If you release the key at any stage during the development of the envelope, it will execute its release stage immediately, skipping whichever stages it hadn’t yet executed.

The envelope generator has four parameters: Attack time, Decay time, Sustain level, and Release time, which is why it is usually called an ADSR envelope.

Figure 15 below shows how an ADSR envelope is used to control the volume of the sound, from when you hit a key until you release it.

(Figure 15 An ADSR Envelope)


Attack: This is the length of time required for the sound to reach its initial maximum volume after the key is pressed. Obviously it will be very short for a percussive sound.

Decay: This is what happens immediately after a sound hits its maximum volume level in the attack phase. It’s the time taken for the volume to reach a second volume level known as the sustain level.

Sustain: This is not a length of time but the volume level at which the sound sustains after the decay phase. In most sounds it is lower than the attack volume, but it could be the same. Usually, it’s the volume at which a sound plays while a key is being held down. This phase can, theoretically, last forever, or at least until you get tired of holding down the key.

Release: This is the final phase, again measured in time, and is the time it takes the volume to reach zero after you release the key.
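The four stages can be sketched as a small function that returns the volume level at any moment after the key is pressed. This is only an illustration under stated assumptions: every stage is a straight line (many real synths use curved stages), times are in seconds, levels run from 0.0 to 1.0, and the names `adsr_level` and `gate_time` (how long the key is held) are made up for this example.

```python
def adsr_level(t, gate_time, attack, decay, sustain, release):
    """Volume level (0.0 to 1.0) at time t seconds after the key is pressed."""

    def level_while_held(time_held):
        if time_held < attack:                    # Attack: rise from 0 to full volume
            return time_held / attack
        if time_held < attack + decay:            # Decay: fall from full volume to the sustain level
            return 1.0 - (1.0 - sustain) * (time_held - attack) / decay
        return sustain                            # Sustain: hold for as long as the key is down

    if t < 0:
        return 0.0
    if t <= gate_time:
        return level_while_held(t)

    # Release: starts from whatever level had been reached when the key was let go,
    # even if the attack or decay stage had not finished yet.
    time_since_release = t - gate_time
    if time_since_release >= release:
        return 0.0
    return level_while_held(gate_time) * (1.0 - time_since_release / release)

# Key held for 1 second: 0.1 s attack, 0.2 s decay, sustain at half volume, 0.3 s release.
for t in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0, 1.15, 1.3):
    print(t, round(adsr_level(t, 1.0, 0.1, 0.2, 0.5, 0.3), 3))
```

Try a very short `gate_time` such as 0.05 and you will see the release begin from half volume, before the attack has even finished, just as described for a key released early.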

Of course, these parameters won’t allow us to recreate a real world sound identically but they will provide us with a decent enough range of flexibility to create some usable instruments. Although it’s possible to get some close approximations, Analogue or FM synthesisers are not designed to perfectly mimic acoustic instruments or real world sounds. They excel at usable synthetic instruments for music composition as well as futuristic sound effects to spice up a track.

The envelope shape of real world sounds, both acoustic and electronic, is not as ‘precise’ as the envelope on a synthesiser, so you can’t ever expect a perfect copy of the volume behaviour of the sound.

So this week, we’ve looked at the building blocks of sound and explored ways to describe and understand it.

These are the fundamentals of sound design and later on in the course you’ll no doubt come back to this lesson and see all the pieces of the puzzle slotting neatly into place.
