Digital FM: two words that strike fear in the hearts of many synthesists.
Too complicated, some say. Too cold-sounding. And anyway, it’s only good for things like bells and electric pianos.
Well, let’s face facts. It is a little complicated — certainly more so than standard analog subtractive synthesis or additive techniques — but by no means impossible to understand. (And no, you don’t need a degree in mathematics to use it, either!)
On the other two counts: Wrong, and wrong again. Sounds created with digital FM don’t have to be cold or edgy — in fact, there are lots of tools that let you add warmth and movement. And FM sounds are by no means limited to bells and electric piano (though admittedly it’s a technique that does both well) — you can actually craft a broad palette of tonalities such as brass, wood tones, even a decent approximation of the human voice.
In this series of articles, we’ll cover all the basics and tell you what you need to know in order to program FM sounds. While we’ll be referencing two specific Yamaha synthesizers that have digital FM capabilities (MONTAGE and MODX), the information will be applicable to other FM synths too; just check your instrument’s owner’s manual for the equivalent button-pushes.
Now let’s get started … right at the very beginning!
At first glance, this may seem too basic, but the question of what makes up a sound is actually a lot deeper than it may appear. At its essence, a sound is anything we hear as a result of vibrations in the air. These vibrations, in turn, cause our eardrums to vibrate in a similar fashion. The back-and-forth movements of the eardrum are passed along by tiny bones in the middle ear to the inner ear, where they are converted into electrical signals that travel up nerves into our brain and are finally perceived by us as a sound.
Obviously, there’s an enormous gamut of sounds in existence — everything from a violin to a jackhammer, the wings of a butterfly softly beating to a crack of thunder. What is it that differentiates sounds from one another?
Actually, there are three factors: amplitude (loudness), frequency (pitch) and timbre (tonal quality).
Interestingly, all sounds can be described in terms of these three aspects, and these three only. What’s more, all sounds exhibit all three aspects. Think about it: Can there be a sound that has no loudness? (If so, it’s not a sound.) Can there be a sound that has no particular quality? Not possible. And while there can be (and are) plenty of sounds that have no pitch, they can then be described in terms of their lack of pitch.
Let’s take a look at each of these factors in turn.
The amplitude (loudness) of a sound is easily measurable in a number of different ways. There are, for example, hardware devices and software apps called sound pressure level (SPL) meters that measure amplitude in real time. Or you can record a sound into your favorite DAW (Digital Audio Workstation) and see the resulting waveform on your screen afterwards. Such a display might look something like this:
Here’s the same sound a little softer:
… and a little louder:
Obviously, every sound changes amplitude during its existence (eventually dropping to silence and staying there), since there is no such thing as a sound that lasts forever. We’ll be talking a lot more about amplitude in future installments, but for the purposes of this article, let’s stop here and move on to …
Just as you can determine the amplitude of a sound by the height of its wave, so too can you determine its pitch by the number of waves that occur in a given period of time — in other words, how frequently those waves occur (hence the name “frequency”). The lower the pitch, the fewer the number of waves in any given time period; the higher the pitch, the more waves occur.
The unit of measurement for frequency is the Hertz, or Hz for short. (There’s also a unit called the kiloHertz, or kHz, that represents a thousand Hertz.) This describes the number of waves that occur in one second of time, so a sound with a frequency of 1 Hz generates one wave (one backwards and forwards movement of air) per second; a sound with a frequency of 100 Hz generates a hundred waves per second; a sound with a frequency of 1 kHz generates a thousand waves per second, and so on.
We humans can only perceive pitches between roughly 20 Hz and 20 kHz (that is, 20,000 Hz), though unfortunately the high end of this range gets reduced somewhat as we get older (and the rate of deterioration is accelerated when you’re exposed to lots of loud sounds, so turn those speakers down!). This is known as the audible range. Sounds do exist both above and below this 20 Hz – 20 kHz range; those below 20 Hz are termed infrasonic (they’re the rumbling “feel” frequencies that make your chest pound and the dance floor vibrate), and sounds above 20 kHz are termed ultrasonic (they’re the inaudible sounds that make your dog’s ears perk up!).
Frequency has a lot to do with whether or not we perceive a sound as being musical. For the most part, if you can determine a clear pitch (as, for example, in a violin note), the sound is considered musical. If you can’t (as in a jackhammer), it’s non-musical. But there’s a big gray area in between. Can you clearly make out the pitch of a bass drum or a snare drum? Most people can’t, but drummers spend a lot of time tuning their drums, so maybe they can perceive something that the rest of us are missing. The same goes for other percussive instruments, like cymbals, shakers and tambourines. On the flip side of the coin, some individuals are able to perceive pitches in what most of us would consider non-musical sounds, like a breeze blowing or the hum of an engine. The bottom line is that it’s somewhat subjective, though most of us can agree that most (but not necessarily all) musical instruments have a clear pitch component.
Every time a musical sound goes up an octave, its frequency doubles. For example, here’s the waveform of the A above middle C played on a piano:
The Western musical system divides each octave into 12 equal components (more or less) called semitones, and the distance between any two notes is called an interval. The interval known as the fifth (actually seven semitones higher than the starting, or root, note) is the frequency midpoint. In other words, the fifth above the A note (the E) has 1 1/2 times its frequency, which is why it’s so pleasing to our ears. In standard tuning, the frequency of A is 440 Hz, so the E above it has a frequency of roughly 660 Hz (it’s “roughly” and not “exactly” because pianos use an equal temperament tuning system — a subject that’s well beyond the scope of this article, but Google it if you’re curious). These kinds of mathematical relationships will have a great deal of relevance when creating FM sounds, so it’s important to have a good grasp of them.
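To make these relationships concrete, here is a minimal Python sketch of the arithmetic described above. The function names are just illustrative, and the equal-temperament formula (each semitone multiplies the frequency by the twelfth root of 2) is the standard textbook one rather than anything specific to a particular instrument.

```python
A4_HZ = 440.0  # the A above middle C in standard tuning

def octaves_up(frequency_hz, octaves=1):
    """Each octave up doubles the frequency."""
    return frequency_hz * 2 ** octaves

def semitones_up(frequency_hz, semitones):
    """Equal temperament: 12 equal steps per octave, each a factor of 2**(1/12)."""
    return frequency_hz * 2 ** (semitones / 12)

pure_fifth = A4_HZ * 3 / 2               # the "1 1/2 times" fifth
tempered_fifth = semitones_up(A4_HZ, 7)  # the fifth as a piano is tuned

print(octaves_up(A4_HZ))   # A one octave higher
print(pure_fifth)          # E as an exact 3:2 ratio
print(tempered_fifth)      # E in equal temperament, slightly flatter
```

The small gap between the last two values is the “roughly” in “roughly 660 Hz.”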
Now let’s move on to …
At first glance, timbre may seem to be the toughest concept to grasp. Sure, we can call a sound “soft” or “warm” or “harsh” or “bright,” but those are all such indistinct, subjective terms, no?
Well, it may surprise you to learn that timbre, like every aspect of music, can actually be described in pure mathematical terms — one reason why computers are so good at synthesizing sounds. To understand this, let’s return to our example of an A440 piano note. When you depress that key, what happens is that, via a series of mechanisms, a hammer strikes a string (actually two or more strings, but that’s not especially relevant to this discussion just yet), which begins vibrating at a rate of — you guessed it — 440 times per second.
This is not specific to piano (which we used here just for the sake of illustration), or to any particular musical instrument, for that matter; it’s actually a feature of all sounds in existence. And it’s the number and type of overtones, and their strength relative to one another, that determine a sound’s timbre. Musical sounds (piano, flute, birdsong, etc.) will tend to have more harmonic than inharmonic overtones, while non-musical ones (cymbal, tambourine, jackhammer) will have the reverse. But all naturally occurring sounds consist of a particular blend of overtones, and that’s why we can tell the difference between a piano and a flute and a cymbal and a tambourine, even if they’re all played at the same loudness level, and even if we’re in a room with all the lights out.
One of the inexplicable wonders of nature is that the timbre of a sound is reflected in its waveform in an almost poetic manner. The smooth sound of a flute, for example, is displayed as a gentle, rounded waveshape that looks like this:
… while the brighter tone of a twanged guitar string has a waveform that’s more jagged:
In contrast, the sizzle of a cymbal is thoroughly spiky and irregular:
In general, the gentler and smoother the sound (the result of fewer overtones), the more rounded and regular the waveform; the brighter and buzzier the sound (the result of more overtones), the more jagged and irregular the waveform.
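The link between overtones and waveshape can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the overtones are all harmonic (whole-number multiples of the fundamental), the 1/k strengths for the “bright” wave are an arbitrary choice, and `jaggedness` is a made-up measure (the biggest jump between neighbouring samples), not a standard one.

```python
import math

def harmonic_wave(harmonic_amplitudes, samples_per_cycle=512):
    """One cycle of a wave built from a fundamental plus harmonic overtones.

    harmonic_amplitudes[0] is the strength of the fundamental,
    harmonic_amplitudes[1] the strength of the 2nd harmonic, and so on.
    """
    cycle = []
    for n in range(samples_per_cycle):
        phase = 2 * math.pi * n / samples_per_cycle
        cycle.append(sum(amp * math.sin(k * phase)
                         for k, amp in enumerate(harmonic_amplitudes, start=1)))
    return cycle

def jaggedness(cycle):
    """Largest jump between neighbouring samples, wrapping around the cycle."""
    return max(abs(cycle[i] - cycle[i - 1]) for i in range(len(cycle)))

smooth = harmonic_wave([1.0])                            # fundamental only
bright = harmonic_wave([1 / k for k in range(1, 9)])     # 8 harmonics (assumed strengths)

print(jaggedness(smooth) < jaggedness(bright))
```

Adding overtones steepens parts of the wave, which is the “more jagged” look described above.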
If you want to learn more about the relationship between music and mathematics, check out my Yamaha blog “The Numbers Game.” For now, though, let’s start our exploration of digital FM — a great synthesis technique precisely because it gives us extremely fine control over all three aspects of sound: amplitude, frequency and timbre.
The basic building block of digital FM synthesis is called an operator. It’s actually quite a simple software device (the fact that it’s “software” means that it doesn’t exist physically, just as a series of numbers — but no need to concern yourself with that), consisting of just three components: an oscillator (“OSC” for short), an amplifier (“AMP” for short) and an envelope generator (“EG” for short). Here’s the way they’re interconnected:
As you can see, the signal starts with the oscillator (same as it does in analog synthesis methods); it’s then sent into an amplifier, under the control of an envelope generator, which enables its amplitude to be varied over time (more about this in Part 4). The signal then leaves the operator, to be routed either to the output of your instrument so you can hear it … or, more intriguingly, to the input of another operator. (We’ll unravel this particular mystery in Part 3.) For now, let’s focus on an operator whose output can be heard directly, either through your synthesizer’s main outputs or its headphone output.
Such an operator is called a carrier, and up until the introduction of FM-X synthesis (available today in MONTAGE and MODX), most digital FM carriers were only capable of producing one kind of waveform — the simplest one known to man. This is the humble sine wave, which contains no overtones at all.
Wait, you say! Didn’t I tell you just a few paragraphs back that all sounds consist of a particular blend of overtones? Well, check the fine print, because what I actually said was that all naturally occurring sounds have overtones. (Sneaky, I know.) And pure sine waves essentially do not occur in nature; in practice, they are generated by electrical circuits (such as the ones in analog synthesizers) and by digital emulations of electrical circuits (such as the ones in digital synthesizers).
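Putting the pieces so far together, a carrier can be sketched in Python as a sine oscillator whose output passes through an amplifier whose level is set by an envelope generator. This is a rough model only: the sample rate, the straight-line envelope shape and its timings are all assumptions made for illustration, not how any particular instrument implements its operators.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate for this sketch

def linear_envelope(t, attack_s=0.01, release_start_s=0.4, release_s=0.1):
    """Hypothetical EG: returns a level from 0.0 to 1.0 at time t (seconds)."""
    if t < attack_s:
        return t / attack_s                      # fade in
    if t < release_start_s:
        return 1.0                               # hold at full level
    return max(0.0, 1.0 - (t - release_start_s) / release_s)  # fade out

def carrier(frequency_hz, duration_s=0.5):
    """Render one operator: OSC -> AMP, with the AMP level set by the EG."""
    samples = []
    for n in range(int(SAMPLE_RATE * duration_s)):
        t = n / SAMPLE_RATE
        osc = math.sin(2 * math.pi * frequency_hz * t)  # OSC: a pure sine wave
        level = linear_envelope(t)                      # EG: level over time
        samples.append(osc * level)                     # AMP: scale the signal
    return samples

note = carrier(440.0)
print(len(note), max(abs(s) for s in note))
```

The returned list is what would be sent to the output; routing it into another operator instead is a topic for later in the series.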
We’ve seen that the timbre of a sound is reflected in its waveform, with gentler sounds having more rounded shapes, so you might expect that the waveform of a sine wave would look something like that of a flute, only even more rounded … and you’d be absolutely right. Here’s what a sine wave looks like:
Ready to listen to a sine wave on your MONTAGE or MODX? It’s super-easy, since both instruments offer a simple procedure for calling up basic “initialized” sounds, including one for FM. Simply perform the following steps:
You’re nearly there, but in order to hear the various exercises we’ll be giving you more clearly, you’ll want to turn off the reverb. This is also a very simple process, but if you’d rather skip the button-pushes, just go to Soundmondo (our way-cool social website that allows you to discover new synth sounds, as well as organize and share your own sounds) and download the Performance named “Part 2_01.” (This is simply “Init Normal (FM-X),” saved without reverb.) For more information about Soundmondo, check out this blog article.
For you do-it-yourselfers, here’s the step-by-step procedure:
Here’s what it sounds like — the smooth, gentle tone of a pure sine wave, without any reverb:
Whether you downloaded this from Soundmondo or created it with button-pushes, press the Store button to store it in your MONTAGE/MODX. (If you created it with button-pushes, rename it “Part 2_01” — the name under which it appears in Soundmondo).
Now we’re ready to move on to another basic concept, called …
To begin this discussion, let’s go back to the illustration of an FM operator we presented earlier:
Obviously, this graphic is a bit lacking since an operator doesn’t decide on its own when to start playing, nor does it decide what pitch to play — those decisions are made by you, via control and pitch inputs. These two signals are generated by your synth’s keyboard when you play a note (or notes), or by an external MIDI device (such as a MIDI sequencer or external MIDI controller), or both. So here’s a more accurate picture of an operator:
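As a loose illustration of those two inputs, the Python sketch below models a key event and turns it into what an operator needs: an on/off gate (the control input) and a frequency (the pitch input). The `KeyEvent` class and `operator_inputs` function are hypothetical names. The conversion assumes standard MIDI note numbering (69 is the A above middle C) and equal temperament at A = 440 Hz; how a given synth handles this internally is not covered here.

```python
from dataclasses import dataclass

@dataclass
class KeyEvent:
    note_number: int  # pitch input: which key, in MIDI numbering (60 = middle C)
    key_down: bool    # control input: True when pressed, False when released

def note_to_hz(note_number, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (69 = A above middle C)."""
    return a4_hz * 2 ** ((note_number - 69) / 12)

def operator_inputs(event):
    """Return (gate, frequency_hz) as an operator might receive them."""
    return event.key_down, note_to_hz(event.note_number)

print(operator_inputs(KeyEvent(note_number=69, key_down=True)))
print(operator_inputs(KeyEvent(note_number=60, key_down=False)))
```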
It will of course be easier to understand this concept if you can hear it, so call up the “Part 2_01” Performance you downloaded or created earlier and do the following:
This calls up the Edit – Part 1 – Common screen, which looks like this:
Use the INC/YES button or data dial to change the Coarse setting to 2. The value in the Ratio box (circled in red) changes to 2.00:
Play the same notes again, and notice that they are now transposed up an octave. That’s because the operator is applying a Ratio of 2.00 (in other words, double) to each note being played. Entering in a Ratio of 3.00 multiplies each frequency played by 3 … which means you’ll hear not the note you’re playing on your keyboard, but that note an octave and a fifth higher. (In other words, if you play A above middle C, you’ll hear a sine wave of not 440 Hz, but 1,320 Hz [1.32 kHz], which is not an A but an E.) Try entering in different Ratio values — you’ll soon get the hang of how it works.
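The Ratio arithmetic is simple enough to check in a few lines of Python. In this sketch, `operator_frequency` is just the multiplication described above, and `nearest_note_name` is a hypothetical helper that assumes equal temperament with A = 440 Hz to name the closest note.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def operator_frequency(played_hz, ratio):
    """The operator sounds at the played frequency multiplied by its Ratio."""
    return played_hz * ratio

def nearest_note_name(frequency_hz, a4_hz=440.0):
    """Name of the equal-tempered note closest to the given frequency."""
    semitones_from_a = round(12 * math.log2(frequency_hz / a4_hz))
    return NOTE_NAMES[semitones_from_a % 12]

for ratio in (1.00, 2.00, 3.00):
    heard = operator_frequency(440.0, ratio)
    print(ratio, heard, nearest_note_name(heard))
```

Whole-number Ratios land on or very near notes of the scale, while other values can fall between them.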
The Fine box next to the Coarse box allows you to enter fractional Ratio values in 1/100th (.01) increments, from .01 to .99. To see how this works, return the Coarse value to 1, then touch the Fine box and move the data dial clockwise. Set the Fine value to 27, for example, and you’ll see (and hear) a Ratio of 1.27, which will result in some in-between, microtonal pitches not found in the Western scale:
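To see just how far “in-between” a Ratio of 1.27 is, the Python sketch below converts a Ratio into cents (hundredths of an equal-tempered semitone). It covers only the case shown above, Coarse set to 1, where a Fine value of 27 gives 1.27; how Fine combines with other Coarse values is not assumed here. The function names are illustrative.

```python
import math

def ratio_at_coarse_1(fine):
    """Ratio when Coarse is 1 and Fine is 0 to 99 (Fine 27 gives 1.27)."""
    if not 0 <= fine <= 99:
        raise ValueError("Fine is expected to be between 0 and 99")
    return 1 + fine / 100

def cents_above_played_note(ratio):
    """Distance above the played note in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(ratio)

ratio = ratio_at_coarse_1(27)
print(ratio, cents_above_played_note(ratio))
```

The result falls between 400 cents (a major third) and 500 cents (a perfect fourth), so the pitch you hear sits in the crack between two piano keys.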
So far, we’ve just been listening to a single Operator – Operator 1. But FM synthesizers always offer multiple operators; in the case of modern instruments like the MONTAGE and MODX, there are eight of them, all functionally identical, labeled OP1, OP2, OP3, etc. (Even early FM instruments like the original Yamaha DX7 offered six operators.) Let’s see what happens sonically when we bring in these extra operators.
Begin by again calling up the “Part 2_01” Performance (this will remove any changes you made and restore the sound the way it was) and then do the following:
Finally, let’s talk about the subtle pitch changes you can make with the Fine control. It’s easy to pick out the sound of two notes played an octave apart, or two completely different notes (say, an A and an E), but whenever you play two sounds that are close in frequency to one another (such as, for example, the two or three strings for each piano key), the human ear can no longer perceive them as being separate. Instead, we hear one sound, and if the frequencies are even slightly different, the sound may have a pulsation (a “beating”) occurring at a rate equivalent to the difference between the two frequencies. (This, by the way, is the technique used by piano tuners as well as many guitarists and bassists — at least in the days before electronic tuners and software tuning apps!)
To demonstrate this phenomenon, load the “Part 2_01” Performance and then do the following:
Now play the octave above that same note. The movement is more rapid now; in fact, it’s occurring twice as fast. This happens because the Ratio value is used as a multiplier, so with these settings, if you play A above middle C, for example, you hear a sine wave with a frequency of 440 Hz (Operator 1), plus another sine wave at 444.4 Hz (Operator 2); the difference between the two is 4.4 Hz, which is the rate of the movement you’re hearing. But if you play that same note an octave higher, you hear a sine wave at 880 Hz (Operator 1), plus another sine wave at 888.8 Hz (Operator 2), for a difference of 8.8 Hz, which is double the rate of movement you heard when playing the note an octave lower.
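The beating arithmetic can be reproduced with a short Python sketch. It assumes the two Ratios implied by the numbers above (1.00 for Operator 1 and 1.01 for Operator 2) and uses the standard sum-to-product identity for two sines, which shows the mix as a fast tone whose loudness pulses at the difference frequency. The function names are illustrative.

```python
import math

def beat_rate_hz(note_hz, ratio_a=1.00, ratio_b=1.01):
    """Rate of the pulsation when two sine operators share one note."""
    return abs(note_hz * ratio_a - note_hz * ratio_b)

def two_operator_mix(note_hz, t, ratio_a=1.00, ratio_b=1.01):
    """Sum of the two sine waves at time t (seconds)."""
    return (math.sin(2 * math.pi * note_hz * ratio_a * t)
            + math.sin(2 * math.pi * note_hz * ratio_b * t))

def slow_envelope(note_hz, t, ratio_a=1.00, ratio_b=1.01):
    """Slowly pulsing loudness contour that the mix never exceeds."""
    difference = note_hz * (ratio_b - ratio_a)
    return abs(2 * math.cos(math.pi * difference * t))

print(beat_rate_hz(440.0))  # A above middle C
print(beat_rate_hz(880.0))  # the same note an octave higher
```

Because both operators scale with the played note, the beat rate scales with it too, which is why higher notes pulse faster.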
These multiple movements in the sound make for a very pleasing effect. For example, try playing a C major triad (middle C and the E and G above it), which sounds (and looks) like this:
Last but not least, let’s talk about a similar function in that same screen, called Detune, shown here circled in red:
You can, of course, add in the other operators and set each to different Fine and/or Detune values to create intricate movements within your sound … even if that sound just consists of simple sine waves. Now imagine the possibilities with complex waves (that is, waves that have overtones). Tune into Part Three (“The Magic of Modulation”) to see just how that works … and a whole lot more!
If you missed the first part of the series and want to know more about the history of FM synthesis, check out Part One: Discovering Digital FM…John Chowning Remembers.
© 2024 Yamaha Corporation of America and Yamaha Corporation. All rights reserved.