In recent years, voice cloning and AI-generated speech have become increasingly sophisticated, mimicking nuances of the human voice such as timbre and emotion, and supporting a growing number of languages and accents. This has raised many security and privacy concerns. Still, as powerful as those systems are, they're not infallible, especially when they're supposed to imitate someone the listener knows very well. That's because our voices are like fingerprints: from the way you tell a story to your coworkers to the way you greet a friend on the phone, your voice carries a complex blend of biology, upbringing, and life experience. Even identical twins don't sound exactly the same!
So what makes your voice unique? Multiple factors are at play, so let's start with investigating the physical and acoustic features of voice production and how they differ between individuals.
Anatomy Behind Your Speech
Take a deep breath, as the journey of your voice begins in your lungs. When the air is pushed out, with the help of intercostal muscles (the ones between the ribs) and the diaphragm, it travels up through the vocal tract to create speech sounds. The vocal tract includes the larynx, the pharynx, the oral and nasal cavities, and, to some extent, the sinuses.
Larynx
The larynx, commonly referred to as the voice box, is made of cartilage and muscle, and it's where the vocal cords (aka vocal folds) are situated. (The Adam's apple is the bump formed by the cartilage at the front of the larynx.) Vocal cords are crucial in speech production, as they allow us to make voiced and voiceless sounds. When the folds are very close together, the passing air makes them vibrate rapidly and thus produce voiced sounds. In English, these include all vowel sounds and many consonants, such as /b/, /m/, or /v/. Conversely, when the vocal folds are apart, nothing stands in the air's way, which results in voiceless sounds. These sounds include such consonants as /p/, /f/, or /k/. Vocal cords are also responsible for the pitch of the voice (how high or low it sounds). The faster they vibrate, the higher the pitch.
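The link between vibration rate and pitch can be illustrated with a small sketch. It is not how any particular speech tool works: it builds a crude stand-in for a voiced sound (a sine wave plus two weaker harmonics) and then recovers the vibration rate with a basic autocorrelation search. The function names, the 8000 Hz sample rate, and the 100 Hz and 200 Hz test values are all assumptions chosen for illustration.

```python
import math

def synth_voiced(f0_hz, sample_rate=8000, duration_s=0.1):
    """Crude stand-in for a voiced sound: a fundamental plus two weaker harmonics."""
    n = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * f0_hz * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * t / sample_rate)
        + 0.25 * math.sin(2 * math.pi * 3 * f0_hz * t / sample_rate)
        for t in range(n)
    ]

def estimate_pitch(samples, sample_rate=8000, fmin=60.0, fmax=400.0):
    """Estimate the vibration rate (Hz) as the lag at which the signal best matches itself."""
    min_lag = int(sample_rate / fmax)  # shortest period we consider
    max_lag = int(sample_rate / fmin)  # longest period we consider
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: multiply the signal by a shifted copy of itself.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

slow_folds = synth_voiced(100)  # hypothetical slower vibration
fast_folds = synth_voiced(200)  # hypothetical faster vibration
print(estimate_pitch(slow_folds), estimate_pitch(fast_folds))
```

The second signal repeats twice as often as the first, so its estimated pitch comes out about twice as high. Real voices are far less regular than these synthetic tones, so practical pitch trackers add normalization and smoothing on top of this idea.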
In the illustration, just above the vocal folds, you can see ventricular folds, also known as false vocal cords. They protect the airway and usually play a minimal role in voice production, but some people have mastered using them to create guttural, growling, and distorted sounds used in some heavy metal music styles like deathcore. If you'd like to see false vocal cords in action, check out Will Ramos of Lorna Shore undergoing laryngoscopy while he performs harsh vocals in this documentary.
Pharynx
The pharynx is situated in the upper throat and contributes to shaping and amplifying speech sounds. It can also be constricted to create pharyngeal sounds used in some languages, like Hebrew and Arabic.
Oral Cavity
When the air reaches the oral cavity, it meets several so-called articulators that shape it into human speech. The articulators include the jaw, hard palate, soft palate, tongue, teeth, and lips. While native speakers of the same language tend to use the same places of articulation to create the same phonemes (e.g., English speakers tend to place the tongue against or near the alveolar ridge to create an /s/ sound), the resulting sounds can be influenced by an individual's anatomy, dialect, or even personal preference.
Nasal Cavity
On its way through the vocal tract, the air can go to the nasal cavity to create nasal sounds. This is possible thanks to the velum (the soft palate), the soft tissue at the back of the roof of your mouth, the position of which decides whether the air goes through the nose or the mouth. In English, there are only three nasal sounds: /m/, /n/, and /ŋ/ (the "ng" sound in such words as bring or bling).
Sinuses
While the sinuses are often omitted in the discussion of human speech, their size and shape still contribute to the resonance happening in the vocal tract, affecting the overall sound of your voice. Think about how your voice can change when you have a sinus infection. In fact, the frontal sinuses’ shape is so distinctive that forensic anthropologists use it in identifying individuals in a similar way to dental records or fingerprints.
The Unique Color of Your Voice
As you can see, many parts of your body are involved in speech production. Every tiny detail of those organs – size, length, shape, crevices, health – has an impact on how sounds resonate through your body. It essentially shapes the timbre of the speech, often called voice quality in linguistics, which allows us to distinguish sounds that have the same pitch and loudness. In other words, it gives your voice its unique color. It is similar to how sound works in music – a specific collection of harmonics gives an instrument a unique timbre, so a piano and a violin sound different to us, even when playing the same note.
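The idea that two sounds can share a pitch yet differ in timbre can be sketched numerically. The example below is an illustration under stated assumptions, not a model of any real instrument or voice: it builds two synthetic tones on the same 200 Hz fundamental with different harmonic strengths, then measures each harmonic with a single-frequency discrete Fourier transform. The names `mellow` and `bright`, the amplitudes, and the sample rate are all hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
F0 = 200            # fundamental frequency in Hz shared by both tones (assumed)

def synth(harmonic_amps, n=800):
    """Sum sine waves at F0, 2*F0, 3*F0, ... using the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * F0 * t / SAMPLE_RATE)
            for k, a in enumerate(harmonic_amps))
        for t in range(n)
    ]

def amplitude_at(samples, freq_hz):
    """Amplitude of one frequency component (a single-bin discrete Fourier transform)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def harmonic_profile(samples, count=4):
    """Strength of the first `count` harmonics, rounded for readability."""
    return [round(amplitude_at(samples, (k + 1) * F0), 3) for k in range(count)]

mellow = synth([1.0, 0.2, 0.05, 0.0])  # energy concentrated in the fundamental
bright = synth([1.0, 0.8, 0.6, 0.4])   # strong upper harmonics

print("mellow:", harmonic_profile(mellow))
print("bright:", harmonic_profile(bright))
```

Both tones have the same fundamental, so they would be heard at the same pitch, but their harmonic profiles differ. That difference in the mix of harmonics is what the text above calls timbre.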
Factors Affecting Voice Quality
Numerous things can affect the color of your voice, from the way your vocal tract is built to your daily habits. Let's take a look at some of them.
Vocal Fold Tension / Closure
Vocal cords also help us to manipulate our voice through the use of different levels of airflow and vibrations, resulting in different phonation types. Some well-known phonation types include:
- Breathy voice – occurs when the vocal folds are pulled together but still allow some air to escape. This type of voicing is often associated with a seductive way of speaking, similar to the way Marilyn Monroe spoke in her movies.
- Vocal fry (also known as creaky voice or glottal fry) – occurs when the vocal folds vibrate irregularly, creating a characteristic creaky sound. Speaking with vocal fry has recently been associated mostly with young American women, with some famous examples including Scarlett Johansson, Ariana Grande, and the Kardashians.
- Whisper – occurs when the vocal cords are partially pulled together, leaving a gap called "the whisper triangle," and do not vibrate as the air passes through that gap.
Larynx Height
The position of the larynx can influence the perceived color of the voice quite dramatically. When the larynx is lowered and the sound resonates more in the back, we perceive the voice as darker. When the larynx is raised and the sound has more forward resonance, we perceive the voice as brighter. Singers are excellent at using the brightness and darkness of their voices to create their preferred styles of music. Famous singers with bright voices include Mariah Carey and Bruce Dickinson (Iron Maiden), and with dark voices include Billie Eilish and Johnny Cash.
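One rough way to see why larynx height changes the color of the voice is to treat the vocal tract as a uniform tube closed at the vocal folds and open at the lips. This is a textbook simplification, not a description of real anatomy. Under that assumption the resonances fall at F_n = (2n - 1) * c / (4 * L), where c is the speed of sound and L is the tube length. Lowering the larynx lengthens the tube and raising it shortens the tube. The lengths below are hypothetical values picked for illustration.

```python
SPEED_OF_SOUND_M_S = 350.0  # approximate value for warm, humid air (assumed)

def tube_resonances(length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_M_S / (4 * length_m) for n in range(1, count + 1)]

# Hypothetical tract lengths in metres for three larynx positions.
positions = [("raised larynx", 0.16), ("neutral", 0.175), ("lowered larynx", 0.19)]

for label, length_m in positions:
    print(label, [round(f) for f in tube_resonances(length_m)])
```

In this simplified model, the longer tube has lower resonances across the board, which is consistent with a lowered larynx sounding darker and a raised one sounding brighter. A real vocal tract is not a uniform tube, so actual resonances depend heavily on tongue, jaw, and lip position as well.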
Age
Children tend to have higher voices due to smaller vocal folds and shorter vocal tracts. With aging, the shape of the vocal tract changes, and people lose elasticity in their vocal cords, which affects such characteristics as pitch, resonance, and clarity. The elderly may experience a lower pitch and more breathiness or hoarseness due to weakened muscles and reduced elasticity. These characteristics of both children's and elderly voices are quite challenging for automatic speech recognition (ASR) systems, which are generally not trained on high-quality, annotated data for these demographics. ASR systems struggle with children's speech in particular because, apart from the higher pitch, it can be less clear, with incomplete words, less developed pronunciation, and more varied speech patterns.
Gender
Male voices are often lower due to longer, thicker vocal folds. This difference emerges during puberty, when teenage boys experience a deepening of their voice as hormonal changes affect the anatomy of the vocal tract. Female voices tend to be higher-pitched, though hormonal changes throughout life can alter their quality. It's worth noting that the voice, like other human traits, can often be more ambiguous depending on the individual.
Health
Colds, allergies, vocal nodules, medical procedures, and other health-related issues can influence your voice, and, for instance, make you sound more hoarse or nasal. The same can be said of some lifestyle choices like smoking or drinking alcohol – they can result in a husky, roughened voice quality.
Beyond the Anatomy
As you can see, your vocal tract is a closely intertwined system that not only allows you to speak, but also gives you a distinctive voiceprint. However, what makes you sound like yourself goes far beyond biology itself. Your personality, social connections, line of work, what languages you speak, when and where you were born, and many other aspects make the way you speak recognizable to your friends and family. We'll explore these topics in future posts.
Sources
Bruhn de Garavito, J., & Schwieter, J. W. (Eds.). (2020). Introducing linguistics: Theoretical and applied approaches. Cambridge University Press.
Trask, R. L. (1996). A dictionary of phonetics and phonology. Routledge.
Van Gysel, W. D., Vercammen, J., & Debruyne, F. (2001). Voice similarity in identical twins. Acta Otorhinolaryngol Belg, 55(1), 49-55. PMID: 11256192. https://pubmed.ncbi.nlm.nih.gov/11256192/
Media Attributions
Figure 1. Parts of the Human Vocal Tract by Patrick J. Lynch, licensed under a CC BY 2.5 licence.
Figure 2. Vocal folds by Sandrarossi licensed under a CC BY-SA 4.0 licence.
Figure 3. Frontal bone sinuses by Alex Khimich, licensed under a CC BY-SA 4.0 licence.
Marzena Żyła