27 April 2022

Watching a film by ear: An acousmatic approach to cinema

Sound film is commonly described as the interplay between the audio and the visual. Since the 1920s, audiences’ cinematic experience has been dramatically transformed by the synchronization of sound and motion pictures. Silent film carried its narration mainly through the continuous alternation between moving images and intertitles (title cards), while its score was performed by a pianist or an orchestra. With more diverse sound elements (dialogue, music, ambient sound, sound effects, etc.) synchronized to motion pictures, sound film delivers an immersive experience and strengthens its storytelling. Cinematic audiences are required to engage more intensively in the cross-modal perception of hearing and seeing. This progression is a breakthrough for filmmakers and, simultaneously, a new perceptual challenge for audiences’ eyes and ears.

One of these challenges is a biased, visuocentric perception of film narrative. Although sound enhances the realism of the cinematic experience, it is not regarded as playing as central a role in film storytelling as the visual components. In comparison to sounds, visualized objects on the screen are believed to present themselves more ostensibly, instantly, and directly to audiences (Wojcik, 2006). Visual storytelling is more tangible as a manifestation of verisimilitude, supported by techniques such as framing, lighting, scene transitions, and camera motion. Film sound design, by contrast, is usually treated as a post-production step that adds value, such as subliminal emotional impact, to an established and scripted narrative built from a montage of captured moving images (Chion, 1994). At a common-sense level, from the audience’s point of view, sound has to defend itself against the following question: a film without sound remains a film, and we can probably figure out its story through the motion pictures, but does an imageless film with only sounds make any sense?

When sound is separated from the visual, acousmatic theory comes in handy. The term acousmatic sound, developed by Pierre Schaeffer in 1966, refers to sound as an autonomous object that draws our attention to its intrinsic traits. The word acousmatic originates from akousmatikoi, the name given to pupils of Pythagoras who, according to tradition, sat behind a veil so that they would concentrate better on his oral lectures. Perceiving sounds without their visualized causes helps to disclose the essences and primary features of the sound objects (Kane, 2014). Likewise, when treating film sound as acousmatic sound and dissolving the audiovisual contract, one can find unseen sound considerably informative and, consequently, discover the tellability of sound in an imageless film.

Making sense of sounds without their visualized sources, however, requires specific methods of listening. In this essay, I propose using the three listening modes of Michel Chion owing to their direct relevance to film sound. In his book on audio-vision in cinema studies, Chion (1994) identifies the three modes of listening as causal listening, semantic listening, and reduced listening. Causal listening, the most familiar mode, takes place when one attempts to identify the source of a sound. For instance, a dog can be the source of barking sounds. The second mode, semantic listening, involves interpreting codes or messages embedded in the perceived sound. Again, a dog’s barking can point to the dog as its source but, at the same time, it can inform us of either the dog’s state of triggered anxiety or a sense of nearby danger. The last mode, reduced listening, emphasizes the traits of the sound itself. Rather than implying the dog as the cause or hinting at a warning of danger, reduced listening draws listeners’ attention to the pitch, timbre, tempo, loudness, repeatability, etc., of the barking sounds.

In the context of film narrative, audiences who apply these listening modes to acousmatic sound fragments excerpted from films can detect the narratological implications of sounds. Narratological implications of sounds are aural factors whose states or actions are involved in narration and indicate the tellability of the discourse. In this case, the discourse consists of the film sound fragments of auditory scenes, and the aural factors can include different features such as noise, sound effects, and the manipulation of silence. Looking into two auditory scenes, the Eagle Pass Hotel scene excerpted from No Country for Old Men (2007) and the Paris Assassination scene excerpted from Munich (2005), this essay will address some notable narratological implications of sounds in film sound. Readers are encouraged to set the visuals aside temporarily when experiencing the scenes. The timestamps in the following paragraphs refer to the running time of the two attached clips rather than that of the original films.


The first narratological implication is the manipulation of loudness that draws audiences’ attention to a key focus. The key focus can be an object, a person, an action, or a movement within the story. Both auditory scenes contain individual sounds whose amplitudes place their imaginary sources noticeably closer to the camera. The Eagle Pass Hotel scene keeps the sound focus on the main character and his perspective by strongly amplifying sounds heard and caused by the character. For instance, the amplified sounds of breathing and footsteps suggest that the key focus is the action of the main character. Meanwhile, other sounds, such as the loud bang or the engine sound, mark their imaginary sources, a gun and a vehicle, as key focuses. The Paris Assassination scene, although it uses more background noise, still places the sound focus on specific objects and actions through amplification. At 02:55, for example, the metallic sliding sound among the city street noises is amplified to highlight two key focuses: the phone as an object and the action of dialing a phone call.

The next narratological implication is the spatialization of sounds that sets up the context of the narrative. Through the technique of sound layering, background noises and ambiences are embedded in the auditory scenes to inform audiences about the scenes’ atmospheres or locations. The Paris Assassination scene reveals its indoor and outdoor spaces by using the ambience of street noise and quietened white noise. The Eagle Pass Hotel scene, with fewer sound layers, conveys the indoor space of the hotel through quietness and indicates the open street space through the sounds of distant vehicles. Besides the design of background sounds, both scenes also carry similar indicative key sounds that imply specific locations. For instance, the different sounds caused by the characters’ footsteps on different surfaces, such as the squeaky and clacking sounds of a wooden floor or the clomping sounds of concrete ground, suggest the distinct places where these surfaces exist.

The third narratological implication is the cutting of sound shots that establishes the linearity of the storyline. The linearity of the Eagle Pass Hotel scene is anchored to the perspective of one character, while that of the Paris Assassination scene is fragmented into different angles that together form a panoramic view. In the Eagle Pass Hotel scene, with the sound elements and sound effects revolving around the main character and his actions, the chain of sound shots leads audiences through every moment of his experience. Although other characters appear in the auditory scene, the aural traits of the main character persist and prevail across consecutive cuts, which establishes that the linear sequence of the story unfolds only from his perspective. For instance, at 04:43, when the character is outside on the street, the sound of his footsteps retains its high amplitude while the vehicle’s engine sound gradually fades in, suggesting that the vehicle is getting nearer, until it reaches the same loudness. The Paris Assassination scene, on the other hand, involves many non-linear cuts between dissimilar sound shots, which offer different perspectives on the same event. This auditory scene directs audiences’ perception to distinct shots that portray different events occurring simultaneously. At 04:10, for example, shots of the rotary phone’s sliding sound accompanied by outdoor noises alternate continuously with shots of walking sounds accompanied by the quietened indoor ambience.

With Michel Chion’s three modes of listening applied to the acousmatic scenes, our aural perception gains access to comprehension and interpretation. Causal listening helps determine the sounds’ imaginary sources in the story world, although misinterpretations and false speculations are inevitable when identifying those sources. Semantic listening interprets sounds that carry implied meaning or anticipate atmospheres. It also heightens awareness of sounds’ references to specific places or actions. In the Eagle Pass Hotel scene, semantic listening brings out the caution of the character as he moves as quietly as possible, the anxious atmosphere induced by heavy breathing and running sounds, and the brutal gunshot sounds that convey the violence of the climax. In the Paris Assassination scene, the empathetic ambiences either accompany or anticipate the scene’s sequences. For example, when there is a troublesome interruption, the rising tension is implied in the uneasy ambiences. The indistinct non-English conversations detected in the background imply a non-English-speaking location. Reduced listening, for its part, serves to examine the traits of the sounds themselves, such as amplitude, pitch, and frequency. It is an effective mode of listening for differentiating sounds that share similar imaginary sources. For instance, in both auditory scenes, although they share a person as their common imaginary source, the footsteps of different characters on particular surfaces generate different sounds with distinctive amplitude and pitch.

The narratological implications of sounds found in the two auditory scenes challenge the visuocentric perception of film narrative. The acousmatic sounds demonstrate their capacity for storytelling without the visual. This observation, however, neither disregards the ocular ideology nor imposes a phonocentric idea upon films. Rather, I want to emphasize the pluralism of cinematic experiences. Audiences can take creative approaches towards film and its narrative. Rather than relying solely on the hierarchical approach, where images come first and sound comes later, or the visual-dominant approach, where images are the informative component and sounds are the secondary element, one can invert the process, as this essay’s approach does.

[This essay was written by Du Hoang Bao Linh as part of her undergraduate thesis at Ritsumeikan Asia Pacific University in Japan. Her thesis received recognition as an Outstanding Thesis for the class of Spring 2022. She majored in Culture, Society and Media while at APU. Besides film and music, she is also interested in two other fields of interdisciplinary studies: education and multiculturalism.]
