Concatenative Synthesis for Novel Timbral Creation

ABSTRACT

Modern-day musicians rely on a variety of instruments for musical expression. Tones produced by electronic instruments have become almost as commonplace as those produced by traditional ones, as evidenced by the many artists who compose and perform with nothing more than a personal computer. This desire to embrace technical innovation as a means of augmenting performance art has created a budding field in computer science that explores the creation and manipulation of sound for artistic purposes.

One facet of this new frontier concerns timbral creation, or the development of new sounds with unique characteristics that can be wielded by the musician as a virtual instrument. This thesis presents Timcat, a software system that can be used to create novel timbres from prerecorded audio. Various techniques for timbral feature extraction from short audio clips, or grains, are evaluated for use in timbral feature spaces. Clustering is performed on feature vectors in these spaces and groupings are recombined using concatenative synthesis techniques in order to form new instrument patches.

The results reveal that interesting timbres can be created using features extracted by both newly developed and existing signal analysis techniques, many common in other fields though not often applied to music audio signals. Several of the features employed also show high accuracy for instrument separation in randomly mixed tracks. Survey results demonstrate positive feedback concerning the timbres created by Timcat from electronic music composers, musicians, and music lovers alike.

DOMAIN SPECIFIC BACKGROUND

Figure 2.1: In a time or shift invariant system, shifting an input signal results in an identical shift in the output signal

Figure 2.2: In a linear system, an amplitude change of the input signal results in an identical amplitude change in the output signal

Systems that satisfy both the additivity property (the response to a sum of inputs equals the sum of the responses to each input) and the homogeneity property (scaling the input by a constant scales the output by the same constant) are said to be linear. An example of an operation by such a system is shown in Figure 2.2. Likewise, if a time-delayed input to a system produces the same output as an undelayed input, only shifted in time by the same delay, then the system is considered time invariant, as shown in Figure 2.1.
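The three properties can be checked numerically. The following is a minimal sketch, not taken from the thesis, that uses a causal three-sample moving average as the example system; the helper names `moving_average`, `delay`, and `close` are illustrative assumptions. A per-sample squaring operation is included as a counterexample that is time invariant but not linear.

```python
import random

def moving_average(x, width=3):
    """Causal moving average; samples before the start of x count as zero."""
    return [sum(x[max(0, n - width + 1): n + 1]) / width for n in range(len(x))]

def delay(x, k):
    """Shift a signal right by k samples, padding the front with zeros."""
    return [0.0] * k + x[: len(x) - k]

def close(u, v, tol=1e-9):
    """True when two signals agree sample by sample within a tolerance."""
    return len(u) == len(v) and all(abs(p - q) <= tol for p, q in zip(u, v))

random.seed(0)
x1 = [random.uniform(-1, 1) for _ in range(64)]
x2 = [random.uniform(-1, 1) for _ in range(64)]
a, k = 2.5, 5  # arbitrary gain and delay chosen for the demonstration

summed = [p + q for p, q in zip(x1, x2)]

# Additivity: T{x1 + x2} == T{x1} + T{x2}
additive = close(moving_average(summed),
                 [p + q for p, q in zip(moving_average(x1), moving_average(x2))])

# Homogeneity: T{a * x1} == a * T{x1}
homogeneous = close(moving_average([a * p for p in x1]),
                    [a * p for p in moving_average(x1)])

# Time invariance: delaying the input delays the output by the same amount.
time_invariant = close(moving_average(delay(x1, k)), delay(moving_average(x1), k))

def square(x):
    """Counterexample system: squares each sample."""
    return [p * p for p in x]

square_additive = close(square(summed),
                        [p + q for p, q in zip(square(x1), square(x2))])

print(additive, homogeneous, time_invariant, square_additive)
```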

IMPLEMENTATION DETAILS

Figure 3.1: Diagram of the flow of the Timcat framework

The analyzer performs signal analysis on the grains, saving the data points in a database keyed on file name for later use. The synthesizer then performs clustering and concatenates the audio segments based on the output of the analyzer, ultimately outputting audio files that represent new timbral patches. The flow of the framework is represented in Figure 3.1.
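The analyzer and synthesizer stages described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than Timcat's actual code: the two features (RMS energy and zero crossing rate), the grain size, the farthest-first k-means initialisation, the in-memory dictionary standing in for the database, and the file name `demo.wav` are all hypothetical choices made for the example.

```python
import math

SAMPLE_RATE = 8000   # assumed sample rate for the synthetic demo signal
GRAIN_SIZE = 256     # assumed grain length in samples

def split_into_grains(samples, grain_size):
    """Cut a signal into consecutive, non-overlapping grains (the tail is dropped)."""
    return [samples[i:i + grain_size]
            for i in range(0, len(samples) - grain_size + 1, grain_size)]

def features(grain):
    """Two illustrative features: RMS energy and zero crossing rate."""
    rms = math.sqrt(sum(s * s for s in grain) / len(grain))
    crossings = sum(1 for a, b in zip(grain, grain[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(grain) - 1))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels

def synthesize(grains, labels, k):
    """Concatenate the grains of each cluster into one output signal per cluster."""
    return [[s for g, lab in zip(grains, labels) if lab == c for s in g]
            for c in range(k)]

def sine(freq, amp, n):
    return [amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]

# Synthetic input: a loud low tone followed by a quiet high tone.
signal = sine(110, 0.8, 2048) + sine(3100, 0.2, 2048)

grains = split_into_grains(signal, GRAIN_SIZE)
database = {"demo.wav": [features(g) for g in grains]}  # keyed on (hypothetical) file name
labels = kmeans(database["demo.wav"], k=2)
patches = synthesize(grains, labels, k=2)
print(labels)
```

In this toy case the two tones are far apart in the feature space, so each output patch contains the grains of exactly one tone. Real recordings would need richer features and some form of smoothing at the grain boundaries.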

Figure 3.4: Plot of the fast Fourier transform of a flute playing F4

As an example, Figure 3.4 shows a plot of the energy at each harmonic of an F4 fundamental as played by a flute. If a piano played the same note in the same room as the flute, the energy levels at each harmonic would be different, which would indicate the piano's difference in timbre compared to the flute.
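To make the idea concrete, the sketch below measures the magnitude at each harmonic of an F4 fundamental for two synthetic tones. The harmonic amplitude profiles are invented for illustration and are not measurements of a real flute or piano; the sample rate and the single-frequency discrete Fourier projection used in place of a full FFT are likewise assumptions of this example.

```python
import cmath
import math

SAMPLE_RATE = 8000
F4 = 440.0 * 2 ** (-4 / 12)  # about 349.23 Hz in equal temperament with A4 = 440 Hz

def tone(harmonic_amplitudes, duration=1.0):
    """Sum of sine harmonics of F4; amplitudes[0] is the fundamental."""
    n_samples = int(SAMPLE_RATE * duration)
    return [sum(a * math.sin(2 * math.pi * (h + 1) * F4 * n / SAMPLE_RATE)
                for h, a in enumerate(harmonic_amplitudes))
            for n in range(n_samples)]

def harmonic_magnitudes(samples, fundamental, count):
    """Estimate the amplitude at each of the first `count` harmonics."""
    n = len(samples)
    magnitudes = []
    for h in range(1, count + 1):
        w = 2 * math.pi * h * fundamental / SAMPLE_RATE
        # Project the signal onto a complex exponential at the harmonic frequency.
        acc = sum(s * cmath.exp(-1j * w * i) for i, s in enumerate(samples))
        magnitudes.append(2 * abs(acc) / n)
    return magnitudes

# Invented profiles: one dominated by the fundamental, one with stronger overtones.
profile_a = [1.0, 0.3, 0.1, 0.05, 0.02]
profile_b = [0.6, 0.5, 0.4, 0.3, 0.2]

mags_a = harmonic_magnitudes(tone(profile_a), F4, 5)
mags_b = harmonic_magnitudes(tone(profile_b), F4, 5)

for h, (ma, mb) in enumerate(zip(mags_a, mags_b), start=1):
    print(f"harmonic {h}: tone A {ma:.3f}, tone B {mb:.3f}")
```

Both tones share the same pitch, yet their harmonic magnitudes differ, which is the kind of difference a timbral feature is meant to capture.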

RELATED WORK

Figure 4.1: Example of a spectral envelope of a double bass tone (solid line), spectral peaks of a different sound from the same double bass (solid lines) and spectral peaks of a Bassoon

The spectral envelope, a curve that represents the magnitudes of spectra in the frequency domain, emerged as one of the most frequently used tools for quantifying timbre. One such envelope is shown in Figure 4.1. As early as 1977, researchers J. Grey and J. Gordon attempted to analyze and quantify changes in the perception of trumpet tones by tweaking the spectral envelopes of audio played for test subjects. In a recent project, Burred et al.

RESULTS

Figure 5.1: Kontakt AHDSR envelope configuration for virtual instruments used for the general survey

Kontakt also contains a rich set of features that allow the instrument to be modified to better suit a performer's needs. For the survey sounds, a basic filter was added with cutoff frequencies that resembled an AHDSR (attack, hold, decay, sustain, release) envelope. An example of the envelope configuration for the sampler is shown in Figure 5.1.
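The shape of an AHDSR envelope can be sketched as a piecewise-linear function of time. The code below is a generic illustration and does not reproduce Kontakt's implementation or the settings in Figure 5.1; the stage durations, the sustain level, and the mapping from envelope level to a filter cutoff range are all hypothetical values.

```python
def ahdsr(t, attack, hold, decay, sustain, release, note_length):
    """Envelope level in [0, 1] at time t (seconds) for a note held note_length seconds."""

    def held_level(u):
        # Level while the key is still down.
        if u < attack:
            return u / attack                       # attack: rise from 0 to 1
        if u < attack + hold:
            return 1.0                              # hold: stay at the peak
        if u < attack + hold + decay:
            progress = (u - attack - hold) / decay
            return 1.0 - (1.0 - sustain) * progress  # decay: fall to the sustain level
        return sustain                              # sustain: constant until note-off

    if t < 0:
        return 0.0
    if t <= note_length:
        return held_level(t)
    if release <= 0:
        return 0.0
    # Release: fall linearly to zero from the level reached at note-off.
    start = held_level(note_length)
    return max(0.0, start * (1.0 - (t - note_length) / release))

def cutoff_hz(level, low=200.0, high=8000.0):
    """Map an envelope level to a filter cutoff (range is an assumed example)."""
    return low + (high - low) * level

# Hypothetical settings, in seconds (sustain is a level, not a time).
params = dict(attack=0.1, hold=0.2, decay=0.3, sustain=0.5, release=0.4, note_length=1.0)

for t in (0.05, 0.2, 0.45, 0.8, 1.2, 2.0):
    level = ahdsr(t, **params)
    print(f"t={t:.2f}s level={level:.2f} cutoff={cutoff_hz(level):.0f} Hz")
```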

Figure 5.2: Scale played by the virtual instruments used for the general survey

A basic MIDI track was created in which C2 on the piano keyboard is played for 4 beats, followed by an ascending C major scale over 16 beats, at 120 beats per minute in a 4/4 time signature. The MIDI track as shown on the piano roll can be seen in Figure 5.2.
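The timing of such a track follows directly from the tempo: at 120 beats per minute each beat lasts 0.5 seconds, so the 20 beats span 10 seconds, or five bars of 4/4. The sketch below builds the note schedule as plain tuples without a MIDI library. Several details are assumptions, since the text does not state them: C2 is taken as MIDI note 36 (the convention in which middle C, note 60, is C4), the scale is assumed to start on that same C2, and its eight notes are assumed to share the 16 beats equally.

```python
BPM = 120
SECONDS_PER_BEAT = 60 / BPM  # 0.5 s at 120 beats per minute

# Semitone offsets of an ascending major scale, including the octave.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11, 12]

def build_track(root=36, scale_root=36, drone_beats=4, scale_beats=16):
    """Return (start_seconds, duration_seconds, midi_note) events."""
    events = [(0.0, drone_beats * SECONDS_PER_BEAT, root)]
    beats_per_note = scale_beats / len(MAJOR_SCALE_STEPS)  # assumed equal durations
    for i, step in enumerate(MAJOR_SCALE_STEPS):
        start = (drone_beats + i * beats_per_note) * SECONDS_PER_BEAT
        events.append((start, beats_per_note * SECONDS_PER_BEAT, scale_root + step))
    return events

track = build_track()
for start, duration, note in track:
    print(f"{start:5.1f}s  {duration:.1f}s  note {note}")
```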

CONCLUSIONS

In this thesis a software system for discovering novel timbres in prerecorded audio tracks was presented. Many features were evaluated for use in timbral spaces over which clustering was performed using various configurations of the k-means algorithm. Finally, concatenative synthesis techniques were used to generate sources for new virtual instruments. Survey participants were asked to evaluate some of the sounds created by Timcat and provided mixed to generally favorable responses.

While some found the audio produced by the software to be noisy and unpleasant, others enjoyed the instrument patches and saw the potential for the use of Timcat for making electronic music. It is this author’s opinion that the sound files produced by Timcat would benefit greatly from further processing or manipulation as opposed to using the files “as is”. Several survey participants felt this to be the case as well, while others expressed that the sound files would serve much better as sound effects or ambient noise than as sources for virtual instruments.

In addition to evaluating Timcat via survey, a basic test was presented to discover how Timcat performed when separating instruments in audio files based on several timbral features. Audio files were mixed and Timcat was made to separate them using its grain feature extraction methods. The use of filters over the frequency domain representation of grains was found to provide the most accurate features for segmentation. Zero crossing rate and some spectral distribution features were also found to perform moderately well for this purpose.
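One simple way to score a separation test of this kind is cluster purity: each cluster is credited with its most common source instrument, and the score is the fraction of grains that agree with that majority. This is offered only as an illustrative metric; the excerpt above does not say how accuracy was computed, and the labels in the example are made up.

```python
from collections import Counter

def cluster_purity(true_sources, cluster_labels):
    """Fraction of grains that belong to the majority source of their cluster."""
    by_cluster = {}
    for source, label in zip(true_sources, cluster_labels):
        by_cluster.setdefault(label, Counter())[source] += 1
    # Count, for every cluster, the grains that match its most common source.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(true_sources)

# Made-up example: eight grains, four from each of two source recordings.
true_sources = ["flute"] * 4 + ["piano"] * 4
perfect = [0, 0, 0, 0, 1, 1, 1, 1]
one_error = [0, 0, 0, 1, 1, 1, 1, 1]  # one flute grain lands in the piano cluster

purity_perfect = cluster_purity(true_sources, perfect)
purity_one_error = cluster_purity(true_sources, one_error)
print(purity_perfect, purity_one_error)
```

Purity tends to rise as the number of clusters grows, so it is most meaningful when the number of clusters matches the number of mixed instruments.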

Source: California Polytechnic State University
Author: James Bilous
