Spectral Comparison of Event Transients
I spent several weeks in 2006 working on a way to analyze incoming audio events to try to identify events whose attacks were more similar or less similar. My hope was that by sorting events in (approximate) order of similarity, the events could be used in granular synthesis or other musical processes to highlight similar sounds that might be far apart in time. So far I am quite pleased with the results and expect this to play a pretty major role in my compositional strategies going forward.
I think this will be useful for others, so I want to publish the code here with some advice for use.
Prerequisites
This code requires both dewdrop_lib and chucklib, which may be downloaded from my main SuperCollider site.
Note: This code only manages the analysis. There is no requirement that your synthesis processes use my libraries. Once the analysis is complete, you can take the data and do anything you like with it, using whatever framework you choose.
Installation
Place the file "startup21-clientfft.rtf" (download link at the top of this page) into your ~/scwork/chucklib/ folder. The prototypes will be loaded whenever the class library is recompiled.
Analysis technique
Here is a high-level overview of how the analysis works. Use this description to decide whether my code will be useful for your purposes.
- Retrieval of audio data: by default, 2048 samples (just under 50 milliseconds at a 44.1 kHz sampling rate) will be read.
- FFT conversion: uses Signal:fft -- the server's FFT ugen is not required.
- Conversion to mel frequency scale: the linear frequency scale of a normal FFT exaggerates the importance of high frequencies, so the FFT frequency bins are converted into the logarithmic mel frequency scale.
- Identification of spectral peaks: the slope of the curve is used to identify frequency bins where the magnitude reaches a peak. For each peak, the bin index and highest magnitude are stored.
- Comparison: The basic idea is that similar sounds will have a similar number of peaks, similarly placed, and at about the same magnitude for each. Quotients are taken between all these statistics to produce a single number. Identical sounds should produce the result 1.0; the higher the number, the greater the spectral difference.
Note that this technique produces an estimate of spectral similarity. I've found it to be acceptable for musical use. If you wish to make a more sophisticated comparison, you may create your own variant of PR(\fftDataProto).
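To make the comparison idea concrete, here is a rough sketch of peak extraction and the quotient metric, written as ordinary sclang functions. This is only an illustration (it ignores peak placement and the mel conversion for brevity); the real implementation lives in PR(\fftDataProto), and the helper names ~findPeaks and ~comparePeaks are mine.

// Rough sketch only -- not the actual PR(\fftDataProto) code.
// ~findPeaks scans a magnitude array for local maxima and keeps [bin, mag] pairs;
// ~comparePeaks forms quotients of peak count and total peak magnitude, so that
// identical inputs give 1.0 and larger numbers mean greater spectral difference.
(
~findPeaks = { |mags|
	var peaks = Array.new;
	1.for(mags.size - 2) { |i|
		if(mags[i] > mags[i - 1] and: { mags[i] >= mags[i + 1] }) {
			peaks = peaks.add([i, mags[i]]);
		};
	};
	peaks
};

~comparePeaks = { |peaksA, peaksB|
	var countQuot, magQuot;
	countQuot = max(peaksA.size, peaksB.size) / max(min(peaksA.size, peaksB.size), 1);
	magQuot = peaksA.sum { |p| p[1] } / max(peaksB.sum { |p| p[1] }, 1e-9);
	magQuot = max(magQuot, magQuot.reciprocal);   // keep the quotient >= 1
	countQuot * magQuot
};
)

// identical spectra -> 1.0; dissimilar spectra -> larger values
~comparePeaks.(~findPeaks.([0, 1, 0.2, 0.8, 0.1]), ~findPeaks.([0, 1, 0.2, 0.8, 0.1]));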
Process prototypes
The code includes four process prototype (PR) objects. (For a discussion of the purpose of PRs, see the chucklib overview tutorial.)
- PR(\fftDataProto): Responsible for receiving a snippet of audio data and calculating the spectral features that will be used for comparison. You should not use this prototype directly; it's used by the others.
- PR(\transient_analysis_file): Performs offline analysis of audio from a file. The scsynth server does not need to be booted to run this process.
- PR(\transient_analysis_buffer): Performs offline analysis of audio from a server buffer. The buffer should be preloaded with the audio to analyze.
- PR(\transient_analysis_incr): Performs real time analysis of audio while it is being recorded into a server buffer. Suitable for live performance use.
Basic usage: offline analysis
PR(\transient_analysis_file) and PR(\transient_analysis_buffer) perform the analysis on a single audio stream. The file or buffer must be populated prior to execution, and you should also have the time points you want to analyze in an array. (In the future I might consider adding a feature detector to identify the time points for you, but for the moment, you must do whatever feature detection you choose before running the analysis.)
Usage is simple:
// File analysis
PR(\transient_analysis_file) => BP(\analyzer);
BP(\analyzer).v.startAnalysis("path/to/file.aiff", time_point_array, num_FFT_samples);

// Buffer analysis
PR(\transient_analysis_buffer) => BP(\analyzer);
BP(\analyzer).v.startAnalysis(my_Buffer_object, time_point_array, num_FFT_samples);
Time points should be given as seconds from the beginning of the file or buffer. num_FFT_samples defaults to 2048, but you may override the default here. The number of samples must be a power of 2.
The buffer analyzer requires a Buffer object; a buffer number is not sufficient. The buffer's numFrames and sampleRate variables must be populated.
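One convenient way to satisfy this is to load the audio with Buffer.read and start the analysis from its completion action, at which point numFrames and sampleRate have been filled in. The file path and time points below are placeholders:

// Sketch: start the analysis once Buffer.read has populated the Buffer's info.
(
b = Buffer.read(s, "path/to/file.aiff", action: { |buf|
	{
		PR(\transient_analysis_buffer) => BP(\analyzer);
		BP(\analyzer).v.startAnalysis(buf, [0.25, 0.8, 1.5, 2.3], 2048);
	}.defer;
});
)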
All the operations run in the background on AppClock; SuperCollider will not hang while the analysis is running. This means you can analyze a long file with hundreds of time points and continue to use SuperCollider.
The number of comparisons is n * (n-1) / 2, where n is the number of time points, so the work grows quickly -- 200 time points already require 19,900 comparisons. For large files, the analysis can take several minutes. If you want status updates to appear in the post window, run the following command before starting the analysis:
BP(\analyzer).v.postProgress = true;
Status updates are turned off by default.
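For example, a complete offline run over a sound file might look like this; the path is a placeholder, and here the time points are simply spaced every half second:

// Sketch of a complete file analysis with progress posting enabled.
(
var times = (0, 0.5 .. 30.0);   // analysis points every 0.5 s over 30 seconds
PR(\transient_analysis_file) => BP(\analyzer);
BP(\analyzer).v.postProgress = true;
BP(\analyzer).v.startAnalysis("path/to/file.aiff", times, 2048);
)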
Result format
The file and buffer analyzers store the results in the BP variable matrix -- access it as BP(\analyzer).v.matrix. The matrix is an n*n square array:
- Row 0: All comparisons against time point 0.
  Column 0: Time point 0 compared against time point 0.
  Column 1: Time point 1 compared against time point 0.
  etc.
- Row 1: All comparisons against time point 1.
  Column 0: Time point 0 compared against time point 1.
  Column 1: Time point 1 compared against time point 1.
  etc.
Each array item is an Event with the following data:
- item.metric == the comparison result
- item.time == the time point for item.index
- item.peak == the peak amplitude for this fragment -- use this value in your synthesis code to normalize the fragments to the same peak amplitude
- item.index == the column index into the matrix (the target of the comparison)
- item.refindex == the row index into the matrix (the reference for the comparison)
One effective way to use the comparison is to choose a row in the matrix and sort its items by metric value in ascending order. The result is a one-dimensional array in which the reference sample comes first, and successive samples become progressively more distinct from the reference toward the end of the array.
z = BP(\analyzer).v.matrix.choose.copy.sort({ |a, b| a.metric <= b.metric });
During synthesis, use item.time and item.peak when reading the data for playback.
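As a minimal illustration, the following steps through a sorted row and plays a short fragment at each time point, normalized by its peak. It assumes the analysis was run on audio loaded in a Buffer stored in the variable b; the \fragPlayer SynthDef and all parameter values are mine, purely for illustration.

// Illustrative playback sketch: iterate a sorted row, playing a 100 ms fragment
// at each item's time point, scaled so all fragments reach a similar peak level.
(
SynthDef(\fragPlayer, { |out = 0, bufnum, startPos = 0, amp = 1|
	var sig = PlayBuf.ar(1, bufnum, BufRateScale.kr(bufnum), startPos: startPos),
		env = EnvGen.kr(Env.sine(0.1), doneAction: 2);   // fixed 100 ms grain
	Out.ar(out, sig * env * amp);
}).send(s);

Routine {
	var row;
	0.2.wait;   // give the SynthDef a moment to reach the server
	row = BP(\analyzer).v.matrix.choose.copy.sort({ |a, b| a.metric <= b.metric });
	row.do { |item|
		Synth(\fragPlayer, [
			bufnum: b.bufnum,
			startPos: item.time * b.sampleRate,   // item.time is in seconds
			amp: 0.5 / max(item.peak, 0.01)       // normalize using item.peak
		]);
		0.15.wait;
	};
}.play;
)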
At a later date, I will post some synthesis examples using crossfades between adjacent samples in the sorted results -- quite effective musically.
Real time analysis
This process prototype is quite a bit more complex. It's really a buffer analyzer and buffer manager rolled into one. It maintains a user-defined number of buffers. On a signal from the user, audio will be recorded into the buffer until another user signal (or until the end of the buffer). At the same time, a feature detector is running to capture time points for analysis. Whenever a time point comes back from the server, the required analyses get queued up to be executed on background threads, which run while the recording is taking place. In many cases, the analysis is finished by the time the buffer stops recording, so the results can be used immediately for synthesis.
To override the default parameters, it's recommended to instantiate using .chuck and a parameter dictionary, rather than =>.
PR(\transient_analysis_incr).chuck(BP(\analyzer), nil, (
    parameter1: value1,
    parameter2: value2
    etc...
));
Parameters will be described below.
Buffer management
Buffers are stored in an array BP(\analyzer).v.bufs. Each item of the bufs array holds an event with the following values:
- item.buf == Buffer object
- item.status == current activity on the buffer:
'idle' == buffer has no data
'ready' == buffer has data and is not being played
'play' == somebody is using the buffer for playback
'recordPending' == the buffer is selected for recording, but waiting for the user signal
'record' == actively recording
- item.dur == how long the last recording was
- item.peak == peak amplitude of the last recording
- item.recTime == when did the last recording happen? (To identify the oldest buffer for reuse.)
- item.playCount == how many synths are using this buffer?
- item.ontimes == the timepoint array
- item.peaks == peak amplitude at each timepoint
- item.matrix == analysis matrix for this buffer
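For instance, a player process might scan this array for buffers whose analysis has finished. This little helper is only an illustration, not part of the library:

// Illustration: collect the entries that are ready for playback and post
// some of the information stored with each one.
(
var ready = BP(\analyzer).v.bufs.select { |item| item.status == \ready };
ready.do { |item| [item.buf, item.dur, item.peak, item.ontimes.size].postln };
)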
The real time analyzer uses dependencies to notify other objects when a buffer becomes ready for use.
// to receive notifications of buffers becoming available
BP(\analyzer).addDependant(myProcess);
Dependants of the BP object will receive one of the following as the "what" argument to their .update method:
- 'bufRecord': the buffer is chosen for recording (record is pending)
- 'bufReady': analysis is complete
- 'play': recording is active
- 'stop': recording has stopped
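As a sketch, the standard SimpleController class is one convenient way to register such a dependant; the actions below are only placeholders:

// Sketch: react to the analyzer's notifications via a SimpleController.
(
~watcher = SimpleController(BP(\analyzer))
	.put(\bufReady, { "analysis complete -- a buffer is ready for playback".postln })
	.put(\stop, { "recording stopped".postln });
)

// later, to stop watching:
// ~watcher.remove;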
// to receive notifications of a specific buffer's status changing
BP(\analyzer).v.bufs[index].addDependant(myProcess);
Dependants of the buffer holder will receive the buffer's new status as the "what" argument to their .update method.
Also, when you create a synth node using the buffer, you should call BP(\analyzer).v.bufferPlayingNewNode(bufferEvent, synthNode) so that .playCount and .status will update automatically. This will prevent a buffer from being chosen for recording while a synth is using the buffer.
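A quick sketch of that registration, picking a ready buffer from the bufs array (the \myGrainPlayer SynthDef and its arguments are hypothetical):

// Sketch: start a synth on a ready buffer and register the node so that
// playCount and status are maintained while it plays.
(
var item = BP(\analyzer).v.bufs.detect { |e| e.status == \ready }, node;
if(item.notNil) {
	node = Synth(\myGrainPlayer, [bufnum: item.buf.bufnum, amp: 0.5]);
	BP(\analyzer).v.bufferPlayingNewNode(item, node);
};
)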
Parameters
Customize the behavior for your piece!
- numBufs == how many buffers to allocate
- bufDur == how many seconds each buffer should be
- onsetRejectLimit == minimum amount of time between feature detections
- minPeak == minimum peak amplitude for each audio fragment (and the whole recorded buffer)
- recordActive == automatically enable a buffer for recording on instantiation
- audioThru == true: send mic input to the mixer output always; false: send mic input only when recording is active
- audioThruLevel == gain factor for the audio thru signal; not used by default
- mixerOutChannels == how many channels for audio thru
- inputBusIndex == the bus number for the audio input; by default the same as AudioIn.ar(1)
By default a MIDI trigger is created for the sustain pedal to start and stop recording. You can override the channel and controller number, or write a new makeTrigger function to use an entirely different device. Triggers can also be generated on the server and mapped directly to the input and detector synths' t_trig inputs.
- midiTriggerChan == MIDIChannelIndex for user signal (see dewdrop_lib MIDI classes)
- midiTriggerCtlNum == MIDI controller number
- makeTrigger == a function to create the user signal responder; by default a BasicMIDIControl
- makeFeatureDetector == a function returning the SynthDef to be used for feature detection. Read the default synthdef carefully to see how yours should be written
- detectParms == a Synth argument list for the feature detector; may also be a function returning an argument list
- makeInputSource == a function returning the SynthDef to be used for audio input. Read the default synthdef carefully to see how yours should be written. The SendTrig ugens must not be modified if you customize this.
- inputParms == a Synth argument list for input source; may also be a function returning an argument list
- postProgress == true: print feedback while recording; false: run silently
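Putting a few of these together, an instantiation with some defaults overridden might look like the following; the specific values are only examples:

// Example values only -- adjust for your own piece.
PR(\transient_analysis_incr).chuck(BP(\analyzer), nil, (
	numBufs: 4,              // four recording buffers
	bufDur: 8,               // eight seconds each
	onsetRejectLimit: 0.1,   // ignore onsets less than 100 ms apart
	minPeak: 0.01,           // discard fragments quieter than this
	recordActive: true,      // arm a buffer for recording right away
	midiTriggerCtlNum: 64,   // sustain pedal (cc 64) starts/stops recording
	postProgress: true       // print feedback while recording
));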
Known issues
If you are using audioThru == true, there will very likely be a glitch in the input audio if buffer recording stops when the input is not silent. I am working on this.
Occasionally during analysis, an error may be reported in the post window due to an invalid object being retrieved from the analysis queues. This is a bug in the PriorityQueue class, not in my code. The analyzer will attempt to recover automatically and will be successful in most cases. Don't be alarmed unless you see a message like the following after analysis:
BP(\analyzer): Did not recover from [x] errors.
That message means that at least one matrix element is not populated. Sorting and looping over the arrays might fail.
Usage examples
I've added a file to the code archive with a few simple examples.
1. A simple analysis of the usual example sound file (a11wlk01.wav) based on random time points. It also illustrates one way to play the analysis points back in order of similarity.
2. Here, the sound file is loaded into a buffer and a feature detector is run to choose the analysis points more intelligently. We then do a random walk over the analysis data and illustrate granular synthesis that crossfades between adjacent samples (as ordered by the analysis).
3. Considerably more complex, illustrating how to use the live, real time analyzer in conjunction with a player process. Be sure to have a microphone ready; on Mac OS X 10.4, you might need to create an aggregate device to use the built-in mic. The main points to observe in this example:
- The playback process is modeled after my drum sequencer, PR(\bufPerc). To use it this way, I had to add logic to receive new buffers. The changes were significant enough to warrant a new process prototype cloned from the original bufPerc process (i.e., a subclass). Once the new prototype is defined, the player BP is created with a single line. At that point I could create 2, 5 or 10 players as easily as one.
- Creating the dependency between the analyzer and the player also takes only one line. At present, the dependant must be the value in the BP(\player) slot -- i.e., BP(\player).v -- and not BP(\player) itself. I will probably remove this restriction in the future.
- Using a reference to the algorithmic composition function, instead of embedding it in the prototype. By modularizing the compositional behavior into a separate object, it is easier to change the function without losing all the other information in the BP. To change the behavior, I would only need to do this (left as an "exercise for the reader"):
{
	... code to populate the ~amps, ~start and ~bufs arrays,
	and also tell the analyzer which buffer is in use;
	this code is modeled in the example ...
} => Func(\myNewAlgorithm);

BP(\player).v.pbindPreAction = \myNewAlgorithm;