WDSPA

Prof. Oscar Pablo Di Liscia

odiliscia@unq.edu.ar

General

WDSPA is a computer program designed and written by Oscar Pablo Di Liscia as part of the research project "Music and Drama: new dimensions in performance" (Oscar Edelstein / Oscar Pablo Di Liscia). The project is hosted by the Universidad Nacional de Quilmes (Argentina) and supported by the Education Ministry of the National Government of Argentina.

WDSPA is written in the C programming language for the Windows OS (Microsoft). So far it has been tested successfully under Windows 95 and Windows 98.

WDSPA is based on an earlier program (DSPA, O. P. Di Liscia, 1997-1999), which has versions for both Linux and Windows. However, WDSPA has its own GUI (graphical user interface), while the earlier versions are command-line programs.

WDSPA uses the following cues to simulate the room and the location and movement of a source in a two-dimensional space:

1-Amplitude scaling: using either Intensity Panning or Ambisonic.

                    At present WDSPA only handles First Order Ambisonic, B Format; other formats will be added later.

To obtain more information on Intensity Panning and Ambisonic, please consult Appendix I.

2-Frequency shift: according to the relative speed of source and listener (Doppler shift).

3-Early echoes: according to the room geometry.

4-Global reverberation and local reverberation: with control of diffusion, T60, gain and direct/reverberant ratio. The global reverberation was designed using networks of eight comb/lowpass filters in parallel feeding an array of four allpass filters in series per channel. The gains and delays were borrowed from the Freeverb reverberator (by Jezar), which is in the public domain and gave good results directly with this technique.

The general configuration data (room size, wall absorption, T60, etc.) and the movement path are stored in a binary file with a special format.

The program assumes the listener is located at the origin of a Cartesian plane and surrounded by four loudspeakers (at angles of 45, 135, 225 and 315 degrees), all at the same distance (unity by default).

Since the program was conceived for sound reproduction over loudspeakers, some powerful cues (HRTF) were disregarded. A future version will include them as an option.

WDSPA reads an input audio signal stored in an input file (RIFF WAV standard format) and writes the result to one or two stereo output files (also RIFF WAV standard format). The parameters of the output signal (sampling rate and bytes per sample) are taken from the input file.

Terms & Conditions

The author grants you the right to use and distribute this version of WDSPA in binary executable form for academic and non-commercial purposes only.
WDSPA is provided "as is" without any expressed or implied warranties.
The author does not intend to distribute a commercial release of WDSPA.

Ambisonic is a registered trademark of Nimbus Communications International.

Specific features of WDSPA

The spatialisation is performed using two groups of data, which can be generated with WDSPA and stored on disk in files with a proprietary format (*.spa; the file format is available on request):

Parameters:

These are general data such as T60 (reverberation time), gain, room size, etc.

You can access and edit this data in the Audio/Parameters menu.

Path:

In order to simulate the movement and/or location of a virtual sound source, WDSPA needs a "spatial path". This path is stored as N segments, each one bounded by two points (or nodes). A simple example of a path may be:

x=1 ;first node x rectangular coordinate

y=1 ;first node y rectangular coordinate

t=100 ;time, in % of the input signal duration, that it will take to go from the first node to the second

;100 means the entire sound file

a=0 ;zero will produce constant speed. Nonzero positive and negative numbers will produce an exponential speed curve (increasing to the middle of the segment, decreasing until the end of the segment)

 

x=-1 ;second node x rectangular coordinate

y=1 ;second node y rectangular coordinate

Under /node/edit segment data, you can specify the x, y, t and a arguments by typing numbers.

The path can be created and modified by:

1-Left-clicking the mouse on a location where there is NOT a node will create a new node.

2-Holding the left button on a node allows you to drag it to any location.

3-Right-clicking on a node will open a menu which offers:

-undo

-delete the node

-interpolate new node/s: at present, only linear interpolation between the two nodes involved is used. The number of nodes to interpolate can be set at /node/interpolation settings.

4-You can use the "path" menu to transform the entire path in some useful way. Beware of transformations that make the path exceed the room dimensions: the path will be "clipped" to fit them.
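The linear interpolation of new nodes mentioned in the list above can be sketched as follows. The Node type and the function interpolate_nodes are hypothetical names used only for this example; they do not describe the internal data format of WDSPA.

```c
#include <stddef.h>

typedef struct {
    double x; /* x rectangular coordinate */
    double y; /* y rectangular coordinate */
} Node;       /* hypothetical type, not the WDSPA file format */

/* Write `count` evenly spaced nodes lying strictly between a and b
   into out[0..count-1]. The caller provides room for `count` nodes. */
void interpolate_nodes(Node a, Node b, size_t count, Node *out)
{
    for (size_t i = 0; i < count; i++) {
        /* fraction of the way from a to b: 1/(count+1), 2/(count+1), ... */
        double f = (double)(i + 1) / (double)(count + 1);
        out[i].x = a.x + f * (b.x - a.x);
        out[i].y = a.y + f * (b.y - a.y);
    }
}
```

For the two nodes of the example path, (1, 1) and (-1, 1), asking for three new nodes gives (0.5, 1), (0, 1) and (-0.5, 1).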

Some transformations are easy to guess, while others are not. For example, "set equal time steps" will set the same time for all the segments (the number of segments is computed and the "t" for each one is set to 100/nsegments), while "proportionalize time steps" will make the time steps proportional to the distance between points (thus giving constant speed).

See, as an example, a path and its transformation by scaling only the Y dimension in the following plot:

OTHER FEATURES:

Selecting /audio/bang will process the input file using the data stored.

You can set your favorite audio editor in /audio/audio editor; selecting /audio/edit audio will then simply call the selected executable from the command line, using the output file/s as argument/s. It should work for most programs...

There are also menus for File handling and Help (this file).

In addition, some programs using special routines to "design" spatial trajectories were also written, since experience has consistently shown that drawing with mouse strokes is somewhat tedious and useless when complicated movements are required. The solution will be to include special routines for drawing trajectories within WDSPA in the future. See, for example, the plot of one output path of the program circle.exe (O. P. Di Liscia), which consists of a moving phasor with control of frequency, direction, and X and Y scaling:

Difficult to draw just using the mouse by hand....uuuh...?

Acknowledgements:

The author is most grateful to:

Juan Pampin and Fernando Lopez Lezcano, for so many useful suggestions and assessments.

C.A.R.T.A.H. (Center for Advanced Research Technologies in Arts and Humanities of the University of Washington, Seattle, USA), for support and hardware supply in research and testing (Ambisonic microphones).

Richard Furse and David Malham for their valuable developments and documentation on Ambisonic technology.

Jezar, Technology Consultant at Dreampoint Design and Engineering (UK), for providing the data for high-quality reverberation using arrays of comb and allpass filters (Freeverb).

APPENDIX I:

Ambisonic versus Intensity Panning:

One of the most easily controlled methods for the angular location of sound over loudspeakers consists in scaling the energy emitted by them. The procedure most commonly used is widely known as intensity panning (see Moore 1990; Bossi 1990; Chowning 1971; and Dodge & Jerse 1985).

It is possible to calculate the amount of energy delivered by a sound source for a given source location, listener location, number of channels, location of the loudspeakers and directional characteristics of the source, and to use it to scale the gain of the signal at each output channel of a reproduction system.

To simulate a sound source located at any angle using Intensity Panning, the energy delivered by the sound source must be distributed between pairs of loudspeakers, and the total energy emitted by both of them must always be the same for any angle value. To accomplish this, the following trigonometric identity may be used:

cos²(A) + sin²(A) = 1

Where A is any angle in radians.

Next we need to know how to use the above equation in a practical situation (i.e., which gain values we obtain from it). Since energy is proportional to the square of amplitude, using the sine and cosine functions directly as gains will produce the correct result in terms of amplitude.

As a simple example, assuming that we have two loudspeakers at angles of 0 and π/2 radians, the gain for each channel will be:

Ch1= Cos(A)

Ch2= Sin(A)

Where A is any angle between 0 and π/2 radians.

Another procedure is to calculate the energy and then obtain the corresponding amplitude for each channel by taking its square root (Bossi 1990).

The above equation takes into account neither the distance from the source to the listener nor the directional characteristics of the source. Both the distance and the directional scaling factors are, however, not exclusive to intensity panning, and may be used in other cases, such as the one I will deal with next (Ambisonic).

To take distance into account, the equation must be restated as:

Ch1= Cos(A) / (distance + offset)

Ch2= Sin(A) / (distance + offset)

(The offset term may be necessary to handle the cases in which the distance is less than unity, especially when it becomes zero.) Some psychoacoustic research has shown that distance cues may be more perceptually effective with a different scaling. In that case, a scaling exponent may be used to raise the distance to a power greater than one, thus producing a more pronounced ("exaggerated") gain curve as the source moves away from the listener:

distance factor = (distance ^ scaling exponent + offset)

Finally, the signal at both channels may be scaled by a directional factor derived from the directional characteristics of the sound source (if this is not done, the source is considered omnidirectional). A useful approach to this can be found in (Moore 1989). Moore proposes modelling radiation vectors (with their magnitude determined by their angle), and shows how a supercardioid shape is determined in two dimensions by the formula:

r(A) = (1 + ((back - 1) |R - A|) / π)²

where R is the angle (in radians) of the radiation vector and back specifies the relative amount of radiation in the direction opposite to R. It can be seen that setting back to zero produces the cardioid shape, while setting it to one produces an omnidirectional shape. A similar conception (though in a three-dimensional space) is found in the program Vspace (Furse 1999).

Another approach to the simulation of the directional characteristics of sound sources can be found in the DirectSound system (which is a registered trademark of Microsoft). DirectSound attempts to model sound sources in a three-dimensional space as if they were "sound cones" (Bargen & Donnelly 1998).

Though widely used, intensity panning is at present strongly criticized because:

1-It is effective only for a small group of listeners located at the center of the listening room. This is a consequence of the so-called "Haas effect" or "precedence effect" (see Haas 1951; Wallach 1973) on the one hand, and of the "pulling effect" of the louder signal emitted by the nearest loudspeaker on the other.

2-The signals coming from all the loudspeakers reach both ears of the listener (in a very different way from the "real" sound source being emulated), thereby delivering confusing information. This is often referred to as the crosstalk of the loudspeakers. An explanation and graphics of this effect can be found in (Ellen 1973).

3-Once a mix is done for a particular array of loudspeakers, it is possible neither to reproduce it on a different array nor to modify it for one (at least it SHOULD not be possible…).

Some British recording engineers, mainly Michael Gerzon, created a technique called Ambisonic, which is widely used at present. Ambisonic attempts to overcome the limitations mentioned above by encoding the signal in the same way that a special microphone would record it (as a matter of fact such microphones exist, and one of them is the Calrec Soundfield microphone). This encoding keeps the information of the energy delivered by a sound source located in a three-dimensional field using four signals (there are other encoding formats using more than four signals, but we will not deal with these now), as if it were recorded by an array of three figure-of-eight microphones (one pointing along each of the three axes) plus an omnidirectional microphone. The decoding procedure attempts to recreate, for a given array of loudspeakers, the wave front that the microphone "heard" (Ambisonic specialists refer to that array as the rig). In Ambisonic, all the loudspeakers work together "pulling and pushing", and that is why it is not advisable to mix signals not encoded using Ambisonic with an encoded one. One of the main advantages of Ambisonic, however, is that a properly encoded signal may later be decoded for whatever rig of loudspeakers is to be used in a particular situation.

The Ambisonic B Format

This format stores the directional information of the wave front encoded in four signals:

X= front - rear

Y= left - right

Z= above - below

W= Omni directional information.

The encoding may be accomplished either by recording sound with an Ambisonic microphone or by performing DSP operations on a signal stored in a sound file. The following encoding equations are used (taken from Malham 1998):

X = cos(A) * cos(B) * input signal

Y = sin(A) * cos(B) * input signal

Z = sin(B) * input signal

W = 0.707 * input signal

Where: A is the horizontal (azimuth) angle, measured counterclockwise.

B is the elevation angle

If the sound source is located at any point within the unit sphere, so that (x² + y² + z²) is always less than or equal to 1, the encoding equations are simpler:

X = x * input signal

Y = y * input signal

Z = z * input signal

W = 0.707 * input signal

Where x, y and z are rectangular coordinates indicating the position of the sound source in a three-dimensional space. Malham (Malham 1998) points out that, if placement inside the unit sphere is required, the levels of the X, Y and Z signals will reduce the total sound intensity instead of increasing it as expected. Therefore, he proposes making W change according to:

W = 1 - 0.293(x² + y² + z²)
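As a sketch, the encoding from rectangular coordinates with this W correction could look as follows in C. It assumes that the corrected W expression is a gain applied to the input signal, like the other three equations. The BFormat structure and the function name encode_bformat_xyz are hypothetical.

```c
typedef struct {
    double w, x, y, z; /* the four B Format signals */
} BFormat;             /* hypothetical container */

/* Encode one input sample located at (x, y, z) inside the unit sphere,
   i.e. with x*x + y*y + z*z <= 1. */
BFormat encode_bformat_xyz(double input, double x, double y, double z)
{
    BFormat s;
    double r2 = x * x + y * y + z * z; /* squared distance to the center */

    s.x = x * input;
    s.y = y * input;
    s.z = z * input;
    s.w = (1.0 - 0.293 * r2) * input;  /* assumed: W gain times input */
    return s;
}
```

On the surface of the unit sphere the W gain equals 0.707, the same value as in the uncorrected equations, and it rises to 1 as the source approaches the center.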

If we are to use a computer program to encode a signal, the equations must also take into account the distance between source and microphone. For this purpose, Richard Furse uses the following equations (Furse 1995):

ds = x*x + y*y + z*z

dist = sqrt(ds)

X = x / (ds + offset)

Y = y / (ds + offset)

Z = z / (ds + offset)

W = .707 / (dist + offset)

Where offset is a quantity added to simulate the core sphere radius of the microphone (thus avoiding infinite gain values if the distance equals 0). Furse (Furse 1995) suggests offset=0.3.

It would also be possible to scale the W, X, Y and Z signals by a directional factor such as the one described in the previous section.

The decoding process is quite straightforward and elegant. As shown in (Bamford 1995), given an encoded B-format signal, the feed for a First Order N-speaker Ambisonic system would be:

Pn = (1 / N) (W + 2X cos(θn) + 2Y sin(θn))

Where θn is the angle, in radians, of the n-th speaker of the system.

Here follows a C language function which calculates the gain values for encoding and decoding a signal in First Order B Format. This function expects two floats (x, y) indicating the rectangular coordinates of the source, and one (input) being the input signal. It calculates the X, Y and W signals (i.e., it encodes the input), and writes the decoded result to an array of four floats, each one being a channel of a square rig of four loudspeakers. For simplicity, the z (height) dimension was disregarded in this example. The most commonly used decoding matrices can be found in (Furse 1999).

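#include <math.h>

/* x, y: rectangular coordinates of the source. They must not both be
   zero, since no offset is added to the distance in this example.
   ch: array of four floats, one per loudspeaker of the square rig
   (45, 135, 225 and 315 degrees).
   input: the input signal sample. */
void gain_ambisonics(float x, float y, float ch[4], float input)
{
    float ds, dist;
    float amb_x, amb_y, amb_w;

    /* AMBISONIC B ENCODING (Z signal omitted here...) */
    ds = x*x + y*y;
    dist = sqrtf(ds);
    amb_x = input * (x / ds);
    amb_y = input * (y / ds);
    amb_w = input * (.707f / dist);

    /* AMBISONIC B DECODING (Z signal omitted here...) */
    ch[0] = (amb_w + amb_x + amb_y);
    ch[1] = (amb_w - amb_x + amb_y);
    ch[2] = (amb_w - amb_x - amb_y);
    ch[3] = (amb_w + amb_x - amb_y);
}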

References