ATSH

(Analysis-Transformation-Synthesis-Shell)

Oscar Pablo Di Liscia

odiliscia@unq.edu.ar

Juan Pampin

juan@ccrma.stanford.edu
pampin@u.washington.edu

Pete Moss

petemoss@petemoss.org

 

About ATSH

ATSH is a program for analysis, transformation, and synthesis of digital sound by means of the ATS system.

ATSH was originally developed to run under Linux, using the X Window System. The source code was written in C using GTK-GDK 1.2.0. It was compiled and tested using Red Hat Linux 7.1.

We have recently released a version of ATSH for Microsoft Windows. There is no difference between the Linux and Windows versions of ATSH, except that the Linux version seems to draw the screen faster than the Windows one. To date, the Windows version of ATSH has been successfully tested under Windows 98 and Windows NT 4.

It is also possible to compile and run ATSH on Macintosh computers running Mac OS X.

ATSH was developed as part of the Desarrollo de Software para analisis y sintesis de sonido digital (Digital Sound Analysis and Synthesis Software Development, 2001-2002, O. Di Liscia, J. Pampin) research and development project with the support of:

1- Universidad Nacional de Quilmes (Buenos Aires, Argentina).

2- Center for Digital Arts and Experimental Media (University of Washington, Seattle, USA).

How to install and/or make ATSH under Linux

Installation should be straightforward if updated versions of Linux and the Gnome, GTK, and GDK libraries are properly installed on the user's computer. The GTK libraries can be obtained at http://www.gtk.org

Please make sure that you are using GTK 1.2.1 or earlier. Later versions of GTK (such as 2.0.0) may include significant changes that prevent the program from compiling and running.

-Download atsh.tar.gz

-Decompress it.

-Read the README file (it contains important installation and copyright information).

The ATSH distribution comes with a precompiled binary file. This file was compiled under Red Hat Linux 7.1 and also tested on most other Linux distributions (such as Mandrake, Debian, etc.) using GTK-GDK version 1.2.0. If it does not work on your system, you should compile ATSH on your computer. To compile ATSH, type at the Linux prompt:

make clean

and then:

make

Let us know if you have any problems installing ATSH under Linux.

How to install and/or make ATSH under Microsoft Windows

In order to run the Windows version of ATSH, you must first install the GTK libraries for Windows on your system. The Windows version of ATSH was made possible by the excellent work of Ishan Chattopadhyaya, who developed WinG and packaged all the libraries for WinGtk in a way that allows a straightforward installation.

Please make sure that you are using GTK 1.2.1 or earlier. Later versions of GTK (such as 2.0.0) may include significant changes that prevent the program from compiling and running.

Once WinGTK is properly installed on your system, just decompress the zip file that contains the ATSH binary file (watsh.exe). It should run without any problem. The source code for Windows is provided as well; it is almost the same as the Linux source code. If you compile it and make any changes or improvements, we kindly ask you to keep us informed.

Let us know if you have any problems installing ATSH under Windows.

General Overview

At present, ATSH is a sort of viewer/editor of the analysis files generated by the ATS system (binary files usually carrying the *.ats extension). Generally speaking, ATS files hold a representation of a digital sound signal in terms of sinusoidal trajectories (called partials) with instantaneous frequency, amplitude, and phase changing along temporal frames. Each frame has a set of partials, each of which has (at least) amplitude and frequency values (phase information might be discarded from the analysis). Each frame might also contain noise information, modeled as time-varying energy in the 25 critical bands of the analysis residual (the residual is what is left after subtracting the tracked sinusoidal trajectories from the original sound; please visit the ATS web site for detailed information on the sinusoids plus critical-band noise model). ATSH detects all these features automatically by reading the file's header.
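The frame/partial organization described above can be pictured with a small in-memory model. This is only an illustrative sketch: the class and function names (Partial, Frame, make_frame) are hypothetical, and it does not describe the actual binary layout of *.ats files.

```python
from dataclasses import dataclass
from typing import List, Optional

N_CRITICAL_BANDS = 25  # number of noise bands stated in the text


@dataclass
class Partial:
    """Values of one sinusoidal trajectory at one frame (hypothetical model)."""
    amplitude: float
    frequency: float               # Hz
    phase: Optional[float] = None  # None when phase was discarded


@dataclass
class Frame:
    """One analysis frame: a set of partials plus optional band noise."""
    partials: List[Partial]
    band_energy: Optional[List[float]] = None  # 25 values, or None

    def has_noise(self) -> bool:
        return self.band_energy is not None


def make_frame(amps, freqs, band_energy=None):
    """Build a frame from parallel amplitude and frequency lists."""
    if len(amps) != len(freqs):
        raise ValueError("amps and freqs must have the same length")
    if band_energy is not None and len(band_energy) != N_CRITICAL_BANDS:
        raise ValueError("expected 25 critical-band energies")
    partials = [Partial(a, f) for a, f in zip(amps, freqs)]
    return Frame(partials, band_energy)
```

A whole analysis would then be a list of such frames, all holding the same number of partials.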

The ATSH data display

To start working with ATSH, choose File/Load ats file, and select an ATS file.

There are two kinds of view:

Deterministic Part View: the frequency of each partial is represented on the vertical (Y) axis, time (in frames) runs along the horizontal (X) axis, and amplitude is represented by a color value. The two horizontal scrollbars control the time (frame) view: the top one sets the starting point of the view, and the bottom one sets the size of the view. There are three vertical scrollbars as well. The two leftmost ones control the frequency view (in the same way the horizontal scrollbars control the time view), and the rightmost one controls a contrast value for the amplitude display. The horizontal and vertical scrollbars can be used to select and zoom in or out of regions of the spectral data. The contrast slider adjusts how partial amplitudes are displayed: a value of 50 shows the normal contrast between loud and quiet partials, a value of 100 overrides amplitude information (i.e. all partials are displayed in black), and a value of 0 shows only very loud partials. When the mouse pointer is over the image, the frame, time, and frequency values of its position are printed in the bottom right corner of the window.

The following picture shows a snapshot of the main window of ATSH (Deterministic Part):

main-screen.jpg (64052 bytes)

Residual Part View: to use this view, the loaded analysis file must contain residual data. The energy of each of the 25 critical bands (on the Bark scale) is shown as a color value along the frames. When the mouse pointer is over the image, the frame, time, frequency, and energy values of its position are printed in the bottom right corner of the window.

The following picture shows a snapshot of the main window of ATSH (Residual Part):

main-screen-res.jpg (42951 bytes)

 

You may take a look at the file's header data by choosing /view/file header.

header-view.jpg (27102 bytes)

Analyzing

The ATS Analysis routines (by Juan Pampin) were originally developed using the LISP programming language and were ported to the C language by Pete Moss and Juan Pampin.

In order to create a new ATS file, you must analyze a sound file using the /file/New ATS File menu. For a detailed explanation of the analysis parameters consult Appendix I.

Here follows a snapshot of the analysis menu:

analysis.jpg (53701 bytes)

Selecting data

To make any changes, the user must first select some data. ATSH performs both a horizontal (frame) and a vertical (partial) selection. At present there are four ways to select spectral data:

1-Using some presets from the Edit menu.

There are Select All, Unselect All, Select Even, Select Odd, and Invert Selection routines.

2-Clicking with the mouse on the graphic screen.

Block selection: with the left button, the position of the mouse pointer at the first click sets one corner of a rectangular selection, and its position at the second click sets the diagonally opposite corner.

Single selection: when the right button is pressed, the partial whose frequency is closest to the pointer location is selected if it was previously unselected, and unselected if it was previously selected. If a block selection was made previously and the mouse pointer is inside the selection rectangle, the other selected partials remain selected. Otherwise, the selection is replaced by the newly selected partial over the whole extent of the view.

The selected data is displayed in red.

The following picture shows three snapshots of the main window of ATSH. The first one shows selected data, and the second one is a zoom of the selection shown in the first. The third one shows a non-contiguous selection (only the even partials are selected).

main-screen-selection.jpg (73529 bytes)

main-screen-selection-zoom.jpg (82601 bytes)

main-screen-selection-discont.jpg (74304 bytes)

3-Using the List View window (menu View).

From the View menu, all the data can be displayed as a numerical list. Each page of the list shows the amplitude, frequency, and phase (if any) values of one frame. A vertical selection/deselection can be performed by shift-clicking/ctrl-clicking on the list (note that a non-contiguous selection of partials is also possible). The horizontal selection can be made using the NOW=TO and NOW=FROM buttons.

The following snapshot shows the List View window.

list-view.jpg (62676 bytes)

4-Using the smart selection menu. This menu allows the user to select partials over the length of the current view using amplitude evaluation and/or a fixed step in partial order. As an example: setting from = 1, to = 10, jump by = 2, and Amp. Threshold = -36 will select partials 1, 3, 5, 7, and 9, but only those whose amplitude (Peak or RMS) is above -36 dB.

Here follows a snapshot of the smart selection window:

smart_sel.jpg (27994 bytes)
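The smart selection rule from item 4 can be sketched as follows. This is a minimal illustration, not ATSH's own code: the function names are hypothetical, amplitudes are assumed to be linear values with 1.0 as the 0 dB reference, and the Peak or RMS level is assumed to be measured over the frames of the current view.

```python
import math


def amp_to_db(amp):
    """Convert a linear amplitude to dB (assumed reference: 1.0 = 0 dB)."""
    return 20.0 * math.log10(amp) if amp > 0.0 else float("-inf")


def smart_select(frames, first, last, step, threshold_db, use_rms=False):
    """Return the 1-based partial numbers that pass the amplitude test.

    frames: list of frames, each a list of linear amplitudes per partial
    first, last, step: the "from", "to" and "jump by" settings
    """
    selected = []
    for p in range(first, last + 1, step):
        amps = [frame[p - 1] for frame in frames]
        if use_rms:
            level = math.sqrt(sum(a * a for a in amps) / len(amps))
        else:
            level = max(amps)  # peak amplitude over the view
        if amp_to_db(level) > threshold_db:
            selected.append(p)
    return selected
```

With ten partials per frame, smart_select(frames, 1, 10, 2, -36.0) examines partials 1, 3, 5, 7 and 9 and keeps only those louder than -36 dB.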

Editing the selected data

At present there are only two ways to change the selected data: the Edit/Amplitude menu and the Edit/Frequency menu. Each lets the user edit an envelope that is applied to the amplitude or to the frequency values of the partials over the selected time region. The amplitude or frequency values of the selected partials may be either scaled (multiplied by the envelope) or offset (have the envelope values added to them).

If the resulting frequency values are greater than the maximum frequency value present in the file, they are clipped to this value. Also, if the frequency values are changed, the phase information (if any) is discarded.

There is an "unlimited" Undo option for edits. It works by writing and reading the backup file /tmp/atsh_undo.dat. The PID number is added to the backup filename so that multiple instances of ATSH can run simultaneously. Do not edit this file during an ATSH session: doing so may cause the program to crash and make the backup data unrecoverable.

Two screenshots of the Edit Frequency window are shown below. They show that the envelope functions may be either linear or spline shaped.
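The scale and offset operations, together with the clipping to the maximum frequency, can be sketched like this. It is a simplified illustration under stated assumptions: the function names are hypothetical, the envelope is a list of (x, y) breakpoints with x running from 0.0 to 1.0 over the selected region, and only linear segments are handled (the spline shape is not covered).

```python
def envelope_at(points, x):
    """Linearly interpolate a breakpoint envelope at x.

    points: list of (x, y) pairs sorted by x.
    """
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def edit_frequencies(freqs, points, mode, max_freq):
    """Scale or offset one partial's per-frame frequencies by an envelope.

    mode: "scale" multiplies by the envelope, anything else adds it.
    max_freq: maximum frequency present in the file (upper clip limit).
    """
    n = len(freqs)
    out = []
    for i, f in enumerate(freqs):
        x = i / (n - 1) if n > 1 else 0.0
        e = envelope_at(points, x)
        new_f = f * e if mode == "scale" else f + e
        out.append(min(new_f, max_freq))  # clip to the file's maximum
    return out
```

For example, scaling three frames at 100 Hz with an envelope rising from 1.0 to 2.0 gives 100, 150 and 200 Hz, provided the maximum frequency in the file is at least 200 Hz.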

edit-linear.jpg (25989 bytes)

edit-spline.jpg (26141 bytes)



Synthesizing

What follows is a screenshot of the Synthesis/Parameters menu.

synthesis-main.jpg (15417 bytes)

In order to synthesize data, an output sound file name must be entered. In the Set Output File menu, the output sound file name can be set, as well as its format (WAV, AIFF, or SND). The ATSH file I/O functions were written using the Sndlib C library (by Bill Schottstaedt, CCRMA, Stanford University).

The user may choose to synthesize the deterministic part, the residual part (if any), or both.

Several synthesis features may be set in the Synthesis/Parameters menu. The user may scale the overall amplitude and frequency of the original data using scalars. Note also that synthesis may use all the data or just a selection (if any). At present, ATSH's deterministic synthesis engine is implemented as an array of linear-interpolating table-lookup oscillators. The residual part is synthesized by injecting interpolated-noise modulation into each partial according to the energy of the residual found in each critical band at the corresponding analysis frame.
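To illustrate what a linear-interpolating table-lookup oscillator does, here is a minimal sketch of one oscillator rendering one partial between two analysis frames. It is not ATSH's synthesis code: the names, the table size of 1024, and the choice to ramp frequency and amplitude linearly between frames are all assumptions made for the example.

```python
import math

TABLE_SIZE = 1024  # assumed table length
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def lookup(phase):
    """Read the sine table at a fractional index with linear interpolation."""
    i = int(phase)
    frac = phase - i
    a = SINE_TABLE[i % TABLE_SIZE]
    b = SINE_TABLE[(i + 1) % TABLE_SIZE]
    return a + (b - a) * frac


def synth_partial(f0, f1, a0, a1, n_samples, sample_rate, phase=0.0):
    """Render one partial between two frames.

    Frequency and amplitude ramp linearly from (f0, a0) to (f1, a1).
    Returns the samples and the final table phase, so that the next
    frame can continue without a discontinuity.
    """
    out = []
    for n in range(n_samples):
        t = n / n_samples
        freq = f0 + (f1 - f0) * t
        amp = a0 + (a1 - a0) * t
        out.append(amp * lookup(phase))
        phase = (phase + freq * TABLE_SIZE / sample_rate) % TABLE_SIZE
    return out, phase
```

A full deterministic synthesis would run one such oscillator per partial and sum their outputs sample by sample.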

It is also possible to use a time function, which allows the user to compress or expand the file dynamically, as well as to read it forwards or backwards. The duration of the output file is represented on the X (horizontal) axis, while the temporal location of the analysis data to be used in the synthesis is represented on the Y (vertical) axis.

The following snapshot shows the time function. The slope of the line in each segment produces compression (steep slope, second segment of the image), expansion (shallow slope, third segment of the image), or no change in time ("normal" slope, first segment of the image).

synthesis-time.jpg (29399 bytes)
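The time function can be read as a piecewise-linear map from output time (X) to analysis time (Y). The sketch below is only an illustration of that reading: the function names are hypothetical, and the breakpoints are assumed to be (output_time, analysis_time) pairs in seconds with strictly increasing output times.

```python
def analysis_time(breakpoints, t_out):
    """Map an output time to an analysis time along the time function.

    breakpoints: list of (output_time, analysis_time) pairs sorted by
    output_time. A falling segment reads the analysis backwards.
    """
    if t_out <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if t_out <= x1:
            return y0 + (y1 - y0) * (t_out - x0) / (x1 - x0)
    return breakpoints[-1][1]


def segment_slopes(breakpoints):
    """Slope of each segment: analysis seconds read per output second."""
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:])]
```

For the curve [(0, 0), (1, 1), (2, 3), (4, 4)] the slopes are 1.0, 2.0 and 0.5. A slope of 1 reads the analysis at its original speed, a slope above 1 moves through the analysis data faster than real time, a slope below 1 moves through it more slowly, and a negative slope reads it backwards.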


Appendix I: ATS (Juan Pampin) Analysis Technique Parameters Explanation. More information at: http://www-ccrma.stanford.edu/~juan/ATS.html

Analysis parameters must be carefully tuned for the Analysis Algorithm (ATSA) to properly capture the nature of the signal to be analyzed. As there is a significant number of them, ATSH offers the possibility of saving/loading them in a binary file carrying the extension "*.apf". The extension is not mandatory, but it is recommended. A brief explanation of each analysis parameter follows:

1-Start (secs.): the starting time of the analysis in seconds.

2-Duration (secs.): the duration time of the analysis in seconds. A zero means the whole duration of the input sound file.

3-Lowest Frequency (Hz.): this parameter partially determines the size of the Analysis Window to be used. To compute the size of the Analysis Window, the period of the Lowest Frequency in samples (SR / LF) is multiplied by the number of its cycles that the user wants to fit in the Analysis Window (see parameter 6). This value is rounded up to the next power of two to determine the size of the FFT for the analysis. The remaining samples are zero-padded. If the signal is a single harmonic sound, the value of the Lowest Frequency should be its fundamental frequency or a sub-harmonic of it. If it is not harmonic, its lowest significant frequency component may be a good starting value.

4-Highest Frequency (Hz.): highest frequency to be taken into account for Peak Detection. Once it is determined that no relevant information is found beyond a certain frequency, the analysis may be made faster and more accurate by setting the Highest Frequency parameter to that value.

5-Frequency Deviation (Ratio): frequency deviation allowed for each peak in the Peak Continuation Algorithm, as a ratio of the frequency involved. For instance, for a peak at 440 Hz and a Deviation of .1, the Peak Continuation Algorithm will only look for continuation candidates between 396 and 484 Hz (10% below and above the frequency of the peak). A small value is likely to produce more trajectories, whilst a large value will reduce them, but at the cost of making the information harder to process further.

6-Number of Cycles of Lowest Frequency to fit in Analysis Window: this also partially determines the size of the Fourier Analysis Window to be used. See parameter 3. For single harmonic signals, it should be more than one (typically 4).

7-Hop Size (Ratio): size of the gap between one Analysis Window and the next, expressed as a ratio of the Window Size. For instance, with a Hop Size value of .25, an Analysis Window of 2048 samples will "jump" by 512 samples (consecutive windows will overlap by 75% of their size). This parameter also determines the size of the analysis frames obtained. Signals that change their spectra very fast (such as speech sounds) may need a high frame rate in order to properly track their changes.

8-Amplitude Threshold (dB): the lowest amplitude value to be taken into account for Peak Detection.

9-Window Type: the shape of the smoothing function to be used for the Fourier Analysis. There are four choices available at present: Blackman, Blackman-Harris, Von Hann, and Hamming. Precise specifications for them are easily found in the DSP literature.

10-Track Length (Frames): the Peak Continuation Algorithm will "look back" by this number of frames in order to do its job better, preventing frequency trajectories from curving too much and losing stability. However, a large value for this parameter will slow down the analysis significantly.

11-Minimal Segment Length (Frames): once the analysis is done, the spectral data can be further "cleaned" up during post-processing. Trajectories shorter than this value are suppressed if their average SMR is below Minimal Segment SMR (see parameters 16 and 14). This might help to avoid non-relevant sudden changes while keeping a high frame rate, reducing also the number of intermittent sinusoids during synthesis.

12-Minimal Gap Length (Frames): like parameter 11, this one is also used to clean up the data during post-processing. In this case, gaps (zero amplitude values, i.e. theoretical "silence") no longer than this number of frames are filled with amplitude/frequency values obtained by linear interpolation of the adjacent active frames. This parameter prevents sudden interruptions of stable trajectories while keeping a high frame rate.

13-SMR Threshold (dB SPL): also a post-processing parameter, the SMR Threshold is used to eliminate partials with a low average SMR.

14-Minimal Segment SMR (dB SPL): this parameter is used in combination with parameter 11. Short segments with SMR average below this value will be removed during post-processing.

15-Last Peak Contribution (0 to 1): as explained in parameter 10, the Peak Continuation Algorithm "looks back" a number of frames to do its job better. This parameter weights the contribution of the most recent preceding peak relative to the earlier ones. A zero value means that all preceding peaks (up to the number set by parameter 10) are taken into account equally.

16-SMR Contribution (0 to 1): in addition to the proximity in frequency of the peaks, the ATS Peak Continuation Algorithm may use psycho-acoustic information (the Signal-to-Mask Ratio, or SMR) to improve the perceptual results. This parameter indicates how much the SMR information is used during tracking. For instance, a value of .5 makes the Peak Continuation Algorithm use 50% SMR information and 50% Frequency Proximity information to decide which is the best candidate to continue a sinusoidal track.
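The arithmetic behind parameters 3, 5, 6 and 7 can be checked with a short sketch. The function names are hypothetical and the rounding of the window size to a whole number of samples is an assumption; ATSA itself may adjust these values differently.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p


def window_geometry(sample_rate, lowest_freq, cycles, hop_ratio):
    """Return (window_size, fft_size, hop_samples) from parameters 3, 6 and 7."""
    # period of the Lowest Frequency in samples (SR / LF) times the cycles
    window_size = int(round(cycles * sample_rate / lowest_freq))
    fft_size = next_power_of_two(window_size)  # remainder is zero-padded
    hop_samples = int(round(hop_ratio * window_size))
    return window_size, fft_size, hop_samples


def deviation_range(peak_freq, deviation):
    """Frequency band searched for a peak's continuation (parameter 5)."""
    return peak_freq * (1.0 - deviation), peak_freq * (1.0 + deviation)
```

For instance, at a sampling rate of 44100 Hz with a Lowest Frequency of 100 Hz and 4 cycles, the Analysis Window is 1764 samples long and the FFT size is 2048, so 284 samples are zero-padded. With a Hop Size of .25 the window advances by 441 samples.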