Porting vat to new platforms

(Note: This page is under construction)

Vat is an audio conferencing application developed by the Network Research Group of Lawrence Berkeley National Laboratory. Source code and pre-compiled binaries are available via anonymous ftp.

This page is a brief, incomplete discussion of the vat architecture and some of the problems that may appear when attempting to port it to a new system.

General information on vat can be found on the vat home page.

Vat-related questions and feedback are welcome and can be sent to the developers via vat@ee.lbl.gov.



Overview

There are three components to vat: the user interface, the network interface and the audio interface. In our experience, almost all the porting problems are associated with the audio interface.


The vat user interface

The vat user interface is written entirely in Tcl/Tk. We designed it this way because Tcl/Tk appears to be very portable: there are Tcl/Tk ports for most Unix systems and for the Mac, Windows 95 and Windows NT. Since vat itself contains no window system or GUI code, a working Tcl/Tk should result in a working vat user interface.

The one GUI piece that might cause problems is the vat sitebox widget. We wrote this widget ourselves and, like all low-level Tk widgets, it makes direct window system calls. These calls all appear in the modules sitebox.cc and tkwidget.cc. If these prove difficult to port to a new system, the Tcl user interface files, ui-*.tcl, can simply be changed to use a different site list such as a Tk listbox (the sitebox widget is used only from the vat Tcl code, not from the vat C code).


The vat network model

The network interface uses standard Berkeley sockets. Network I/O appears in two places: net.cc (and its subclasses such as net-ip.cc) and the `conference bus' communication module, group-ipc.cc. When porting to a network interface substantially different from Berkeley sockets, the changes would probably be confined to these modules. Note, however, that many parts of the code assume that the network interface returns something that can be passed to Tk_CreateFileHandler(), so that the appropriate pieces of vat get call-backs when network data is available. This assumption might cause problems with interfaces such as WinSock, where a network I/O descriptor is not the same as a device I/O descriptor.
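
The descriptor assumption can be illustrated with a minimal sketch. It does not use Tk: a small select()-based dispatcher stands in for the Tk event loop, and the names FdDispatcher and demo_dispatch are hypothetical, not part of vat. The point is only that the callback mechanism is keyed on an integer descriptor that the event loop can wait on, which is exactly what a non-descriptor network handle would not provide.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for the Tk event loop: maps descriptors to callbacks.
class FdDispatcher {
public:
    // Register a callback to run when fd becomes readable.
    void add(int fd, std::function<void(int)> cb) {
        assert(fd >= 0 && fd < FD_SETSIZE);  // select() only handles small fds
        handlers_[fd] = std::move(cb);
    }

    // Wait up to timeout_ms for readable descriptors and run their callbacks.
    // Returns the number of callbacks invoked.
    int poll_once(int timeout_ms) {
        fd_set readable;
        FD_ZERO(&readable);
        int maxfd = -1;
        for (const auto& h : handlers_) {
            FD_SET(h.first, &readable);
            if (h.first > maxfd) maxfd = h.first;
        }
        timeval tv;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        if (select(maxfd + 1, &readable, nullptr, nullptr, &tv) <= 0) return 0;
        int fired = 0;
        for (const auto& h : handlers_) {
            if (FD_ISSET(h.first, &readable)) {
                h.second(h.first);
                ++fired;
            }
        }
        return fired;
    }

private:
    std::map<int, std::function<void(int)>> handlers_;
};

// Demo: send 5 bytes over a local socket pair and return the number of
// bytes the registered callback saw (or -1 on setup failure).
int demo_dispatch() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) return -1;
    int received = 0;
    FdDispatcher d;
    d.add(sv[0], [&received](int fd) {
        char buf[64];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) received += static_cast<int>(n);
    });
    bool sent = (write(sv[1], "hello", 5) == 5);
    if (sent) d.poll_once(1000);
    close(sv[0]);
    close(sv[1]);
    return sent ? received : -1;
}
```

On a system where sockets are not ordinary descriptors, a port would need some other way to turn "network data is available" into a callback from the event loop.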


The vat audio model

The core problem for interactive audio conferencing tools is providing a low-latency path from the microphone to the net and from the net to the speaker. Human factors studies have shown that delays of more than 200-400ms are objectionable because they force people to significantly change their conversational patterns. However, an audio device is a real-time device and requires an absolutely continuous flow of data -- if the stream of samples bound for the speaker is interrupted, even for a few milliseconds, unpleasant clicks and pops will result. Since the process generating data is subject to pre-emption and arbitrary scheduling delays, audio drivers typically use substantial (1 second or more) system buffers to tide them over while a process is suspended.

These two competing constraints, the need to minimize system buffering to reduce latency and the need to have enough buffer to maintain playout during processing and scheduling delays, are what complicate an interactive audio tool. The problem is to find the smallest amount of buffer that will cover scheduling delays (which are unknown and time-varying since they depend in large part on other load on the system). Vat solves this problem by using audio read completions as a `clock' for audio writes. I.e., audio reads and writes are done in the same units, typically 160-sample (20ms) frames, and as soon as each one-frame read completes, a one-frame write is issued. Since the audio input and output are run off the same crystal timebase, this system is flow-balanced and no backlog can build up between vat and the audio output.
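
The read-clocked write can be sketched against a toy device model. Everything here other than the 160-sample frame and the 8000 samples/sec rate is an assumption made for illustration: FakeDuplexDevice, service and steady_state_backlog are hypothetical names, and the device simply records and plays one frame per tick of a shared clock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

constexpr int kSampleRate   = 8000;  // samples per second (from the text)
constexpr int kFrameSamples = 160;   // samples per frame (from the text)
constexpr int kFrameMs      = kFrameSamples * 1000 / kSampleRate;  // 20

using Frame = std::vector<std::uint8_t>;  // one frame of 8-bit samples

// Hypothetical full-duplex device: input and output share one clock.
struct FakeDuplexDevice {
    std::deque<Frame> captured;  // frames recorded but not yet read
    std::deque<Frame> pending;   // frames written but not yet played
    int underruns = 0;           // frame times with nothing to play

    // One frame time elapses: the hardware records a frame and plays one.
    void tick() {
        captured.emplace_back(kFrameSamples, 0xFF);  // 0xFF: mu-law silence
        if (pending.empty()) ++underruns; else pending.pop_front();
    }
    bool read_frame(Frame& out) {
        if (captured.empty()) return false;
        out = std::move(captured.front());
        captured.pop_front();
        return true;
    }
    void write_frame(const Frame& f) {
        assert(f.size() == static_cast<std::size_t>(kFrameSamples));
        pending.push_back(f);
    }
};

// Each completed one-frame read clocks out exactly one one-frame write.
// next_output stands in for whatever decoded network audio is ready.
int service(FakeDuplexDevice& dev, const Frame& next_output) {
    int frames = 0;
    Frame in;
    while (dev.read_frame(in)) {
        // (a real tool would encode and transmit `in` here)
        dev.write_frame(next_output);
        ++frames;
    }
    return frames;
}

// Run for n frame times with prompt servicing; return the output backlog.
std::size_t steady_state_backlog(int n) {
    FakeDuplexDevice dev;
    Frame silence(kFrameSamples, 0xFF);
    for (int i = 0; i < n; ++i) {
        dev.tick();
        service(dev, silence);
    }
    return dev.pending.size();
}
```

In this model the output backlog stays at a single frame however long the loop runs, because every frame consumed by the output is matched by one frame produced by the input on the same clock.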

This scheme also self-adjusts to handle scheduling delays. E.g., say vat is pre-empted for 100ms. This means that 100ms of samples will be queued in the system input buffers and, since no reads are completing to cause output, an (unavoidable) 100ms silence gap will appear in the output. When vat is restarted, the 100ms of queued input data will be delivered immediately and, since each input causes an output, a 100ms queue of output data will appear. Since the input is still running continuously, this output queue will persist until the next time vat is pre-empted. Then, as long as the pre-emption is less than 100ms, output can be serviced from the queue and will be continuous. A little thought will show that vat's flow-balance causes the output queue to grow until it just cancels the longest pre-emption time, which is exactly the desired operating point.
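
The self-adjusting behaviour can be checked with a small counting model. This is a sketch under simplifying assumptions, not vat's code: time advances in whole 20ms frame times, the hardware records and plays one frame per step, and Segment, Result and simulate are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Segment {
    bool running;  // true: vat is scheduled; false: vat is pre-empted
    int frames;    // duration of the segment in 20 ms frame times
};

struct Result {
    int out_queue;  // frames queued for the speaker after the last step
    int gaps;       // frame times in which the speaker had nothing to play
};

// Replay a schedule of running and pre-empted segments and report the
// final output queue depth and the total number of silent frame times.
Result simulate(const std::vector<Segment>& schedule) {
    int in_queue = 0, out_queue = 0, gaps = 0;
    for (const Segment& seg : schedule) {
        assert(seg.frames >= 0);
        for (int i = 0; i < seg.frames; ++i) {
            // Hardware: record one frame and play one frame per frame time.
            ++in_queue;
            if (out_queue > 0) --out_queue; else ++gaps;
            // Vat (when scheduled): every completed read issues one write,
            // so all queued input turns into queued output at once.
            if (seg.running) {
                out_queue += in_queue;
                in_queue = 0;
            }
        }
    }
    return Result{out_queue, gaps};
}
```

With the schedule run 10, pre-empt 5, run 10, pre-empt 3, run 10 (all in frame times), the model ends with 6 frames queued: the one frame that is always in flight plus the 5 frames (100ms) left behind by the longest pre-emption. The later 3-frame pre-emption is absorbed from the queue and adds no gap. Replacing it with an 8-frame pre-emption grows the queue to 9 frames.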

Unfortunately, this scheme doesn't work if the audio hardware is half-duplex, i.e., if it can't read and write simultaneously. In this case vat runs off a timer rather than audio read completions, but the result is much poorer latency control plus some other problems (underruns and slips) that tend to degrade interactive performance. Almost all workstation audio hardware is full-duplex but many of the PC audio cards (SoundBlaster, etc.) are half-duplex. These should be avoided if possible.


Available audio drivers

Vat audio driver abstraction is found in class Audio (files audio.cc/.h). There are eight different drivers in the current distribution:

Vat is normally linked with af_audio, sock_audio, and one of the vendor/system specific drivers (sun_audio, sgi_audio, etc.).


Processing model and main loop

The general event sequence is:

There are several other methods in the Audio class that let the GUI find out about the audio device capabilities (number of input and output ports, full- or half-duplex, etc.) and control those capabilities (switching ports, adjusting microphone or speaker gain, etc.). These should be self-explanatory.

(sun_audio.cc and sgi_audio.cc are good examples of the general-case full-duplex audio driver. hp-audio.cc is an example of what has to be done if the audio reads are in fixed-size units that are not the 20ms (160-sample) frame size that vat wants to use. linux-audio.cc and the HDController class in controller.cc are examples of the differences associated with half-duplex devices.)
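
When a device delivers fixed-size blocks that are not 160 samples long, one common approach is to re-block them through a small buffer. The sketch below shows that general idea only; it is not taken from hp-audio.cc, and the Reblocker class, the frames_from_blocks helper and the 256-sample block size used in the note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Accumulates device-sized blocks and hands back fixed-size frames.
class Reblocker {
public:
    explicit Reblocker(std::size_t frame_samples) : frame_(frame_samples) {
        assert(frame_samples > 0);
    }

    // Append one device-sized block of samples.
    void push(const std::uint8_t* data, std::size_t n) {
        buf_.insert(buf_.end(), data, data + n);
    }

    // If a whole frame is buffered, copy it into `out` and return true.
    bool pop_frame(std::vector<std::uint8_t>& out) {
        if (buf_.size() < frame_) return false;
        auto end = buf_.begin() + static_cast<std::ptrdiff_t>(frame_);
        out.assign(buf_.begin(), end);
        buf_.erase(buf_.begin(), end);
        return true;
    }

    // Samples held back because they do not yet fill a frame.
    std::size_t buffered() const { return buf_.size(); }

private:
    std::size_t frame_;
    std::vector<std::uint8_t> buf_;
};

// Feed `blocks` device blocks of `block_samples` samples each through a
// 160-sample re-blocker and count the complete frames that come out.
int frames_from_blocks(std::size_t block_samples, int blocks) {
    Reblocker rb(160);
    std::vector<std::uint8_t> block(block_samples, 0xFF), frame;
    int frames = 0;
    for (int i = 0; i < blocks; ++i) {
        rb.push(block.data(), block.size());
        while (rb.pop_frame(frame)) ++frames;
    }
    return frames;
}
```

With 256-sample blocks, for instance, five device reads yield exactly eight frames, while three reads yield four frames and leave 128 samples buffered. Some reads therefore complete more than one frame and others leave a partial frame behind, which a driver of this kind has to account for when reads are used to clock writes.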


Supporting different network audio formats -- encoders and decoders

TBA.


The audio hardware sample format and sample rate

TBA - the internal representation is 8-bit mu-law and much of the code currently assumes 8000 samples/sec (4 kHz audio bandwidth). Changing either of these is non-trivial and not advisable.
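
For hardware that delivers linear samples, a conversion to and from 8-bit mu-law is needed somewhere in the driver. The sketch below is the widely used G.711-style segment encoding for 16-bit linear PCM; it is offered as an illustration of the format and is not taken from the vat source, which may well use lookup tables instead. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kBias = 0x84;   // 132, added before the segment search
constexpr int kClip = 32635;  // largest magnitude that fits after biasing

// Encode one 16-bit linear PCM sample as an 8-bit mu-law byte.
std::uint8_t linear_to_ulaw(int sample) {
    assert(sample >= -32768 && sample <= 32767);
    int sign = (sample < 0) ? 0x80 : 0x00;
    int mag = (sample < 0) ? -sample : sample;
    if (mag > kClip) mag = kClip;
    mag += kBias;
    // Segment number: position of the highest set bit among bits 14..7.
    int exponent = 7;
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1) {
        --exponent;
    }
    int mantissa = (mag >> (exponent + 3)) & 0x0F;
    // The byte is stored with all bits inverted.
    return static_cast<std::uint8_t>(~(sign | (exponent << 4) | mantissa));
}

// Decode an 8-bit mu-law byte back to a linear sample.
int ulaw_to_linear(std::uint8_t code) {
    int u = ~code & 0xFF;  // undo the bit inversion
    int sign = u & 0x80;
    int exponent = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int mag = (((mantissa << 3) + kBias) << exponent) - kBias;
    return sign ? -mag : mag;
}
```

The encoding is lossy: a linear value of 1000 comes back as 988 after a round trip, and the quantization step doubles with each segment. Digital silence (linear 0) encodes to the byte 0xFF.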


Van Jacobson (van@ee.lbl.gov)
Steven McCanne (mccanne@ee.lbl.gov)