Porting vat to new platforms

(Note: This page is under construction)

Vat is an audio conferencing application developed by the Network Research Group of Lawrence Berkeley National Laboratory. Source code and pre-compiled binaries are available via anonymous ftp.

This page is a brief, incomplete discussion of the vat architecture and some of the problems that may appear when attempting to port it to a new system.

General information on vat can be found on the vat home page.

Vat-related questions and feedback are welcome and can be sent to the developers via vat@ee.lbl.gov.

Overview
The vat user interface
The vat network model
The vat audio model
Available audio drivers
Processing model and main loop
Supporting different network audio formats -- encoders and decoders
The audio hardware sample format and sample rate

Overview

There are three components to vat: the user interface, the network interface and the audio interface. In our experience, almost all the porting problems are associated with the audio interface.

The vat user interface

The user interface for vat is written entirely in Tcl/Tk. We designed it this way because Tcl/Tk appears to be very portable: there are Tcl/Tk ports for most Unix systems, for the Mac, and for Windows 95/Windows NT. Since vat itself contains no window-system or GUI code, a working Tcl/Tk port should yield a working vat user interface.

The one GUI piece that might cause problems is the vat sitebox widget. We wrote this widget ourselves and, like all low-level Tk widgets, it makes direct window-system calls. These calls all appear in the modules sitebox.cc and tkwidget.cc. If they are difficult to port to a new system, the Tcl user interface files, ui-*.tcl, can simply be changed to use a different site list, such as a Tk listbox (the sitebox widget is used only from vat's Tcl code, not from its C code).

The vat network model

The network interface uses standard Berkeley sockets. Network I/O appears in two places: net.cc (and its subclass modules, such as net-ip.cc) and the `conference bus' communication module, group-ipc.cc. When porting to a network interface substantially different from Berkeley sockets, the changes would probably be confined to these modules. Note, however, that many parts of the code assume that the network interface returns something that can be passed to Tk_CreateFileHandler(), so that the appropriate pieces of vat get callbacks when network data is available. This assumption might cause problems with interfaces such as WinSock, where a network I/O descriptor is not the same as a device I/O descriptor.

The vat audio model

The core problem for interactive audio conferencing tools is providing a low-latency path from the microphone to the net and from the net to the speaker. Human-factors studies have shown that delays of more than 200-400ms are objectionable because they force people to change their conversational patterns significantly. However, audio hardware is a real-time device and requires an absolutely continuous flow of data: if the stream of samples bound for the speaker is interrupted, even for a few milliseconds, unpleasant clicks and pops result. Since the process generating the data is subject to pre-emption and arbitrary scheduling delays, audio drivers typically use large system buffers (1 second or more) to tide them over while the process is suspended.
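To put rough numbers on this: a buffer of B samples at sample rate f_s adds B/f_s of delay. Assuming the 8000 samples/sec rate that vat uses internally (noted in the sample-format section), a 1-second driver buffer alone is five times the low end of the 200-400ms tolerance, while a single 160-sample frame costs only 20ms:

```latex
\begin{aligned}
t_{\text{buf}} &= \frac{B}{f_s}, \qquad f_s = 8000\ \text{samples/s} \\
B = 8000\ \text{samples} &\;\Rightarrow\; t_{\text{buf}} = 1\ \text{s} \\
t_{\max} = 200\ \text{ms} &\;\Rightarrow\; B_{\max} = f_s\, t_{\max} = 1600\ \text{samples} \\
B = 160\ \text{samples} &\;\Rightarrow\; t_{\text{buf}} = 20\ \text{ms}
\end{aligned}
```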

These two competing constraints, the need to minimize system buffering to reduce latency and the need to keep enough buffering to maintain playout through processing and scheduling delays, are what complicate an interactive audio tool. The problem is to find the smallest amount of buffering that will cover scheduling delays, which are unknown and time-varying because they depend largely on other load on the system. Vat solves this problem by using audio read completions as a `clock' for audio writes. That is, audio reads and writes are done in the same units, typically 160-sample (20ms) frames, and as soon as each one-frame read completes, a one-frame write is issued. Since the audio input and output run off the same crystal timebase, this system is flow-balanced and no backlog can build up between vat and the audio output.

This scheme also adjusts itself to scheduling delays. For example, suppose vat is pre-empted for 100ms. During that time 100ms of samples are queued in the system input buffers and, since no reads complete to trigger output, an (unavoidable) 100ms silence gap appears in the output. When vat resumes, the 100ms of queued input is delivered immediately and, since each input causes an output, a 100ms queue of output data appears. Because the input keeps running continuously, this output queue persists until the next time vat is pre-empted. Then, as long as that pre-emption is less than 100ms, output is served from the queue and stays continuous. A little thought shows that vat's flow balance causes the output queue to grow until it just covers the longest pre-emption time, which is exactly the desired operating point.
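The flow-balance argument above can be checked with a small discrete simulation. This is a hypothetical model, not vat code: time advances in 20ms ticks (one 160-sample frame), the device adds one input frame and plays one output frame per tick, and vat, when scheduled, is assumed to run instantly, reading every queued frame and writing one output frame per frame read.

```cpp
#include <utility>
#include <vector>

// Hypothetical tick-level model of the flow-balanced loop. One tick is one
// 160-sample (20ms) frame time; input and output share one clock, so each
// tick adds one input frame and the speaker consumes one output frame.
struct SimResult {
    int gap_ticks = 0;         // ticks on which the speaker had nothing to play
    std::vector<int> backlog;  // output-queue depth (frames) at the end of each tick
};

// off[t] is true when vat is pre-empted (not scheduled) during tick t.
SimResult simulate(const std::vector<bool>& off) {
    int input_queue = 0;   // frames waiting in the system input buffer
    int output_queue = 0;  // frames waiting in the system output buffer
    SimResult r;
    for (bool preempted : off) {
        ++input_queue;  // the microphone delivers one frame every tick
        if (!preempted) {
            // vat reads every queued frame and issues one write per read.
            output_queue += input_queue;
            input_queue = 0;
        }
        if (output_queue > 0)
            --output_queue;  // the speaker plays one frame
        else
            ++r.gap_ticks;   // nothing queued: an audible silence gap
        r.backlog.push_back(output_queue);
    }
    return r;
}

// Builds a schedule of total_ticks ticks; each (start, length) pair marks a
// pre-emption lasting `length` ticks that begins at tick `start`.
std::vector<bool> schedule(int total_ticks,
                           const std::vector<std::pair<int, int>>& outages) {
    std::vector<bool> off(total_ticks, false);
    for (const auto& o : outages)
        for (int t = o.first; t < o.first + o.second && t < total_ticks; ++t)
            off[t] = true;
    return off;
}
```

With a 5-tick (100ms) pre-emption at tick 10 followed by a 3-tick one at tick 30, only the first produces gaps (5 ticks) and the backlog settles at 5 frames. Lengthening the second pre-emption to 7 ticks adds 2 gap ticks and leaves a 7-frame backlog, i.e. the queue grows to match the longest pre-emption seen. With no pre-emption at all, the backlog stays at zero.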

Unfortunately, this scheme doesn't work if the audio hardware is half-duplex, i.e., if it can't read and write simultaneously. In that case vat runs off a timer rather than audio read completions, and the result is much poorer latency control plus other problems (underruns and slips) that tend to degrade interactive performance. Almost all workstation audio hardware is full-duplex, but many PC audio cards (SoundBlaster, etc.) are half-duplex; these should be avoided if possible.

Available audio drivers

The vat audio driver abstraction is class Audio (files audio.cc/.h). There are eight different drivers in the current distribution:

bsd_audio supports the public-domain BSD audio driver for SPARCs running SunOS 4.1.x or Sparc BSD.
pc_audio supports the SoundBlaster variant of the BSD audio driver running under BSD/386.
sun_audio, sgi_audio, and hp_audio support the standard system audio interface on Suns, SGIs and HPs, respectively.
linux_audio supports the Linux VoxWare audio driver (although linux_audio currently lacks support for many of VoxWare's capabilities).
af_audio supports the DEC CRL AudioFile audio system. This is the most complicated of the audio drivers because AudioFile is a network-based audio system: there is a potentially high-latency network path between vat and the AF server process, and vat attempts to estimate and control the latency along this path. That requires considerably more work than the simple flow balance that suffices for directly accessible audio hardware.
sock_audio is a simple stub to allow vat to send and receive audio from a unix-domain socket. It has been used for such things as interfacing to audio DSPs and ISDN telephones. There is a sample application available that demonstrates how to use this interface.

Vat is normally linked with af_audio, sock_audio, and one of the vendor/system specific drivers (sun_audio, sgi_audio, etc.).

Processing model and main loop

The general event sequence is:

When there is audio data available to read, the tk event handler makes a callback to Audio::dispatch().
Audio::dispatch() calls the virtual FrameReady() to see whether a full frame (160 samples) of input data is available. If the system's audio driver always returns full frames (e.g., bsd_audio, pc_audio), FrameReady() can simply return 1. In the usual case, however, the system signals that data is available whenever any amount has arrived, so FrameReady() must read the data into a local buffer and return 1 only when it holds a full frame.
When FrameReady() says there is a full frame available, Audio::dispatch() will call Controller::audio_handle() to consume and process it.
The Controller object is the core of vat: it handles all the data flow between the audio device and the net and is the focus of most GUI user actions. Controller::audio_handle() calls the Audio Read() virtual to get the new frame, then immediately calls the Audio Write() virtual to write a frame's worth of audio data from the net to the speaker. It then performs the processing for silence suppression and the various speakerphone modes, which may or may not result in the incoming frame being encoded, encapsulated and sent to the net.
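The event sequence above can be sketched as follows. The names Audio, dispatch(), FrameReady(), Read(), Write(), Controller and audio_handle() come from the text; every signature, member, and the ChunkedAudio test driver are assumptions made for illustration, not vat's actual code. ChunkedAudio models the "usual case": the device reports input in arbitrary-sized pieces, so FrameReady() accumulates samples locally and only reports ready once it holds a whole frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch only: all signatures and members below are assumptions.
constexpr std::size_t kFrameSize = 160;  // samples per 20ms frame at 8000 samples/sec
using Sample = std::uint8_t;             // one 8-bit mu-law sample

class Audio;

class Controller {
public:
    void audio_handle(Audio& audio);
    int frames_handled = 0;
};

class Audio {
public:
    explicit Audio(Controller* c) : controller_(c) {}
    virtual ~Audio() = default;
    // Called by the event loop when the device reports input is available.
    // Assumption: keep handling frames while whole frames remain buffered.
    void dispatch() {
        while (FrameReady()) controller_->audio_handle(*this);
    }
    virtual bool FrameReady() = 0;                // true once a full frame is buffered
    virtual void Read(Sample* frame) = 0;         // removes one frame of input
    virtual void Write(const Sample* frame) = 0;  // queues one frame for the speaker
private:
    Controller* controller_;
};

void Controller::audio_handle(Audio& audio) {
    Sample in[kFrameSize];
    audio.Read(in);
    // Hypothetical playout frame; a real controller would take this from the
    // network receive path. 0xFF is mu-law silence.
    Sample out[kFrameSize];
    std::memset(out, 0xFF, sizeof out);
    audio.Write(out);  // one write per read keeps input and output flow-balanced
    ++frames_handled;
    // Silence suppression, speakerphone modes, encoding and sending the input
    // frame to the net would follow here (omitted).
}

// Hypothetical driver for a device that delivers input in arbitrary chunks.
class ChunkedAudio : public Audio {
public:
    using Audio::Audio;
    // Test hook standing in for the OS driver: n new samples become readable.
    void device_arrival(const Sample* data, std::size_t n) {
        device_.insert(device_.end(), data, data + n);
    }
    bool FrameReady() override {
        // Read whatever the device has into the local buffer.
        pending_.insert(pending_.end(), device_.begin(), device_.end());
        device_.clear();
        return pending_.size() >= kFrameSize;
    }
    void Read(Sample* frame) override {
        assert(pending_.size() >= kFrameSize);
        std::memcpy(frame, pending_.data(), kFrameSize);
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(kFrameSize));
    }
    void Write(const Sample* frame) override {
        speaker_.insert(speaker_.end(), frame, frame + kFrameSize);
    }
    std::size_t written_samples() const { return speaker_.size(); }
private:
    std::vector<Sample> device_;   // samples the "OS" has ready for us
    std::vector<Sample> pending_;  // FrameReady()'s local accumulation buffer
    std::vector<Sample> speaker_;  // everything written to the "speaker"
};
```

For example, two 100-sample arrivals produce one handled frame (40 samples stay buffered), and a further 300-sample arrival yields two more frames, each matched by exactly one 160-sample write.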

There are several other methods in the Audio class that let the GUI find out about the audio device's capabilities (number of input and output ports, full- or half-duplex, etc.) and control those capabilities (switching ports, adjusting microphone or speaker gain, etc.). These should be self-explanatory.

(sun_audio.cc or sgi_audio.cc are good examples of the general-case full-duplex audio driver. hp-audio.cc shows what has to be done if audio reads come in fixed-size units other than the 20ms (160-sample) frame size that vat wants to use. linux-audio.cc and the HDController class in controller.cc illustrate the differences associated with half-duplex devices.)

Supporting different network audio formats -- encoders and decoders

TBA.

The audio hardware sample format and sample rate

TBA. The internal representation is 8-bit mu-law, and much of the code currently assumes 8000 samples/sec (4KHz audio bandwidth). Changing either of these is non-trivial and not advisable.
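Because the internal format is 8-bit mu-law, a driver for hardware that only handles 16-bit linear PCM would presumably need to convert between the two in its Read()/Write() path. Below is a minimal sketch of the standard G.711 mu-law conversion, written in the style of the widely circulated public-domain Sun g711.c routines; the namespace and function names are hypothetical, and this is not taken from vat.

```cpp
#include <cstdint>
#include <cstdlib>

// G.711 mu-law <-> 16-bit linear PCM, in the style of Sun's g711.c.
// Names are hypothetical; this is not vat's own code.
namespace mulaw {

constexpr int kBias = 0x84;  // 132: makes every segment boundary a power of two
constexpr int kSegEnd[8] = {0xFF, 0x1FF, 0x3FF, 0x7FF,
                            0xFFF, 0x1FFF, 0x3FFF, 0x7FFF};

// Index of the first segment whose upper bound is >= v (8 if none).
inline int segment(int v) {
    for (int i = 0; i < 8; ++i)
        if (v <= kSegEnd[i]) return i;
    return 8;
}

// Encodes one 16-bit linear sample as an 8-bit mu-law code.
inline std::uint8_t from_linear(int pcm) {
    int mask;
    if (pcm < 0) { pcm = kBias - pcm; mask = 0x7F; }  // magnitude + bias, negative sign
    else         { pcm += kBias;      mask = 0xFF; }  // magnitude + bias, positive sign
    int seg = segment(pcm);
    if (seg >= 8)  // out of range: clip to full scale
        return static_cast<std::uint8_t>(0x7F ^ mask);
    int code = (seg << 4) | ((pcm >> (seg + 3)) & 0x0F);
    return static_cast<std::uint8_t>(code ^ mask);  // mu-law codes are stored inverted
}

// Decodes an 8-bit mu-law code to a 16-bit linear sample.
inline int to_linear(std::uint8_t u) {
    u = static_cast<std::uint8_t>(~u);
    int t = ((u & 0x0F) << 3) + kBias;  // mantissa, re-biased
    t <<= (u & 0x70) >> 4;              // scale by segment
    return (u & 0x80) ? (kBias - t) : (t - kBias);
}

}  // namespace mulaw
```

Zero encodes to 0xFF (mu-law silence), and the extreme codes 0x00 and 0x80 decode to -32124 and 32124. For inputs within +/-32000 the round-trip error is at most 512, half the 1024-wide quantization step of the top segment.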

Van Jacobson (van@ee.lbl.gov)
Steven McCanne (mccanne@ee.lbl.gov)
