Our solution is to use a layered source-coder in tandem with a layered transmission system. The layered source-coder produces an embedded bit stream that can be decomposed into an arbitrary number of hierarchical flows. These flows are distributed across multiple multicast groups, allowing receivers to ``tune in'' to some subset of the flows [1]. Each additional layer provides a progressively higher quality signal.
We use only mechanisms that presently exist in the Internet and leverage a mechanism called ``pruning'' in IP Multicast. IP Multicast works by constructing a simplex distribution tree from each source subnet. A source transmits a packet with an ``IP group address'' destination, and the multicast routers forward a copy of this packet along each link in the distribution tree. When a destination subnet has no active receivers for a certain group address, the last-hop router sends a ``prune'' message back up the distribution tree, which prevents intermediate routers from forwarding the unnecessary traffic. Thus, each user can locally adapt to network capacity by adjusting the number of multicast groups --- i.e., the number of compression layers --- that they receive.
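The receiver's adaptation loop can be sketched as follows. This is a minimal illustration in Python, not our implementation; the class name, method names, and the congestion signals are invented for exposition, and in practice ``leaving'' a layer means leaving its IP multicast group so that prune messages stop the traffic upstream.

```python
# Hypothetical sketch of receiver-driven layer adaptation. In the real
# system, each compression layer maps to one IP multicast group; leaving
# a group triggers a prune that halts the unwanted traffic upstream.

class LayeredReceiver:
    """Subscribes to the first `level` of `max_layers` multicast groups."""

    def __init__(self, max_layers):
        self.max_layers = max_layers
        self.level = 1            # always keep the base layer

    def on_congestion(self):
        # Packet loss signals overload: drop the highest enhancement layer.
        if self.level > 1:
            self.level -= 1

    def on_spare_capacity(self):
        # Sustained loss-free operation: try the next enhancement layer.
        if self.level < self.max_layers:
            self.level += 1
```

The key property is that the decision is purely local to each receiver, so heterogeneous receivers behind different bottlenecks converge to different subscription levels without any coordination.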
Our source-coder is a simple, low-complexity, wavelet-based algorithm that can be run in software on standard workstations and PCs (see [7] for a comprehensive treatment of wavelets applied to subband coding). Conceptually, the algorithm works by conditionally replenishing the wavelet transform coefficients of each frame. By using a wavelet decomposition with fairly short basis functions, we can optimize the coder by carrying out conditional replenishment in the pixel domain and then transforming and coding only those blocks that need to be updated. The wavelet coefficients are bit-plane coded using a representation similar to the well-known zero-tree decomposition [2]. All zero-tree sets are computed in parallel with a single bottom-up traversal of the coefficient quad-tree, and all layers of the bit stream are computed in parallel using a table-driven approach.
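The pixel-domain conditional replenishment step can be illustrated with a short sketch. The block size and threshold below are made-up parameters for exposition, not the coder's actual values; the point is only that change detection happens on raw pixels, so the wavelet transform runs solely on blocks flagged for update.

```python
# Illustrative sketch of pixel-domain conditional replenishment.
# Frames are lists of pixel rows; parameters are hypothetical.

def blocks_to_update(frame, reference, block=8, threshold=40):
    """Return coordinates of blocks whose pixels changed enough to recode."""
    updates = []
    h, w = len(frame), len(frame[0])
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Sum of absolute pixel differences against the reference frame.
            diff = sum(abs(frame[j][i] - reference[j][i])
                       for j in range(y, min(y + block, h))
                       for i in range(x, min(x + block, w)))
            if diff > threshold:
                # Only these blocks get wavelet-transformed and coded.
                updates.append((y, x))
    return updates
```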
Using vic's software architecture and codec implementations as a foundation, we have designed a new tool for video bandwidth adaptation using efficient transcoding between video formats [3]. By placing an application-level gateway near the entrance to a low-bandwidth environment, high-rate video can be adapted for the lower-bandwidth network through transcoding.
We have implemented our gateway design in an application called vgw and used it to transcode high-quality JPEG video from a UCB Seminar to low-rate H.261 video for the MBone. The JPEG to H.261 conversion process uses a highly optimized algorithm that avoids DCT computations by manipulating data entirely within the transform domain.
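Because JPEG and H.261 both build on an 8x8 DCT, a transcoder can stay in the transform domain: rather than fully decoding to pixels and re-encoding, it can remap the dequantized JPEG coefficients onto the coarser H.261 quantizer directly. The toy sketch below illustrates only that requantization idea; the quantizer values are illustrative, and vgw's actual algorithm handles far more (motion-free inter coding, run-length structure, rate control).

```python
# Toy transform-domain requantization: map dequantized JPEG DCT
# coefficients onto a coarser output quantizer without ever computing
# an inverse or forward DCT. Quantizer step sizes are hypothetical.

def requantize(coeffs, q_in, q_out):
    """Rescale quantized coefficients from step size q_in to q_out."""
    return [round(c * q_in / q_out) for c in coeffs]
```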
Elan Amir has continued work on the gateway architecture as part of his Master's thesis, evolving our prototype into a fully functional application and integrating it into other environments such as InfoPad and GloMop.
We are currently working on a floor control tool that coordinates the media applications over the Conference Bus via remote commands. All of the LBL MBone tools have the ability to ``mute'' or ignore a network media source, and the disposition of this mute control can be set via the Conference Bus, which the floor control tool can use to effect a moderation policy. One possible model is that each participant in the session follows the direction of a well-known (session-defined) moderator. The moderator can give the floor to a participant by multicasting a takes-floor directive with that participant's RTP CNAME. Locally, each receiver then mutes all participants except the one that holds the floor.
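The local muting policy is simple to state in code. The message format and field names below are invented for illustration; the real directives travel as Conference Bus messages carrying the floor holder's RTP CNAME.

```python
# Sketch of each receiver's local response to a takes-floor directive:
# mute every RTP source except the named floor holder. The dictionary
# message format here is hypothetical.

def on_bus_message(msg, sources):
    """sources: dict mapping RTP CNAME -> media source object with .muted."""
    if msg.get("type") == "takes-floor":
        holder = msg["cname"]
        for cname, src in sources.items():
            # Mute everyone but the floor holder.
            src.muted = (cname != holder)
```

Note that the moderator never touches remote mute state directly; it only multicasts the directive, and each receiver applies the policy locally.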
Cross-media synchronization can also be carried out over the Conference Bus. Each real-time application induces a buffering delay, called the playback point, to adapt to packet delay variations [4]. This playback point can be adjusted to synchronize across media. By broadcasting ``synchronize'' messages across the Conference Bus, the different media can compute the maximum of all advertised playout delays. This maximum is then used in the delay-adaptation algorithm.
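The synchronization rule reduces to a max over advertised delays, as in the sketch below. The class and method names are ours, invented for illustration; in the tools the announcements are ``synchronize'' messages on the Conference Bus.

```python
# Sketch of cross-media playout synchronization: each media agent
# advertises its playout delay, and all agents adopt the maximum so
# that audio and video render in lockstep. Names are hypothetical.

class PlayoutSync:
    """Tracks ``synchronize'' announcements and yields the common delay."""

    def __init__(self):
        self.delays = {}          # media name -> advertised delay (ms)

    def on_synchronize(self, media, delay_ms):
        self.delays[media] = delay_ms

    def target_delay(self):
        # Every agent adapts its playback point toward the maximum.
        return max(self.delays.values())
```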
We have also used the Conference Bus to implement remote manipulation of graphical overlays in a vic video stream. This allows a ``title generator'' to be easily prototyped as a separate tool from vic.
Elan Amir has extended vgw with a Conference Bus interface, using a split implementation in which the transcoding engine runs as a bare application and a Tcl/Tk user interface configures the engine over the Conference Bus. By using the Conference Bus, vgw automatically inherits the flexibility of our existing conference coordination framework. For example, vgw can be configured to dedicate extra bandwidth to a video stream with an active speaker, in the same way vic can be configured to switch windows to the current speaker.
A large component of the decoding CPU budget goes into rendering frames, whether copying bits to the frame buffer, performing color space conversion, or carrying out a dither. Accordingly, the vic rendering modules were designed so that their load can be adapted by running the rendering process at a rate lower than the incoming frame rate. Thus, a 10 f/s H.261 stream can be rendered, say, at 4 f/s if necessary. Alternatively, the decoding process itself can be scaled (if possible). For example, our prototype layered coder can run faster by processing a reduced number of layers at the cost of lower image quality.
While the hooks for scaling the decoder process are in place, the control algorithm is not. We are currently developing techniques for measuring load and a control algorithm for updating the load-scaling parameters.