
Click the Captions, Select the Descriptions: Making Captioning and Video Description Essential for Any Learner in Broadband Education

Jutta Treviranus
Adaptive Technology Resource Centre
University of Toronto


The goals of equal access and on-line interactive learning are converging. The shifting landscape in delivery of education by broadcasters provides an opportunity to establish new conventions and paradigms. This paper describes a project that uses enhanced access tools to provide interactivity and personalization of learning materials for learners with or without disabilities. It is hoped that access tools will become an essential and not special component of all broadband learning environments.


The Context

Television and the Web are converging. For educational TV broadcasters this opens the possibility for new learning paradigms. Access tools such as captioning and video description provide an ideal vehicle for interactivity as well as a method of accommodating a large range of learning styles and levels. The Adaptive Technology Resource Centre, in collaboration with Canadian Learning Television and eight partner organizations, is developing tools that will establish captioning and video description as essential components of interactive educational video over broadband for all learners.

Combining the Old with the New

Educational broadcasters have predominantly used a lecturer paradigm, augmented with audio-visual demonstrations. This paradigm was well suited to the unidirectional nature of broadcasting. When done well, this approach combined entertainment, story-telling and theatre to engage the learner. Unfortunately, because the approach is linear and inflexible, there are many instances where learning breaks down. This breakdown can be attributed to communication problems: "I didn't hear an important concept, I missed part of an argument, I don't understand a word or term used." It can be due to a mismatch in the assumptions regarding the prior knowledge of the learner: "What do you mean by terminal velocity?" It can be due to a mismatch in learning styles: "Can you show me that rather than describing it? Can you illustrate it in another way?" It can be due to a mismatch in pace: "You're going too fast, you've lost me. You're going too slow, I'm bored." Whatever the cause of the breakdown, once the learner is disengaged from the process it is very difficult to pick up the threads again to achieve the learning goals.

The quality and quantity of learning material produced by educational broadcasters is very impressive. At a time when we are searching for content for on-line learning, educational videos offer a rich store of resources if we can successfully adapt and re-purpose them in an on-line environment.

The overall goal of the project, entitled "Creating Barrier-free Broadband Learning Environments," is fourfold: to identify potential barriers to access in broadband education delivery systems for learners with disabilities; to develop solutions to those barriers; to advance alternative or multi-modal display and control mechanisms that are only possible in broadband environments; and to create tools that allow learners to customize the learning experience to their individual learning styles and needs. In meeting these goals the project will also develop a means of creatively re-purposing quality traditional educational programming while addressing the problems that cause breakdown in learning.


The Essential Role of Captioning

A general objective of the project is to adapt the material to meet the needs of the broadest range of learners, both from an equal access perspective and from a knowledge level perspective (e.g., making college level physics accessible to grade 9 students), and to make it highly interactive and responsive to the specific needs of the learner. Captioning plays an essential role in this objective. The verbatim captions are used for several purposes. Captioning is used to structure and mark up the video. This structure is then used to navigate within the video and to condense or expand the material. For example, if a learner wants to go back to every mention of a specific term, the captions would be used to find and sort the start time codes of the segments in which the term appears.
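The term-lookup example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Caption` record, its fields and the sample caption text are hypothetical, not part of the project's caption format, and matching is a plain case-insensitive substring search.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """One caption segment (hypothetical structure, for illustration only)."""
    start: float  # segment start time code, in seconds
    end: float    # segment end time code, in seconds
    text: str     # verbatim caption text


def find_mentions(captions, term):
    """Return the sorted start time codes of every segment mentioning `term`."""
    needle = term.lower()
    return sorted(c.start for c in captions if needle in c.text.lower())


# Invented sample captions for a physics lecture.
captions = [
    Caption(0.0, 4.5, "Today we look at falling objects."),
    Caption(4.5, 9.0, "Air resistance limits speed to a terminal velocity."),
    Caption(9.0, 14.0, "Gravity pulls the object downward."),
    Caption(14.0, 19.5, "At terminal velocity the forces balance."),
]

print(find_mentions(captions, "terminal velocity"))  # [4.5, 14.0]
```

A viewer could step through the returned time codes to replay each mention in turn. A real implementation would likely need word-boundary and stemming rules rather than substring matching.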

The traditional "Line 21" captioning is replaced with enhanced multimedia overlay captions. Standard captions are limited to text and restricted to either the bottom or top of the screen. This makes it difficult to communicate the source of the sound or speech. It also makes it very difficult to communicate sound-based information beyond the words themselves, such as inflection, tone of voice, speech rate, music, and other non-speech sounds. By allowing multimedia overlay captioning, comic book conventions can be adopted to indicate the source of sound. Color, animation and graphics can be used to indicate affect, music and non-speech sounds. A video window can be invoked to provide ASL/LSQ translation. Captions can also be used to label visual objects or highlight a part of the video frame.

Most importantly, the captions will be used for hyper-linking. If the learner wants more information about a term (a definition, background material, related material or an interactive exercise that further illustrates a concept), they would click on the term or phrase in the caption, which would pause the video and take them to the supportive material.
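The click-to-pause behaviour described above might look roughly like the following Python sketch. All names here (`Player`, `LinkedCaption`, `follow_link`) and the resource identifier are hypothetical placeholders; the project's actual viewer is not specified at this level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    """Stand-in for a video player; only tracks whether it is playing."""
    playing: bool = True

    def pause(self):
        self.playing = False


@dataclass
class LinkedCaption:
    """A caption whose terms may link to supportive material (hypothetical)."""
    start: float
    text: str
    links: dict = field(default_factory=dict)  # term -> resource identifier


def follow_link(player, caption, term):
    """Pause the video and return the linked resource, if the term has one."""
    target = caption.links.get(term)
    if target is None:
        return None  # nothing linked to this term: keep playing
    player.pause()
    return target


player = Player()
caption = LinkedCaption(
    start=4.5,
    text="Air resistance limits speed to a terminal velocity.",
    links={"terminal velocity": "definitions/terminal-velocity"},  # invented path
)
resource = follow_link(player, caption, "terminal velocity")
```

Keeping the links as data attached to each caption, rather than coded into the player, means the same video can carry different link sets for different learners.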

Tell Me More with Video Descriptions

Video description, beyond making the video accessible to learners who are blind, will be used to elucidate, provide further detail or clarify. Thus for a learner having difficulty in following the steps in a chemistry experiment, the video description would elucidate the steps in greater detail than provided by the original video.

Putting these pieces together, you can imagine the following scenario: "You are watching a physics lecture by an eminent physicist, who refers to a phenomenon that you know little about. You turn on captioning and click on the term in the text caption, which links you to a definition of the term. To find out more about the phenomenon, you click on an interactive exercise that illustrates the concept. To better understand the forces at play, you turn on haptic rendering and use your force feedback joystick to feel what is happening. Once you are confident that you understand the term, you return to the lecture. The lecturer moves to a demonstration, some of which you find difficult to follow. You turn on descriptive video, which provides a subnarrative in the audio pauses further describing what is happening. For additional help you turn on overlay captions that provide text labels of the objects and processes occurring in the demonstration. You control the interface using a simple set of voice commands."


Education that Fits the Learner

One important step in preventing the breakdown of learning is to ensure that the education fits the learner. The learner should be able to customize the amount and type of background given, the detail or verbosity of the dialogue, the reading level required to follow the material, the pace of the teaching/learning, the methods used to demonstrate concepts, and the learning outcomes to be achieved (e.g., the general theory versus applying the actual algorithm). A properly structured broadband learning environment should allow this customization.

XML Schema and Practicing what we Preach

To allow customization of the learning material, the ATRC and its partners are creating XML schema for Captioning and Video Description. The schema will be partially based upon the DAISY standard. The XML schema combined with XSLT style sheets will allow user specification of how captions are displayed (e.g., font, color, position, hyperlink). A meta-tagging scheme (based upon international learning object meta-tagging standards) will be developed to allow the storage and retrieval of caption tracks for different reading levels, languages, levels of verbosity, etc. A similar classification will be possible for the video description audio tracks. Thus a learner can take a specific lesson, choose a caption track suited to their reading level, displayed the way they want it, with the desired level of description in both the captioning and video description, and the desired amount and type of background material in-line and linked to the lesson. Alternatively, a teacher can re-purpose a lesson for the needs of a specific class, allowing further individual customization for specific students.
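As a rough illustration of retrieving a caption track by its meta-tags, the Python sketch below parses a small XML catalog with the standard library and picks a track by language and reading level. The element and attribute names (`captionTracks`, `track`, `lang`, `readingLevel`, `verbosity`, `src`) and the file names are invented for this example; they are assumptions, not the project's schema or any meta-tagging standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog of caption tracks for one lesson.
CATALOG = """
<captionTracks lesson="physics-101">
  <track lang="en" readingLevel="9" verbosity="high" src="en-grade9.xml"/>
  <track lang="en" readingLevel="13" verbosity="low" src="en-college.xml"/>
  <track lang="fr" readingLevel="9" verbosity="high" src="fr-grade9.xml"/>
</captionTracks>
"""


def choose_track(catalog_xml, lang, max_reading_level):
    """Return the src of the most advanced track the learner can still read.

    Tracks are filtered by language and by reading level not exceeding
    `max_reading_level`; None is returned when nothing matches.
    """
    root = ET.fromstring(catalog_xml.strip())
    candidates = [
        t for t in root.findall("track")
        if t.get("lang") == lang
        and int(t.get("readingLevel")) <= max_reading_level
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda t: int(t.get("readingLevel")))
    return best.get("src")


print(choose_track(CATALOG, "en", 10))  # en-grade9.xml
```

The same selection could be applied to video description audio tracks, and a teacher's class-wide choice could simply supply default arguments that individual students override.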

The Necessary Tools

To make the scenario described above possible, the ATRC and its partners are developing a number of tools. These include the authoring tools needed to create the enhanced captioning and video description, the browser/viewer required to specify user preferences and view the enhanced materials, and the learning repository needed to store and retrieve the associated learning objects.

The enhanced caption and video description authoring and mark-up tools will be created as modular components to be added to existing video authoring tools for the web. The files created will be SMIL compliant. Initially, QuickTime sprites will be used to create the more advanced interactive components. It is hoped that Magpie, developed by NCAM, can act as the base for the caption authoring tool. To make the authoring process realistic for the typical educator, a number of intelligent preprocessing and script or text track management tools will be included.

To allow learners to express their preferences, and to allow learning objects to be assembled on the fly based upon those preferences, a browser/viewer will be created using Mozilla and XUL. Thus the learner can adapt the browser interface, the type and order of learning objects and how they are displayed. The captions and video descriptions would act as anchors to related resources, interactive exercises, and other learning resources.

It is hoped that the functionality required of the learning repository can be integrated into national learning repositories presently under development. The project will model the functionality for the purposes of the project and advocate for its inclusion with national and international groups governing large learning repositories. Necessary functionality includes the facility to create a number of assembly objects that can call collections of atomic objects (caption tracks, URLs, audio description tracks).
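The relationship between assembly objects and atomic objects can be modelled with a small Python sketch. The class names, fields and identifiers below are hypothetical and chosen only to illustrate the idea that an assembly object refers to atomic objects by identifier rather than containing copies of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AtomicObject:
    """A single stored item: a caption track, URL or audio description track."""
    object_id: str
    kind: str      # e.g. "caption_track", "url", "audio_description"
    location: str  # where the item can be fetched (invented values below)


@dataclass
class AssemblyObject:
    """A named collection that calls atomic objects by identifier."""
    title: str
    member_ids: list = field(default_factory=list)


class Repository:
    """Minimal in-memory stand-in for a learning repository."""

    def __init__(self):
        self._atoms = {}

    def add(self, atom):
        self._atoms[atom.object_id] = atom

    def resolve(self, assembly):
        """Return the atomic objects an assembly refers to, in order."""
        return [self._atoms[i] for i in assembly.member_ids if i in self._atoms]


repo = Repository()
repo.add(AtomicObject("cap-en-9", "caption_track", "en-grade9.xml"))
repo.add(AtomicObject("desc-en", "audio_description", "desc-en.mp3"))
repo.add(AtomicObject("def-tv", "url", "definitions/terminal-velocity"))

# Two lessons share the same definition without duplicating it.
grade9 = AssemblyObject("Falling objects, grade 9", ["cap-en-9", "desc-en", "def-tv"])
review = AssemblyObject("Physics glossary review", ["def-tv"])
```

Because assemblies hold only identifiers, an atomic object contributed once can serve many learning modules.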

Educators and producers of learning objects would collectively contribute to the learning repository, thereby reducing the amount of development required for a specific learning module. Once the objects in the repository have reached a critical mass, the objects can be reused for several learning modules.



The principles of both equal access and successful learning are to allow a broad and flexible range of display, control and interaction techniques. The terminology may have differed in the two fields but the functionality is the same. The field of equal access has had much more experience in implementing these principles in a technical environment. In this time of converging technologies and shifting paradigms, we have much to teach educators. If done correctly, access tools can become an essential and not a special component of broadband learning environments.


Further information about the project and related material can be found at the project web site:


The project is partially funded by Canarie Inc.

The author would like to acknowledge all project partners and staff. For a complete list please refer to:


Web site hosted by the Adaptive Technology Resource Centre