...
Virtual Human Architecture
The Institute for Creative Technologies (ICT) Virtual Human Toolkit is based on the ICT Virtual Human Architecture. This architecture defines, at an abstract level, what modules are needed to realize a virtual human and how these modules interact. The basic functionality of each module, as well as its interface, is well-defined, but its actual implementation falls outside the scope of the architecture. The architecture dictates the implementation of a distributed system where communication is mostly realized using message passing. This allows for multiple implementations of a certain module and simple substitution of one implementation for another at run-time. It also allows for distributed systems, where different modules run on separate computers.
ICT has developed a general framework, consisting of libraries, tools and methods, that serves to support relatively rapid development of new modules. Using this framework, ICT and its partners have developed a variety of modules, both in the context of basic research and in more applied projects. The Toolkit provides some of the modules that have been transitioned from this research.
Please see below for a high-level overview of the Virtual Human Architecture.
Figure 1. The ICT Virtual Human Architecture.
...
Virtual Human Toolkit Implementation
The Virtual Human Toolkit is a set of components (modules, tools and libraries) that implements one possible instantiation of the Virtual Human Architecture. It consists of the following main modules:
- vhtoolkitUnity
- Non-Verbal Behavior Generator
- NPCEditor
- Ogre
- SmartBody
- Speech Client
- Text To Speech Interface
- Watson
- MultiSense
For a complete overview of all the modules, tools and libraries, please see the Components section.
The figure below shows how all modules interact, along with the messages they use to communicate with each other.
Figure 2. The Virtual Human Toolkit architecture.
Virtual Human Messaging
Overview
Communication between most modules happens by message passing, which is implemented in ActiveMQ. Messages are broadcast through the system and a component can subscribe to certain messages. Developers can use the VH Messaging library to quickly implement receiving and sending ActiveMQ messages. Note that ActiveMQ was preceded by Elvin and that, for legacy reasons, one might still encounter mentions of Elvin, even though no Elvin code is used anymore.
Every module has its own messaging API. In addition, each module should at least implement the basic messaging protocol that allows modules to request and share whether they are online:
- vrAllCall, pings all modules.
- vrComponent component-id sub, response by module to 'vrAllCall'; should also be sent on start-up by module.
- vrKillComponent {component-id,'all'}, requests a single module or all modules to shut down.
- vrProcEnd component-id sub, sent by module on exit.
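To make the protocol concrete, below is a minimal sketch of a component written directly against the ActiveMQ JMS API: it announces itself with vrComponent on start-up and re-announces whenever a vrAllCall ping arrives. The broker URL, the topic name 'vhmsg', the component ID 'mycomponent' and the use of plain-text message bodies are illustrative assumptions only; in practice the VH Messaging library wraps these transport details.

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class MinimalComponent {
    public static void main(String[] args) throws Exception {
        // Assumed broker URL and topic name; the VH Messaging library normally hides these.
        Connection connection =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("vhmsg");
        MessageProducer producer = session.createProducer(topic);
        MessageConsumer consumer = session.createConsumer(topic);

        // Announce availability on start-up.
        producer.send(session.createTextMessage("vrComponent mycomponent all"));

        // Re-announce whenever a vrAllCall ping is broadcast.
        consumer.setMessageListener(message -> {
            try {
                String text = ((TextMessage) message).getText();
                if (text.startsWith("vrAllCall")) {
                    producer.send(session.createTextMessage("vrComponent mycomponent all"));
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });

        // Keep the component running while it listens for messages.
        Thread.currentThread().join();
    }
}
```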
Below follows a detailed description of each message. Note that this section is currently incomplete.
RemoteSpeechCmd
RemoteSpeechCmd
Description
Sent by SmartBody to the Text-To-Speech relay. Requests that speech be generated for a certain text with a certain voice.
RemoteSpeechReply
RemoteSpeechReply
Description
Reply from Text-To-Speech relay (to SmartBody) with viseme and word timing information.
vrAllCall
vrAllCall string
Description
The message vrAllCall is a ping-style request for all components to reannounce their availability, often used when a new component needs to identify available services. It is used in particular by the launcher to see which components are online. Developers can use an optional parameter to indicate why the request was sent out.
The components that are available will announce their availability with the vrComponent message.
Parameters
- string, optional and free-form; can be used to indicate why the request was sent out. No parameters are required.
Examples
- vrAllCall
Components that send message
All components can send this message. The launcher sends this message on a regular interval in order to keep track of the status of all modules and tools that are known to have been launched.
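As a rough illustration, a launcher-style sender could broadcast the ping on a fixed interval along the lines of the sketch below. The broker URL and topic name are the same illustrative assumptions as in the earlier sketch, not the Toolkit's actual launcher implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class StatusPinger {
    public static void main(String[] args) throws Exception {
        Connection connection =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createTopic("vhmsg"));

        // Broadcast vrAllCall every few seconds; online components answer with vrComponent.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                producer.send(session.createTextMessage("vrAllCall"));
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}
```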
Components that receive message
All components except libraries should implement listening to this message and send a vrComponent message on receiving it.
Related messages
vrComponent
vrComponent component-id sub
Description
The vrComponent message announces the availability of the given module to all other components. This message should be sent at start-up and in response to vrAllCall.
Parameters
- component-id, contains the ID of the particular component. This can be a module type, like 'renderer' or 'nlu', or a specific module, like 'npceditor'.
- sub, is required but not strictly defined. One can use it to specify the actual implementation of the module type, for instance 'ogre' for the renderer; to specify a subcomponent, for instance 'parser' for the Non-Verbal Behavior Generator; or simply use 'all' when there is no additional information to provide.
Examples
- vrComponent renderer ogre
- vrComponent nvb generator
- vrComponent launcher all
Components that send message
All components except libraries should send this message on start-up. In addition, each component except libraries should send it in response to vrAllCall.
Components that receive message
All components except libraries should implement listening to this message.
Related messages
vrExpress
- vrExpress char-id addressee-id utterance-id xml-message
Description
Sent from the NPCEditor (or any other module requesting behavior) to the Nonverbal Behavior Generator. See the logs when running Brad for examples. This message will be better documented in future iterations.
vrKillComponent
vrKillComponent component-id
vrKillComponent all
Description
Requests either a specific module or all modules to shut themselves down. Each module and tool should listen to this message and, when the parameter matches its component ID or 'all', shut itself down after sending vrProcEnd. Note that, unlike vrComponent, no second parameter is present; therefore, all submodules should exit when receiving the kill request.
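Inside a component's message handler this could look roughly like the following sketch, which reuses the assumptions from the earlier examples (plain-text messages on a shared topic, a hypothetical component ID passed in as componentId).

```java
import javax.jms.*;

public class KillHandler {
    /** Shut down if a vrKillComponent message targets this component or 'all'. */
    static void handleKillRequest(String text, String componentId, Session session,
                                  MessageProducer producer, Connection connection)
            throws JMSException {
        if (!text.startsWith("vrKillComponent")) {
            return;
        }
        String target = text.substring("vrKillComponent".length()).trim();
        if (target.equals(componentId) || target.equals("all")) {
            // Announce that the service is going away, then exit.
            producer.send(session.createTextMessage("vrProcEnd " + componentId + " all"));
            connection.close();
            System.exit(0);
        }
    }
}
```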
Parameters
- component-id, contains the ID of the particular component. This can be a module type, like 'renderer' or 'nlu', or a specific module, like 'npceditor'.
Examples
- vrKillComponent renderer
- vrKillComponent nvb
- vrKillComponent all
Components that send message
This message will usually only be sent by the launcher.
Components that receive message
All components except libraries should implement listening to this message.
Related messages
vrProcEnd
vrProcEnd component-id sub
Description
This message is sent by a component to indicate to the rest of the system that it has exited and the service is no longer available. It should be sent by all components, except libraries.
Parameters
- component-id, contains the ID of the particular component. This can be a module type, like 'renderer' or 'nlu', or a specific module, like 'npceditor'.
- sub, is required but not strictly defined. One can use it to specify the actual implementation of the module type, for instance 'ogre' for the renderer; to specify a subcomponent, for instance 'parser' for the Non-Verbal Behavior Generator; or simply use 'all' when there is no additional information to provide.
Examples
- vrProcEnd renderer ogre
- vrProcEnd nvb generator
Components that send message
All components except libraries should send this message on exit.
Components that receive message
Modules that need to be aware of available services should listen to this message.
Related messages
vrSpeak
- vrSpeak char-id addressee-id utterance-id xml-message
Description
Sent from the Nonverbal Behavior Generator to SmartBody. Requests for SmartBody to realize verbal and nonverbal behavior. See the logs when running Brad for examples. This message will be better documented in future iterations.
vrSpeech
- vrSpeech type turn-id other
Description
vrSpeech messages go together as a sequence with the same 'turn-id', representing the speech recognition, timing, and possible segmentation of a human utterance. 'vrSpeech start' means the person has started to speak - indicated currently by pressing on a mouse button. 'vrSpeech finished-speaking' means the person has stopped speaking for this turn, indicated currently by releasing the mouse button. 'vrSpeech interp' indicates the text that the person said during this speech. 'vrSpeech asr-complete' means that speech recognition and segmentation are finished for this turn, and other modules are free to start using the results without expecting more utterances for this turn. 'vrSpeech partial' is an optional message, giving partial results during speech.
- vrSpeech start $turn_id $speaker
- vrSpeech finished-speaking $turn_id
- vrSpeech partial $turn_id $utt_id $confidence $tone $speech
- vrSpeech interp $turn_id $utt_id $confidence $tone $speech
- vrSpeech asr-complete $turn_id
Parameters
- 'type', indicates the type of vrSpeech message, being one of 'start', 'finished-speaking', 'partial', 'interp', or 'asr-complete'.
- 'turn_id', a unique identifier for this whole turn. Must be the same across all messages in a sequence.
- 'speaker', the speaker of this message, like 'user' in the Brad example.
- 'utt_id', the unique identifier for this utterance. A turn can be segmented into multiple utterances. Historically this is always '1', and there is only one 'vrSpeech interp' message per turn. The current convention is that the speech recognizer / client sends out a message with '$utt_id 0', and a possible segmenter then sends out one or more messages labelled in order from 1 to n. A segmenter is currently not part of the Toolkit. For 'vrSpeech partial', which gives an incremental interpretation of the part of the utterance processed so far, $utt_id will likely be 'p' followed by an integer, but any unique identifier should be fine.
- 'confidence', the confidence value for this interpretation. Should be between 0.0 and 1.0. Historically this is always 1.0.
- 'tone', the prosody; should be 'high', 'normal' or 'low'. Historically this is always 'normal'.
- 'speech', the text of the (segment) interpretation. For 'utt_id 0' this will be the text of the full turn; for other segments it will be the text of that segment only.
Examples
This example indicates a speaker with the name 'user' starting to speak, saying 'hello how are you', and then stopping. Note that the speech is only analyzed / completed after the user has stopped speaking. Also note that all listed messages are needed to process a single utterance and notify the rest of the system of the results.
- vrSpeech start test1 user
- vrSpeech finished-speaking test1 user
- vrSpeech interp test1 1 1.0 normal hello how are you
- vrSpeech asr-complete test1
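The sketch below shows how a hypothetical speech client could publish exactly this sequence over ActiveMQ, again assuming plain-text message bodies on a shared 'vhmsg' topic as in the earlier sketches; the Toolkit's own speech components would normally go through the VH Messaging library instead.

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SpeechTurnExample {
    public static void main(String[] args) throws Exception {
        Connection connection =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createTopic("vhmsg"));

        String turnId = "test1";
        // The user presses the button and starts to speak.
        producer.send(session.createTextMessage("vrSpeech start " + turnId + " user"));
        // The user releases the button; recognition then produces the interpretation.
        producer.send(session.createTextMessage("vrSpeech finished-speaking " + turnId + " user"));
        producer.send(session.createTextMessage(
                "vrSpeech interp " + turnId + " 1 1.0 normal hello how are you"));
        // Recognition and segmentation are complete for this turn.
        producer.send(session.createTextMessage("vrSpeech asr-complete " + turnId));

        connection.close();
    }
}
```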
Components that send message
The speech client or speech recognizer.
Components that receive message
...
- NPCEditor, a statistical text classifier which matches novel input (a question the user asks) to authored output (a line the characters speak).
- SmartBody (SB), a character animation platform that provides locomotion, steering, object manipulation, lip syncing, gazing and nonverbal behavior in real time through the Behavior Markup Language (BML).
- NonVerbal Behavior Generator (NVBG), a rule-based system that takes a character utterance as input and produces a nonverbal behavior schedule (gestures, head nods, etc.) in the form of BML as output.
- MultiSense, a perception framework that enables multiple sensing and understanding modules to inter-operate simultaneously, broadcasting data through the Perception Markup Language (PML).
- Unity, a proprietary game engine. The Toolkit only contains the executable, but you can download the free version of Unity or purchase Unity Pro from their website. The Toolkit includes Ogre as an open-source example of how to integrate SmartBody with a renderer.
- PocketSphinx, an open source speech recognition engine. In the Toolkit, PocketSphinx is the speech server for our AcquireSpeech client.
- Text-to-speech engines, including Festival and MS SAPI.
For a complete overview of all the modules, tools and libraries, please see the Components section.
The figure below shows a high-level overview of the Toolkit modules. Instead of a full-fledged agent, the NPCEditor handles all verbal input and output. Note that since MultiSense (perception) is currently included as a basic proof of concept, it communicates directly with SmartBody. A deeper integration with the system would require MultiSense to communicate with the NPCEditor, a dialogue manager and/or the NVBG instead.
A more detailed overview is depicted below. The bold arrows indicate a direct link between modules, either via TCP/IP or an included DLL. All other links show what messages pass between modules; see Virtual Human Messaging for more details.
Data Models
Virtual Human Toolkit:
Individual modules: