Overview
Communication between most modules happens by message passing, which is implemented on top of ActiveMQ. Messages are broadcast through the system and a component can subscribe to certain messages. Developers can use the VH Messaging (VHMsg) library to quickly implement receiving and sending ActiveMQ messages in a variety of languages, including C++, C# and Java. Note that ActiveMQ was preceded by Elvin and that, for legacy reasons, one might still encounter mentions of Elvin, even though no Elvin code is used anymore.
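For illustration only, below is a minimal sketch of subscribing to such broadcasts directly over ActiveMQ's JMS API in Java. In practice a module would normally use the VHMsg library instead; the broker URL, the topic name and the assumption that the message body is plain text starting with the message name are illustrative assumptions for this sketch, not the toolkit's actual wire format.
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;
public class BroadcastListenerSketch {
    public static void main(String[] args) throws Exception {
        // Broker URL and topic name are assumptions for this sketch;
        // the VHMsg library normally hides these details.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createTopic("vh.messages"));
        consumer.setMessageListener(message -> {
            try {
                if (message instanceof TextMessage) {
                    String text = ((TextMessage) message).getText();
                    // Assumed convention: the first whitespace-separated token is the
                    // message name, e.g. "vrAllCall" or "vrComponent".
                    System.out.println("Received: " + text.split("\\s+", 2)[0]);
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
        connection.start();
        Thread.sleep(60_000);  // keep the process alive to receive broadcasts
    }
}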
Every module has its own messaging API. In addition, each module should at least implement the basic messaging protocol that allows modules to request and share whether they are online:
- vrAllCall [reason], pings all modules.
- vrComponent component-id sub, response by module to 'vrAllCall'; should also be sent on start-up by module.
- vrKillComponent {component-id,'|all'}, requests a single module or all modules to shut down.
- vrProcEnd component-id sub, sent by module on exit.
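As a sketch of how a module might implement this minimal protocol, the following outlines the start-up announcement and the handling of 'vrAllCall' and 'vrKillComponent'. The component id and the MessagePipe interface are hypothetical stand-ins for the real messaging client (e.g. VHMsg), introduced only for illustration; the optional sub-component field from the list above is omitted.
public class ComponentProtocolSketch {
    private static final String COMPONENT_ID = "my-module";  // hypothetical component id
    private final MessagePipe pipe;  // hypothetical messaging wrapper
    public ComponentProtocolSketch(MessagePipe pipe) {
        this.pipe = pipe;
        // Announce the component once on start-up (a sub-component id could follow).
        pipe.send("vrComponent " + COMPONENT_ID);
    }
    /** Called for every incoming message; 'text' is the full message string. */
    public void onMessage(String text) {
        String[] tokens = text.split("\\s+");
        switch (tokens[0]) {
            case "vrAllCall":
                // Another module is pinging; re-announce ourselves.
                pipe.send("vrComponent " + COMPONENT_ID);
                break;
            case "vrKillComponent":
                // Shut down if addressed directly or if 'all' modules are asked to stop.
                if (tokens.length > 1 && (tokens[1].equals(COMPONENT_ID) || tokens[1].equals("all"))) {
                    pipe.send("vrProcEnd " + COMPONENT_ID);
                    System.exit(0);
                }
                break;
            default:
                // Module-specific messages would be handled here.
                break;
        }
    }
    /** Hypothetical minimal interface standing in for the real messaging client. */
    public interface MessagePipe {
        void send(String message);
    }
}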
Below follows a detailed description of each message. Note that this section is currently incomplete.
RemoteSpeechCmd
Description
Sent by SmartBody to the Text-To-Speech relay. Requests that speech be generated for a given text and voice.
RemoteSpeechReply
Description
Reply from the Text-To-Speech relay (to SmartBody) with viseme and word timing information.
vrExpress
- vrExpress char-id addressee-id utterance-id xml-message
This message is sent to NVBG by the NLU, the NPCEditor, or a similar module. It can be used to convey information about speech data, posture, status changes, emotion changes, gaze data, etc., as shown below.
Speech
Speech messages are characterized by the speech tag within them. They are interpreted and the corresponding output BML is generated with speech time marks, animations, head nods, facial movements, etc. These behaviors are generated based on the content of the speech tag and the fml tag in the input message.
...
Posture Change
These messages are characterized by the <body posture=""> tag, which lets NVBG know that there has been a change in posture.
vrExpress "harmony" "None" "??" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<bml>
<body posture="HandsAtSide" />
</bml>
</act>"
Status / Request
The idle_behavior and all_behavior attributes within the request tag allow NVBG to keep track of whether or not to generate the corresponding behavior.
vrExpress "harmony" "None" "??" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<fml>
<status type="present" />
<request type="idle_behavior" value="off" />
</fml>
</act>"
Gaze
Gaze tags, if present within the input message, are transferred unaltered to the output message.
vrExpress "harmony" "ranger" "constant103" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<fml>
<gaze type="weak-focus" target="ranger" track="1" speed="normal" > "listen_to_speaker" </gaze>
</fml>
</act>"
Emotion
The affect tag contains data about the emotional state the character is currently in. This can be used to affect output behavior.
vrExpress "harmony" "None" "schererharmony17" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<fml>
<affect type="Fear" STANCE="LEAKED" intensity="110.475"></affect>
</fml>
<bml> </bml>
</act>"
Description
Sent from the NPCEditor (or any other module requesting behavior) to the Nonverbal Behavior Generator. See the logs when running Brad for examples. This message will be better documented in future iterations.
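As an illustrative sketch, a requesting module could assemble the vrExpress message string from its four parts (char-id, addressee-id, utterance-id, xml-message) as follows. The values mirror the Status / Request example above; the resulting string would then be broadcast via the messaging layer.
public class VrExpressSketch {
    public static void main(String[] args) {
        // Values mirror the Status / Request example above.
        String characterId = "harmony";
        String addresseeId = "None";
        String utteranceId = "??";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\" ?><act>"
                + "<participant id=\"harmony\" role=\"actor\" />"
                + "<fml><status type=\"present\" /><request type=\"idle_behavior\" value=\"off\" /></fml>"
                + "</act>";
        // Format: vrExpress char-id addressee-id utterance-id xml-message
        String message = String.join(" ", "vrExpress", characterId, addresseeId, utteranceId, xml);
        System.out.println(message);  // would be sent over ActiveMQ, e.g. via the VHMsg client
    }
}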
vrSpeak
- vrSpeak char-id addressee-id utterance-id xml-message
Below is an example of the vrSpeak message output by NVBG. It contains animations, head nods and facial behavior based on the text that is to be spoken or heard.
vrSpeak brad user 1337363228078-9-1 <?xml version="1.0" encoding="utf-16"?><act>
<participant id="brad" role="actor" />
<bml>
<speech id="sp1" ref="voice_defaultTTS" type="application/ssml+xml">
<mark name="T0" />Depending<mark name="T1" /><mark name="T2" />on
<mark name="T3" /><mark name="T4" />the
<mark name="T5" /><mark name="T6" />default
<mark name="T7" /><mark name="T8" />voice
<mark name="T9" /><mark name="T10" />you
<mark name="T11" /><mark name="T12" />have
<mark name="T13" /><mark name="T14" />selected
<mark name="T15" /><mark name="T16" />in
<mark name="T17" /><mark name="T18" />Windows,
<mark name="T19" /><mark name="T20" />I
<mark name="T21" /><mark name="T22" />can
<mark name="T23" /><mark name="T24" />sound
<mark name="T25" /><mark name="T26" />pretty
<mark name="T27" /><mark name="T28" />bad.
<mark name="T29" />
</speech>
<event message="vrAgentSpeech partial 1337363228078-9-1 T1 Depending " stroke="sp1:T1" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T3 Depending on " stroke="sp1:T3" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T5 Depending on the " stroke="sp1:T5" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T7 Depending on the default " stroke="sp1:T7" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T9 Depending on the default voice " stroke="sp1:T9" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T11 Depending on the default voice you " stroke="sp1:T11" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T13 Depending on the default voice you have " stroke="sp1:T13" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T15 Depending on the default voice you have selected " stroke="sp1:T15" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T17 Depending on the default voice you have selected in " stroke="sp1:T17" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T19 Depending on the default voice you have selected in Windows, " stroke="sp1:T19" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T21 Depending on the default voice you have selected in Windows, I " stroke="sp1:T21" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T23 Depending on the default voice you have selected in Windows, I can " stroke="sp1:T23" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T25 Depending on the default voice you have selected in Windows, I can sound " stroke="sp1:T25" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T27 Depending on the default voice you have selected in Windows, I can sound pretty " stroke="sp1:T27" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T29 Depending on the default voice you have selected in Windows, I can sound pretty bad. " stroke="sp1:T29" />
<gaze target="user" direction="POLAR 0" angle="0" sbm:joint-range="HEAD EYES" xmlns:sbm="http://ict.usc.edu" />
<sbm:event message="vrSpoke brad user 1337363228078-9-1 Depending on the default voice you have selected in Windows, I can sound pretty bad." stroke="sp1:relax" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:sbm="http://ict.usc.edu" />
<!--first_VP Animation-->
<animation stroke="sp1:T3" priority="5" name="HandsAtSide_Arms_GestureWhy" />
<!--First noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T5" priority="5" />
<!--Noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T11" priority="5" />
<!--Noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T19" priority="5" />
<!--Noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T21" priority="5" />
</bml>
</act>
Explanation:
The time markers, i.e. T0, T1, T2 and so on, are used by NVBG to mark each individual word in the sentence; NVBG can later use this information to play animations at the start or end of that word.
For example, in the above output BML the animation "HandsAtSide_Arms_GestureWhy" is timed to play at sp1:T3, which means it will be played when the word "on" has been spoken. 'sp1' is the id of the speech tag, indicating that this 'T3' belongs to that particular speech tag (currently NVBG only supports one speech tag).
The corresponding "vrAgentSpeech partial" event messages are sent out by SmartBody after each word has been spoken by the character. For example, "vrAgentSpeech partial 1337363228078-9-1 T19 Depending on the default voice you have selected in Windows," is sent out after the word "Windows" is spoken.
Similarly, the <head> tag is interpreted by SmartBody to generate nods and other specified behavior.
The "priority" attribute allows NVBG to decide which behavior to cull if any of the behaviors overlap. This is done before the final message is sent out.
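The sketch below shows how a listening module might pull the utterance id, time mark and accumulated text out of a "vrAgentSpeech partial" event like the ones embedded in the BML above; the field layout is read off those examples rather than a formal specification.
public class AgentSpeechParseSketch {
    public static void main(String[] args) {
        // Example event string taken from the vrSpeak message above.
        String msg = "vrAgentSpeech partial 1337363228078-9-1 T19 "
                   + "Depending on the default voice you have selected in Windows,";
        String[] parts = msg.split("\\s+", 5);
        String status = parts[1];       // "partial"
        String utteranceId = parts[2];  // "1337363228078-9-1"
        String timeMark = parts[3];     // "T19", the last time mark reached
        String textSoFar = parts[4];    // everything spoken up to that mark
        System.out.println(utteranceId + " @ " + timeMark + " (" + status + "): " + textSoFar);
    }
}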
Description
Sent from the Nonverbal Behavior Generator to SmartBody. Requests for SmartBody to realize verbal and nonverbal behavior. See the logs when running Brad for examples. This message will be better documented in future iterations.
vrSpeech
- vrSpeech type turn-id other
Description
vrSpeech messages go together as a sequence with the same 'turn-id', representing the speech recognition, timing, and possible segmentation of a human utterance. 'vrSpeech start' means the person has started to speak - indicated currently by pressing on a mouse button. 'vrSpeech finished-speaking' means the person has stopped speaking for this turn, indicated currently by releasing the mouse button. 'vrSpeech interp' indicates the text that the person said during this speech. 'vrSpeech asr-complete' means that speech recognition and segmentation are finished for this turn, and other modules are free to start using the results without expecting more utterances for this turn. 'vrSpeech partial' is an optional message, giving partial results during speech.
- vrSpeech start $turn_id $speaker
- vrSpeech finished-speaking $turn_id
- vrSpeech partial $turn_id $utt_id $confidence $tone $speech
- vrSpeech interp $turn_id $utt_id $confidence $tone $speech
- vrSpeech asr-complete $turn_id
Parameters
- 'type', indicates the type of vrSpeech message: 'start', 'finished-speaking', 'partial', 'interp', or 'asr-complete'.
- 'turn_id', a unique identifier for this whole turn. It must be the same across all messages in a sequence.
- 'speaker' is the speaker of this message, like 'user' in the Brad example.
- 'utt_id', the unique identifier for this utterance. A turn can be segmented into multiple utterances. Historically this is always '1', and there is only one 'vrSpeech interp' message per turn. The current convention is that the speech recognizer / client sends out a message with '$utt_id 0', and a possible segmenter then sends out one or more messages labelled 1 to n, in order. A segmenter is currently not part of the toolkit. For 'vrSpeech partial', $utt_id identifies an incremental interpretation of the part of the utterance that has been processed so far; it will likely have a value of 'p' followed by an integer, but any unique identifier should be fine.
- 'confidence' - the confidence value for this interpretation. Should be between 0.0 and 1.0. Historically this is always 1.0.
- 'tone' - the prosody - should be "high" "normal" or "low" - historically, this is always "normal".
- 'speech' - the text of the (segment) interpretation. For 'utt_id 0' this will be the text of the full turn, for other segments this will be the text only for that segment.
Examples
This example indicates a speaker with the name 'user' starting to speak, saying 'hello how are you', and then stopping. Note that the speech is only analyzed / completed after the user has stopped speaking. Also note that all listed messages are needed to process a single utterance and notify the rest of the system of the results.
- vrSpeech start test1 user
- vrSpeech finished-speaking test1 user
- vrSpeech interp test1 1 1.0 normal hello how are you
- vrSpeech asr-complete test1
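As a sketch, a speech client could broadcast this turn with something like the following; the send callback is a hypothetical stand-in for the actual messaging client, and the message strings mirror the example above.
import java.util.function.Consumer;
public class VrSpeechTurnSketch {
    public static void main(String[] args) {
        publishTurn(System.out::println);  // stand-in for the real messaging send
    }
    static void publishTurn(Consumer<String> send) {
        String turnId = "test1";
        send.accept("vrSpeech start " + turnId + " user");
        // ... the user speaks; optional "vrSpeech partial" messages would go here ...
        send.accept("vrSpeech finished-speaking " + turnId + " user");
        // Full-turn recognition result: utt_id 1, confidence 1.0, normal tone.
        send.accept("vrSpeech interp " + turnId + " 1 1.0 normal hello how are you");
        // Recognition and segmentation are complete for this turn.
        send.accept("vrSpeech asr-complete " + turnId);
    }
}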
Components that send this message
The speech client or speech recognizer.
Components that receive this message
Modules that want to know about speech timing information or spoken text, usually a Natural Language Understanding module or the NPCEditor.
Individual messages are listed under the Messages page. Conceptual areas with groups of messages are described under the Sub Areas page.