Overview
Communication between most modules happens by message passing, which is implemented on top of ActiveMQ. Messages are broadcast through the system and a component can subscribe to certain messages. Developers can use the VH Messaging (VHMsg) library to quickly implement receiving and sending ActiveMQ messages in a variety of languages, including C++, C# and Java. Note that ActiveMQ was preceded by Elvin and that, for legacy reasons, one might still encounter mentions of Elvin, even though no Elvin code is used anymore.
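For illustration only, below is a minimal sketch of subscribing to such broadcasts directly over ActiveMQ's JMS API in Java. In practice a module would normally use the VHMsg library instead; the broker URL, the topic name and the assumption that the message body is plain text starting with the message name are illustrative assumptions for this sketch, not the toolkit's actual wire format.
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;
public class BroadcastListenerSketch {
    public static void main(String[] args) throws Exception {
        // Broker URL and topic name are assumptions for this sketch;
        // the VHMsg library normally hides these details.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createTopic("vh.messages"));
        consumer.setMessageListener(message -> {
            try {
                if (message instanceof TextMessage) {
                    String text = ((TextMessage) message).getText();
                    // Assumed convention: the first whitespace-separated token is the
                    // message name, e.g. "vrAllCall" or "vrComponent".
                    System.out.println("Received: " + text.split("\\s+", 2)[0]);
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
        connection.start();
        Thread.sleep(60_000);  // keep the process alive to receive broadcasts
    }
}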
Every module has its own messaging API. In addition, each module should at least implement the basic messaging protocol that allows modules to request and share whether they are online:
- vrAllCall [reason], pings all modules.
- vrComponent component-id sub, response by module to 'vrAllCall'; should also be sent on start-up by module.
- vrKillComponent {component-id,'|all'}, requests a single module or all modules to shut down.
- vrProcEnd component-id sub, sent by module on exit.
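As a sketch of how a module might implement this minimal protocol, the following outlines the start-up announcement and the handling of 'vrAllCall' and 'vrKillComponent'. The component id and the MessagePipe interface are hypothetical stand-ins for the real messaging client (e.g. VHMsg), introduced only for illustration; the optional sub-component field from the list above is omitted.
public class ComponentProtocolSketch {
    private static final String COMPONENT_ID = "my-module";  // hypothetical component id
    private final MessagePipe pipe;  // hypothetical messaging wrapper
    public ComponentProtocolSketch(MessagePipe pipe) {
        this.pipe = pipe;
        // Announce the component once on start-up (a sub-component id could follow).
        pipe.send("vrComponent " + COMPONENT_ID);
    }
    /** Called for every incoming message; 'text' is the full message string. */
    public void onMessage(String text) {
        String[] tokens = text.split("\\s+");
        switch (tokens[0]) {
            case "vrAllCall":
                // Another module is pinging; re-announce ourselves.
                pipe.send("vrComponent " + COMPONENT_ID);
                break;
            case "vrKillComponent":
                // Shut down if addressed directly or if 'all' modules are asked to stop.
                if (tokens.length > 1 && (tokens[1].equals(COMPONENT_ID) || tokens[1].equals("all"))) {
                    pipe.send("vrProcEnd " + COMPONENT_ID);
                    System.exit(0);
                }
                break;
            default:
                // Module-specific messages would be handled here.
                break;
        }
    }
    /** Hypothetical minimal interface standing in for the real messaging client. */
    public interface MessagePipe {
        void send(String message);
    }
}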
Below follows a detailed description of each message. Note that this section is currently incomplete.
RemoteSpeechCmd
Description
Sent by SmartBody to the Text-To-Speech relay. Requests that speech be generated for a given text and voice.
RemoteSpeechReply
Description
Reply from the Text-To-Speech relay (to SmartBody) with viseme and word timing information.
vrExpress
- vrExpress char-id addressee-id utterance-id xml-message
This message is sent to NVBG by the NLU, the NPCEditor, or a similar module. It can be used to convey information about speech data, posture, status changes, emotion changes, gaze data, etc., as shown below.
Speech
Speech messages are characterized by the speech tag within them. They are interpreted and the corresponding output BML is generated with speech time marks, animations, head nods, facial movements, etc. These behaviors are generated based on the content of the speech tag and the fml tag in the input message.
...
Posture Change
These messages are characterized by the <body posture=""> tag, which lets NVBG know that there has been a change in posture.
vrExpress "harmony" "None" "??" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<bml>
<body posture="HandsAtSide" />
</bml>
</act>"
Status / Request
The idle_behavior and all_behavior attributes within the request tag allow NVBG to keep track of whether or not to generate the corresponding behavior.
vrExpress "harmony" "None" "??" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<fml>
<status type="present" />
<request type="idle_behavior" value="off" />
</fml>
</act>"
Gaze
Gaze tags, if present within the input message, are transferred unaltered to the output message.
vrExpress "harmony" "ranger" "constant103" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<fml>
<gaze type="weak-focus" target="ranger" track="1" speed="normal" > "listen_to_speaker" </gaze>
</fml>
</act>"
Emotion
The affect tag contains data about the emotional state the character is currently in. This can be used to affect output behavior.
vrExpress "harmony" "None" "schererharmony17" "<?xml version="1.0" encoding="UTF-8" standalone="no" ?><act>
<participant id="harmony" role="actor" />
<fml>
<affect type="Fear" STANCE="LEAKED" intensity="110.475"></affect>
</fml>
<bml> </bml>
</act>"
Description
Sent from the NPCEditor (or any other module requesting behavior) to the Nonverbal Behavior Generator. See the logs when running Brad for examples. This message will be better documented in future iterations.
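As an illustrative sketch, a requesting module could assemble the vrExpress message string from its four parts (char-id, addressee-id, utterance-id, xml-message) as follows. The values mirror the Status / Request example above; the resulting string would then be broadcast via the messaging layer.
public class VrExpressSketch {
    public static void main(String[] args) {
        // Values mirror the Status / Request example above.
        String characterId = "harmony";
        String addresseeId = "None";
        String utteranceId = "??";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\" ?><act>"
                + "<participant id=\"harmony\" role=\"actor\" />"
                + "<fml><status type=\"present\" /><request type=\"idle_behavior\" value=\"off\" /></fml>"
                + "</act>";
        // Format: vrExpress char-id addressee-id utterance-id xml-message
        String message = String.join(" ", "vrExpress", characterId, addresseeId, utteranceId, xml);
        System.out.println(message);  // would be sent over ActiveMQ, e.g. via the VHMsg client
    }
}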
vrSpeak
- vrSpeak char-id addressee-id utterance-id xml-message
Below is an example of the vrSpeak message output by NVBG. It contains animations, head nods and facial behavior based on the text that is to be spoken or heard.
vrSpeak brad user 1337363228078-9-1 <?xml version="1.0" encoding="utf-16"?><act>
<participant id="brad" role="actor" />
<bml>
<speech id="sp1" ref="voice_defaultTTS" type="application/ssml+xml">
<mark name="T0" />Depending<mark name="T1" /><mark name="T2" />on
<mark name="T3" /><mark name="T4" />the
<mark name="T5" /><mark name="T6" />default
<mark name="T7" /><mark name="T8" />voice
<mark name="T9" /><mark name="T10" />you
<mark name="T11" /><mark name="T12" />have
<mark name="T13" /><mark name="T14" />selected
<mark name="T15" /><mark name="T16" />in
<mark name="T17" /><mark name="T18" />Windows,
<mark name="T19" /><mark name="T20" />I
<mark name="T21" /><mark name="T22" />can
<mark name="T23" /><mark name="T24" />sound
<mark name="T25" /><mark name="T26" />pretty
<mark name="T27" /><mark name="T28" />bad.
<mark name="T29" />
</speech>
<event message="vrAgentSpeech partial 1337363228078-9-1 T1 Depending " stroke="sp1:T1" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T3 Depending on " stroke="sp1:T3" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T5 Depending on the " stroke="sp1:T5" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T7 Depending on the default " stroke="sp1:T7" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T9 Depending on the default voice " stroke="sp1:T9" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T11 Depending on the default voice you " stroke="sp1:T11" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T13 Depending on the default voice you have " stroke="sp1:T13" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T15 Depending on the default voice you have selected " stroke="sp1:T15" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T17 Depending on the default voice you have selected in " stroke="sp1:T17" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T19 Depending on the default voice you have selected in Windows, " stroke="sp1:T19" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T21 Depending on the default voice you have selected in Windows, I " stroke="sp1:T21" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T23 Depending on the default voice you have selected in Windows, I can " stroke="sp1:T23" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T25 Depending on the default voice you have selected in Windows, I can sound " stroke="sp1:T25" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T27 Depending on the default voice you have selected in Windows, I can sound pretty " stroke="sp1:T27" />
<event message="vrAgentSpeech partial 1337363228078-9-1 T29 Depending on the default voice you have selected in Windows, I can sound pretty bad. " stroke="sp1:T29" />
<gaze target="user" direction="POLAR 0" angle="0" sbm:joint-range="HEAD EYES" xmlns:sbm="http://ict.usc.edu" />
<sbm:event message="vrSpoke brad user 1337363228078-9-1 Depending on the default voice you have selected in Windows, I can sound pretty bad." stroke="sp1:relax" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:sbm="http://ict.usc.edu" />
<!--first_VP Animation-->
<animation stroke="sp1:T3" priority="5" name="HandsAtSide_Arms_GestureWhy" />
<!--First noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T5" priority="5" />
<!--Noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T11" priority="5" />
<!--Noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T19" priority="5" />
<!--Noun clause nod-->
<head type="NOD" amount="0.10" repeats="1.0" relax="sp1:T21" priority="5" />
</bml>
</act>
Explanation:
The time markers, i.e. T0, T1, T2 and so on, are used by NVBG to mark each individual word in the sentence; NVBG can later use this information to play animations at the start or end of that word.
For example, in the above output BML the animation "HandsAtSide_Arms_GestureWhy" is timed to play at sp1:T3, which means it will be played when the word "on" has been spoken. 'sp1' is the id of the speech tag, indicating that this 'T3' belongs to that particular speech tag (currently NVBG only supports one speech tag).
The corresponding "vrAgentSpeech partial" event messages are sent out by SmartBody after each word has been spoken by the character. For example, "vrAgentSpeech partial 1337363228078-9-1 T19 Depending on the default voice you have selected in Windows," is sent out after the word "Windows" is spoken.
Similarly, the <head> tag is interpreted by SmartBody to generate nods and other specified behavior.
The "priority" attribute allows NVBG to decide which behavior to cull if any of the behaviors overlap. This is done before the final message is sent out.
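The sketch below shows how a listening module might pull the utterance id, time mark and accumulated text out of a "vrAgentSpeech partial" event like the ones embedded in the BML above; the field layout is read off those examples rather than a formal specification.
public class AgentSpeechParseSketch {
    public static void main(String[] args) {
        // Example event string taken from the vrSpeak message above.
        String msg = "vrAgentSpeech partial 1337363228078-9-1 T19 "
                   + "Depending on the default voice you have selected in Windows,";
        String[] parts = msg.split("\\s+", 5);
        String status = parts[1];       // "partial"
        String utteranceId = parts[2];  // "1337363228078-9-1"
        String timeMark = parts[3];     // "T19", the last time mark reached
        String textSoFar = parts[4];    // everything spoken up to that mark
        System.out.println(utteranceId + " @ " + timeMark + " (" + status + "): " + textSoFar);
    }
}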
Description
Sent from the Nonverbal Behavior Generator to SmartBody. Requests for SmartBody to realize verbal and nonverbal behavior. See the logs when running Brad for examples. This message will be better documented in future iterations.
vrSpeech
- vrSpeech type turn-id other
Description
vrSpeech messages go together as a sequence with the same 'turn-id', representing the speech recognition, timing, and possible segmentation of a human utterance. 'vrSpeech start' means the person has started to speak - indicated currently by pressing on a mouse button. 'vrSpeech finished-speaking' means the person has stopped speaking for this turn, indicated currently by releasing the mouse button. 'vrSpeech interp' indicates the text that the person said during this speech. 'vrSpeech asr-complete' means that speech recognition and segmentation are finished for this turn, and other modules are free to start using the results without expecting more utterances for this turn. 'vrSpeech partial' is an optional message, giving partial results during speech.
- vrSpeech start $turn_id $speaker
- vrSpeech finished-speaking $turn_id
- vrSpeech partial $turn_id $utt_id $confidence $tone $speech
- vrSpeech interp $turn_id $utt_id $confidence $tone $speech
- vrSpeech asr-complete $turn_id
Parameters
- 'type', indicates the type of vrSpeech message: 'start', 'finished-speaking', 'partial', 'interp', or 'asr-complete'.
- 'turn_id', a unique identifier for this whole turn. It must be the same across all messages in a sequence.
- 'speaker' is the speaker of this message, like 'user' in the Brad example.
- 'utt_id', the unique identifier for this utterance. A turn can be segmented into multiple utterances. Historically this is always '1', and there is only one 'vrSpeech interp' message per turn. The current convention is that the speech recognizer / client sends out a message with '$utt_id 0', and a possible segmenter then sends out one or more messages labelled 1 to n, in order. A segmenter is currently not part of the toolkit. For 'vrSpeech partial', $utt_id identifies an incremental interpretation of the part of the utterance that has been processed so far; it will likely have a value of 'p' followed by an integer, but any unique identifier should be fine.
- 'confidence' - the confidence value for this interpretation. Should be between 0.0 and 1.0. Historically this is always 1.0.
- 'tone' - the prosody - should be "high" "normal" or "low" - historically, this is always "normal".
- 'speech' - the text of the (segment) interpretation. For 'utt_id 0' this will be the text of the full turn, for other segments this will be the text only for that segment.
Examples
This example indicates a speaker with the name 'user' starting to speak, saying 'hello how are you', and then stopping. Note that the speech is only analyzed / completed after the user has stopped speaking. Also note that all listed messages are needed to process a single utterance and notify the rest of the system of the results.
- vrSpeech start test1 user
- vrSpeech finished-speaking test1 user
- vrSpeech interp test1 1 1.0 normal hello how are you
- vrSpeech asr-complete test1
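As a sketch, a speech client could broadcast this turn with something like the following; the send callback is a hypothetical stand-in for the actual messaging client, and the message strings mirror the example above.
import java.util.function.Consumer;
public class VrSpeechTurnSketch {
    public static void main(String[] args) {
        publishTurn(System.out::println);  // stand-in for the real messaging send
    }
    static void publishTurn(Consumer<String> send) {
        String turnId = "test1";
        send.accept("vrSpeech start " + turnId + " user");
        // ... the user speaks; optional "vrSpeech partial" messages would go here ...
        send.accept("vrSpeech finished-speaking " + turnId + " user");
        // Full-turn recognition result: utt_id 1, confidence 1.0, normal tone.
        send.accept("vrSpeech interp " + turnId + " 1 1.0 normal hello how are you");
        // Recognition and segmentation are complete for this turn.
        send.accept("vrSpeech asr-complete " + turnId);
    }
}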
Components that send this message
The speech client or speech recognizer.
Components that receive this message
Modules that want to know about speech timing information or spoken text, usually a Natural Language Understanding module or the NPCEditor.
Individual messages are listed under the Messages page. Conceptual areas with groups of messages are described under the Sub Areas page.