Annotation guidelines

Overview

These guidelines address, in turn: (1) some general issues in dialogue act annotation; (2) the segmentation of a dialogue into functional segments; (3) the use of DIT++ concepts and the DiAML annotation language.

The examples used focus on specific issues in annotation with the DIT++ taxonomy; examples of fully annotated dialogue fragments in DiAML can be found here.

1. General issues in DA annotation

Dialogue act annotation is about indicating the kind of intention that the speaker had: what was he trying to achieve? This is what the participants in a dialogue are trying to establish when they interpret each other's communicative behaviour. The following three pieces of general advice for dialogue act annotators derive from this.

1. Do as an addressee would do.
When marking up a functional segment, put yourself in the position of the participant(s) at whom the utterance was addressed, and imagine that you try to understand what the speaker is trying to achieve. Why does he say what he says? What are the speaker's purposes in using this utterance? What assumptions does the speaker express about the addressee? Answering such questions should guide you in deciding which tags to assign, regardless of how exactly the speaker has expressed himself. Use all the information that you could have if you were the actual addressee, and like the addressee, try to interpret the speaker's communicative behaviour as well as you can.
2. Think functionally, not formally.
The linguistic form of an utterance often provides vital clues for choosing an annotation tag, but such clues may also be misleading; in choosing your annotation tags you should of course use the linguistic clues to your advantage, but don't let them fool you - the true question is not what the speaker says but what he means.

For example, Set Questions are questions where the speaker wants to know which elements of a certain domain have a certain property. In English, such questions often contain a word beginning with "wh", such as which as in Which books did you read on your holidays? or where in Where do your parents live?. But not all English sentences of this form express a Set Question: Why don't you go ahead is for instance typically a Suggestion rather than a question.

Similarly, Propositional Questions are questions where the speaker wants to know whether a certain statement is true or false. Such questions are typically expressed by interrogative sentences such as Is The Hague the capital of the Netherlands? or Do you like peanut butter? But not all sentences of this form express a Propositional Question; for example, Do you know what time it is? most often functions as an indirect way of asking the addressee to tell the time; Would you like some coffee? is an Offer; and Shall we go? is a Suggestion.
3. Be specific.
Among the communicative functions that you can choose from, there are differences in specificity, corresponding with their relative positions in hierarchical subsystems. For instance, a Check Question is more specific than a Propositional Question, in that it additionally carries the expectation that the answer will be positive. Similarly, a Confirm is more specific than an Answer, in that it carries the additional assumption that the addressee expects the answer to be positive.

In general, always try to be as specific as you can. But if you're in serious doubt about whether to choose a more or a less specific function, and you don't really have evidence for choosing the more specific one, then use the less specific function tag that subsumes the more specific one.

1.1 Preliminaries

A dialogue has been defined as "a spoken, typed or written interaction in natural language between two or more agents" (DAMSL Revised Manual, p. 1). The term `agent' in this characterization is intended to cover both human and artificial participants. DIT++ is intended to apply to dialogues in a wider sense, where the participants not only use natural language but also nonverbal means, such as gestures and facial expressions (by human participants or embodied conversational systems), and means like highlighting, blinking, or a filling hourglass in the case of non-embodied computer systems.

The prototypical setting of human dialogue is that of face-to-face communication, where speech is combined with other vocal sounds (laughs, sighs, heavy breathing, etc.), facial expressions, gaze direction, and other physical activities including head-, hand-, arm-, and shoulder gestures, forms of touching (stroking, caressing, hugging, shaking hands, patting on the shoulder, etc.), and body posture changes. All these verbal and nonverbal activities may have a communicative meaning which can be made explicit in terms of dialogue acts. DIT++ has been successfully applied both to spoken and typed dialogue, and to a range of nonverbal and multimodal behaviours. (See e.g. Petukhova and Bunt, 2009e on the analysis of nodding as feedback signals.)

1.2 Explicit and implicit, implied and indirect functions

A functional segment has a communicative function for one of two reasons: 1) by virtue of having linguistic or nonverbal features which, in the context in which the segment occurs, are indicators of that function; or 2) by implication of having another function. In the first case it is common to say that the segment has that communicative function explicitly; in the second case that it has that function implicitly. The following example illustrates this:

1. A: Would you like to have some coffee?
2. B: Some coffee would be great, thanks.

A's utterance is an Offer; B's response is an Accept Offer by virtue of its linguistic form and the fact that it occurs immediately after an Offer. Since an offer can only be accepted when it has been understood, B's response also has, by implication, a positive auto-feedback function.

The following types of implicit communicative functions can be distinguished:

A communicative function F2 is logically entailed by another function F1 because F1 is a special case of F2. This happens in hierarchies of communicative functions like the general-purpose functions of DIT++, where for instance a Confirm is a special case of an Answer, and a Correction is a special case of a Disagreement, which in turn is a special case of an Inform.
Another example is that an Accept Offer entails positive feedback on the offer it responds to, and similarly for other responsive functions like Decline Offer, Answer, Confirm, Agreement, and Accept Apology.
A communicative function F1 may have another function F2 as a pragmatic conversational implicature, i.e. in most situations where a functional segment has the function F1 it also has the function F2, assuming that the dialogue participants behave cooperatively. For example, a thanking act like Thank you will normally be understood as also a signal of positive feedback.

Should implicit communicative functions be annotated? Annotating logically entailed functions would be redundant, since by their very nature such functions can be inferred from the explicit functions; there is no point in doing so. For conversationally implicated functions the situation is different, since these cannot be inferred from explicit functions without taking the context into account. It is therefore generally advisable to annotate them. An annotator who encounters a functional segment with both an explicitly expressed communicative function and an implied function should decide whether the implied function is a logical consequence or a matter of what is plausible in the given context. In the first case the implied function should not be annotated; in the second case it should. For more details about types of implicit functions and strategies for dealing with them, see Bunt (2011).
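
The decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag spellings, the partial hierarchy in PARENT, and the function names are hypothetical and cover only the relations mentioned in these guidelines, not the full DIT++ taxonomy.

```python
# Partial hierarchy: child function -> the more general function it is a
# special case of. Only relations named in these guidelines are listed.
PARENT = {
    "confirm": "answer",
    "correction": "disagreement",
    "disagreement": "inform",
}

# Responsive functions that, according to the guidelines, entail positive
# feedback on the utterance they respond to.
RESPONSIVE = {
    "acceptOffer", "declineOffer", "answer",
    "confirm", "agreement", "acceptApology",
}


def is_entailed(explicit, implied):
    """True if `implied` follows logically from `explicit`."""
    if implied == "autoPositive" and explicit in RESPONSIVE:
        return True
    node = explicit
    while node in PARENT:          # walk up towards more general functions
        node = PARENT[node]
        if node == implied:
            return True
    return False


def should_annotate(explicit, implied, plausible_in_context):
    """Entailed functions are skipped; implicatures depend on the context."""
    if is_entailed(explicit, implied):
        return False
    return plausible_in_context
```

Under these assumptions, should_annotate("confirm", "answer", True) is False because the Answer is entailed, whereas a positive feedback function implicated by a thanking act would be annotated whenever the annotator judges it plausible in context.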

Standard speech act theory regards indirect speech acts, such as indirect requests, as just an indirect form of the same illocutionary acts. By contrast, DIT views indirect forms as signalling subtly different packages of beliefs and intentions than direct ones. For example, the direct request Tell me what time it is please carries the assumption that the addressee knows what time it is, whereas an indirect request like Do you know what time it is? or Can you tell me what time it is? does not carry that assumption (at least it does not express that assumption; in fact it questions it), and is best interpreted as Please tell me what time it is, if you know.

This example shows that an indirectly formulated request may have a conditional character: the speaker is expressing a request under the condition that the addressee is able to perform the requested action. In this case the annotator may therefore make use of the option to annotate the utterance as having a qualified Request function, with the attribute `conditionality' having the value `conditional'. This is represented in DiAML as follows:

<dialogueAct xml:id="da1" target="fs1"
  sender="s" addressee="a" dimension="task"
  communicativeFunction="request" conditionality="conditional"/>

2. Segmentation

In DIT, dialogue acts correspond to functional segments, defined as a minimal stretch of communicative behaviour that has a communicative function; the requirement of being `minimal' has been added in order to ensure that communicative functions are assigned as accurately as possible to those stretches of behaviour which express these functions. Consider the following example (from a Map Task dialogue):

E: ... and then go direction that moon lander, that thing on those legs

This stretch of behaviour could be marked up as expressing an Instruct act as well as an Inform act which explains the term "moon lander". In order to do that accurately it is best to segment this stretch into two functional segments: fs1 = "and then go direction that moon lander" and fs2 = "that thing on those legs", and to assign the Instruct function to segment fs1 only, and the Inform function to fs2 only, rather than assigning both of them to the entire utterance. Such fine-grained segmentation also allows us to indicate the fact that the Inform in fs2 is an explanation of the Instruct in fs1, as in the following example, where a `rhetoricalLink' connects the dialogue act da2 with the dialogue act da1 through an `explanation' relation:

<dialogueAct xml:id="da1" target="fs1"
  sender="s" addressee="a" dimension="task"
  communicativeFunction="instruct"/>
<dialogueAct xml:id="da2" target="fs2"
  sender="s" addressee="a"
  communicativeFunction="inform" dimension="alloFeedback"/>
<rhetoricalLink dact="#da2" rhetoAntecedent="#da1" rhetoRel="explanation"/>

There are cases where the identification of the minimal stretch of behaviour that corresponds to a functional segment is not obvious, in particular when a longer stretch could be said to express a particular function, but where it consists of smaller parts which could also be said to express that same function. The following example illustrates this (from a Map Task dialogue):

1. E: and then you go up and around that, a little to the right
2. A: slightly northeast?
3. E: yeah, slightly northeast.

E's utterance 3 as a whole could be said to constitute a Confirm act, but the two parts `yeah' and `slightly northeast' could also be said to constitute two separate Confirms. Larsson (1998) has recommended taking a maximal approach in such cases and choosing the larger stretch as the unit of annotation. Alternatively, the use of functional segments naturally suggests always taking a minimal approach. Which of these strategies is to be preferred may be determined by the purpose of the annotation, but clearly the minimal approach is more fine-grained.

A functional segment is most often a part of what is contributed by the participant who occupies the speaker role, distinguished by the fact that this part has a separate communicative function. However, when working from a pre-segmented transcription of a spoken dialogue, the segmentation used in the transcript is not necessarily perfect, or not quite as one would like it to be.

First, there may be cases where one would prefer a given segment to be segmented into smaller segments. In such a case it is best to assign the various tags that one would prefer to assign to the parts of the segment, to the segment as a whole. This could lead to assigning an inconsistent set of tags to a segment; in that case one either has to omit one or more tags, or temporarily accept the assignment of an inconsistent set of tags, and/or add a comment to the annotation to signal this problem. What is the best strategy in such cases depends on the purposes of the annotation and on the options offered by the annotation tool that is used.

Second, it may happen that a turn has been pre-segmented into certain parts where one would prefer to annotate a longer segment, formed by these parts. In such a case it is recommended to annotate all these parts with the same tags.

Third, a given segment may be `self-interrupted' by a part that has a different communicative function, as in the following example:

Can you tell me what time the train to ehm,... Viareggio leaves?

Here we see a question interrupted by a Stalling segment (ehm). Because the question is formulated indirectly (Can you tell me ...), it is annotated below as a conditional Request, in line with section 1.2. The preferred segmentation would distinguish in this case one functional segment in the Task dimension, viz. fs1 = Can you tell me what time the train to Viareggio leaves? and one in the Time Management dimension, viz. fs2 = ehm,..., leading to the following representation in DiAML:

<dialogueAct xml:id="da1" target="fs1"
  sender="s" addressee="a" dimension="task"
  communicativeFunction="request" conditionality="conditional"/>
<dialogueAct xml:id="da2" target="fs2"
  sender="s" addressee="a"
  communicativeFunction="stalling" dimension="timeManagement"/>

If the segmentation has not distinguished the intervening segment as a separate functional segment, then again, it is best to assign the tags for the intervening segment to the entire segment as a whole.
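
The preferred segmentation of a self-interrupted utterance can be illustrated with a small Python sketch. It assumes whitespace tokenisation and treats only the filler ehm (the one occurring in the example) as a stalling token; the name split_stalling and the FILLERS set are hypothetical, and a real annotation tool would identify segments by other means.

```python
# Filler forms treated as Stalling; only "ehm" comes from the example above.
FILLERS = {"ehm"}


def split_stalling(tokens):
    """Return (fs1, fs2): token indices of the main segment and of the filler.

    fs1 may be discontinuous, because the filler interrupts it.
    """
    def is_filler(token):
        return token.strip(",.").lower() in FILLERS

    fs1 = [i for i, token in enumerate(tokens) if not is_filler(token)]
    fs2 = [i for i, token in enumerate(tokens) if is_filler(token)]
    return fs1, fs2


tokens = "Can you tell me what time the train to ehm,... Viareggio leaves?".split()
fs1, fs2 = split_stalling(tokens)
print(" ".join(tokens[i] for i in fs1))
print(" ".join(tokens[i] for i in fs2))
```

Representing a segment as a list of token indices rather than a start-end span is one simple way to allow for discontinuous functional segments such as fs1 here.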

Fourth, it may happen that a dialogue act corresponds to more than one turn, as in the following example, where the utterances in turns 1 and 3 together form an Answer:

1. A: There are two flights early in the morning, at 7.45 and at 8.15
2. B: Yes
3. A: and two more in the evening, at 7.15 and at 8.30

If the pre-segmentation does not distinguish the segment formed by utterances 1 and 3 as a single functional segment, but treats them as two separate segments, then it is best to give each of these parts the same tag and code them all as having a functional dependency relation with the same question. In this way it is clear that they are all partial answers to the same question.
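
As a rough illustration of this strategy, the following Python sketch gives every pre-segmented part the same tag and the same antecedent. The dictionary layout, the field name functionalDependence, and the identifiers fs1, fs3 and da0 are assumptions made for the example; they are not taken from the DiAML specification.

```python
def annotate_parts(segment_ids, function, dimension, antecedent):
    """Give every pre-segmented part the same tag and the same antecedent."""
    return [
        {
            "target": segment_id,
            "communicativeFunction": function,
            "dimension": dimension,
            # Hypothetical field name for the functional dependence relation.
            "functionalDependence": antecedent,
        }
        for segment_id in segment_ids
    ]


# Turns 1 and 3 of the flight example, both answering a question "da0".
parts = annotate_parts(["fs1", "fs3"], "answer", "task", "#da0")
for part in parts:
    print(part)
```

Because both parts point to the same antecedent, a later processing step can recognise them as partial answers to one question and merge them if desired.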

3. Representing annotations in DiAML

The Dialogue Act Markup Language (DiAML) has been defined to have an abstract syntax, which specifies the conceptual structures of DiAML annotations in set-theoretical terms (see Bunt, 2010); a formal semantics, which interprets these structures in terms of information-state update operations (see Bunt, 2011a); and an XML-based concrete syntax for representing DiAML annotations. Here we only consider the concrete syntax representations of DiAML and how to use them.

A DiAML annotation structure consists of a functional segment and a set of annotations, which contain information about sender, addressee(s), communicative function, function qualifiers, dimension, functional and feedback dependence relations, and rhetorical relations. In order to be ISO-compliant, the representation of these structures assumes a three-level architecture, consisting of:

A. a primary source, which may correspond to a speech recording, textual transcription or any further low-level annotation thereof;
B. the marking of functional segments from the primary source;
C. the dialogue act annotation associated with a functional segment.

DiAML annotations are situated at level C. They refer to functional segments, which can be identified at level B by means of the 'functionalSegment' element, regardless of whether the segment is verbal, nonverbal, or multimodal; at level C the 'target' attribute is used to point to a functional segment.

A dialogue act is characterized in DIT as having (1) a sender; (2) at least one addressee; (3) a communicative function, which may have (4) one or more qualifiers; (5) a dimension (or category of semantic content); and possibly (6) functional and feedback dependence relations as well as rhetorical relations. This structure is reflected in the DiAML annotation of a dialogue act in the fact that a 'dialogueAct' element has obligatory attributes 'sender', 'addressee', 'communicativeFunction', and 'dimension', and optional attributes to represent qualifiers. The optional functional relations, feedback relations, and rhetorical relations correspond to relational XML elements which may be added.
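
A minimal sketch of how such an element might be built and checked programmatically, using Python's standard xml.etree.ElementTree module. The helper name make_dialogue_act and the OBLIGATORY tuple are hypothetical; the attribute names are those listed above, and the values are taken from the conditional request example in section 1.2.

```python
import xml.etree.ElementTree as ET

# Qualified name of xml:id; ElementTree serialises it with the "xml:" prefix.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Attributes that the guidelines describe as obligatory on 'dialogueAct'.
OBLIGATORY = ("sender", "addressee", "communicativeFunction", "dimension")


def make_dialogue_act(act_id, target, **attrs):
    """Build a 'dialogueAct' element, refusing incomplete annotations."""
    missing = [name for name in OBLIGATORY if name not in attrs]
    if missing:
        raise ValueError("missing obligatory attributes: " + ", ".join(missing))
    element = ET.Element("dialogueAct", {XML_ID: act_id, "target": target})
    for name, value in attrs.items():
        element.set(name, value)   # includes optional qualifiers
    return element


da1 = make_dialogue_act(
    "da1", "fs1",
    sender="s", addressee="a", dimension="task",
    communicativeFunction="request", conditionality="conditional",
)
print(ET.tostring(da1, encoding="unicode"))
```

Checking the obligatory attributes at construction time is a design choice of this sketch; an annotation tool could equally validate complete files against a schema.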

For a given functional segment in a dialogue, the sender and addressee roles are usually easy to assign. For assigning communicative functions, see below, sections 3.1 and 3.2. For assigning dimensions, the decision to be made is which kind of information or actions the dialogue act is about. Is it (1) concerning the underlying task/activity; or (2) concerning the speaker's processing of previous utterances; or (3) concerning the addressee's processing of previous utterances; or (4) concerning the allocation of the speaker role; or (5) concerning the time needed to continue the dialogue; or (6) concerning the editing of what the speaker is saying; or (7) concerning the editing of what the addressee is currently saying; or (8) concerning the structure of the dialogue; or (9) concerning social obligations?
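
The nine questions above can be read as a lookup table from what a dialogue act is about to a value of the 'dimension' attribute. The Python sketch below is illustrative only: the values task, alloFeedback and timeManagement occur in the DiAML examples in these guidelines, while the remaining spellings and the helper dimension_for are assumptions made by analogy.

```python
# Topic of the dialogue act -> value of the 'dimension' attribute.
# Spellings other than "task", "alloFeedback" and "timeManagement" are
# assumed by analogy and should be checked against the DiAML specification.
DIMENSION_BY_TOPIC = {
    "underlying task or activity": "task",
    "speaker's processing of previous utterances": "autoFeedback",
    "addressee's processing of previous utterances": "alloFeedback",
    "allocation of the speaker role": "turnManagement",
    "time needed to continue the dialogue": "timeManagement",
    "editing of the speaker's own speech": "ownCommunicationManagement",
    "editing of the addressee's speech": "partnerCommunicationManagement",
    "structure of the dialogue": "discourseStructuring",
    "social obligations": "socialObligationsManagement",
}


def dimension_for(topic):
    """Return the dimension value for a topic, or raise if it is unknown."""
    try:
        return DIMENSION_BY_TOPIC[topic]
    except KeyError:
        raise ValueError(f"no dimension listed for topic: {topic!r}") from None
```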

3.1 Encoding general-purpose functions

Information transfer functions

All dialogue acts with an information transfer function have the main purpose of making certain information available to the addressee (acts with an Inform function or a function dominated by Inform in the hierarchy of general-purpose functions) or of the speaker obtaining certain information (the Information-seeking functions in the general-purpose functions hierarchy). The information to be obtained or made available can be of any kind, relating to the underlying task or activity, or relating to the interaction.

In order to decide whether a segment of dialogue has an information transfer function, an annotator should thus decide whether the segment has such a purpose. If so, the annotator can use the subtrees of the Information-providing and Information-seeking functions as decision trees, going systematically from left to right through the functions at the next level down and checking the defining conditions that distinguish each of these functions from their ancestor and from each other. Since the functions at one level in a subtree are mutually exclusive, at most one of them applies. If one is found that applies, then go down one level to the functions dominated by this function, and repeat the process. Keep doing this until hitting a level where none of the functions apply. In that case choose the function that dominates the functions at that level.
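
The descent through the hierarchy can be sketched as a short loop. In the Python illustration below, TREE is a small, partial fragment built only from functions mentioned in these guidelines, the tag spellings are assumptions, and applies stands for the annotator's judgement of whether the defining conditions of a function hold; none of this is the actual DIT++ hierarchy.

```python
# Partial, illustrative fragment: function -> the functions it dominates,
# listed from left to right.
TREE = {
    "informationSeeking": ["setQuestion", "propositionalQuestion"],
    "propositionalQuestion": ["checkQuestion"],
}


def most_specific(root, applies):
    """Descend from `root` while some dominated function applies.

    `applies` is a callable returning True if the defining conditions of a
    function hold for the segment under consideration.
    """
    current = root
    while True:
        # Functions at one level are mutually exclusive, so take the first hit.
        match = next(
            (child for child in TREE.get(current, []) if applies(child)), None
        )
        if match is None:
            return current     # nothing more specific applies
        current = match


judged_true = {"propositionalQuestion", "checkQuestion"}
print(most_specific("informationSeeking", lambda f: f in judged_true))
```

If the annotator has no evidence for the more specific function, the loop simply stops one level higher, which matches the advice to fall back on the dominating function.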

Action discussion functions

All action discussion functions have in common that their semantic content describes an action, possibly with specifications of manner or frequency of performance. The actions under discussion can be of any kind: actions for moving the underlying task forward, or actions for managing the interaction, or actions for dealing with social obligations. This class of communicative functions divides into the classes of Commissives and Directives, familiar from speech act theory. Commissive acts all have as their common property that the sender expresses a commitment to performing an action, while directive acts are characterised by the sender having the goal that the addressee commits himself to performing an action. In order to decide whether a segment of dialogue has a commissive or a directive function, an annotator should decide whether the segment has the purpose of expressing or trying to impose such a commitment. If so, the annotator can use the subtrees of Commissives and Directives in the hierarchy of general-purpose functions as decision trees, in the same way as for choosing an information transfer function.

3.2 Encoding dimension-specific functions

In contrast with general-purpose communicative functions, dimension-specific functions can often be recognised by their use of fixed forms; all the dimensions have particular fixed forms and formulaic expressions.

3.2.1 Auto- and Allo-Feedback

Feedback acts have the purpose of providing or eliciting information about the processing of utterances in dialogue. Both auto- and allo-feedback providing functions are divided into positive and negative ones. Positive ones signal that the processing was (believed to be) successful; negative ones signal that there is a processing problem. Positive feedback is very often expressed implicitly, and should in such a case most probably not be encoded. Negative feedback is virtually always explicit, and as such easy to recognise. Some of the frequently used fixed forms for negative auto-feedback are Huh?, What? and equivalent expressions in other languages, and nonverbal signals such as raising eyebrows, frowning, or cupping a hand behind an ear.

Repetitions and rephrases are common forms of auto-feedback. A distinction can be made between the case where the speaker literally repeats (part of) what was said before (`echoes') and the case where he rephrases (part of) what was said. For example:

1. A: I would like to travel next Saturday, in the afternoon.
2. B: Next Saturday in the afternoon I have a flight leaving at 16:10.
3. B: On Saturday May 8 after 12 p.m. I have a flight leaving at 16:10.

In utterance 2, B literally repeats part of A's question, thereby displaying what he perceived that A said. In utterance 3, by contrast, B paraphrases parts of A's question, and this can be taken to indicate not only what B heard but also how B interpreted what A said (which in this example may be particularly relevant for the interpretation of `next').

On the other hand, positive feedback is often expressed in a rather inarticulate fashion by fixed forms like OK or Yes, Sure, etc. which may be taken to express overall successful processing of what was said, and correspond to the communicative function AutoPositive.

It may be worth noting that there is a systematic relation between auto- and allo-feedback acts, for the following reason. A dialogue act in the Allo-Feedback dimension is concerned with the addressee's processing of a previous utterance, e.g. A: What do you think I said?; when the addressee responds to that, e.g. B: I thought you said Tuesday, then the speaker of this response is speaking about his own processing of a previous utterance, hence the response is an act in that participant's Auto-Feedback dimension. This is more generally the case: the response to an Allo-Feedback act is usually an Auto-Feedback act.

The reverse is also true. When a participant encounters a processing problem and tries to resolve it, as in Do you mean this Saturday?, and the addressee responds That's right, then the speaker of the response is talking about the other's processing, hence this is an act in the Allo-Feedback dimension.

3.2.2 Turn Management

Turn management functions are characterised by the sender having the goal to obtain, to keep, or to give away the speaker role. For an annotator, the issue to decide is thus whether the sender's behaviour expresses such a goal. Consider, for example, the case of a question-answer pair:

1. A: Do you know what time it is?
2. B: It's nearly twelve fifteen.

Does B, in answering A's question, express the goal to occupy the speaker role? This is not obvious, but it should be noted that B's primary aim is to answer A's question, and that in order to do so he cannot avoid taking the speaker role; this suggests that B did not have a separate goal to have the speaker role.

Similarly, does A, by asking a question, express that he wants B to occupy the speaker role next? The answer to this question is clearly No, since A can continue for a while occupying the speaker role after asking the question, as in the following example:

1. A: Do you know what time it is? I need to catch the twelve seventeen train.
2. B: It's twelve fifteen now.

Note that in this example participant A continued in the speaker role after asking a question, simply by continuing to speak. This raises a rather troubling question: does continuing to speak indicate the speaker's goal to keep the turn? In that case, one should assign a turn-keeping function to nearly everything that a speaker says. A recommendation for how to go about assigning turn-management functions is to only assign such a function to those stretches of communicative behaviour which have the sole (or the main) purpose to obtain, to keep, or to get rid of the speaker role. Just starting to speak, continuing to speak, or ceasing to speak should not be annotated as expressions of Turn management functions.

A particularity of the Turn Management dimension is that the dimension-specific functions are divided into two subclasses, turn-initial and turn-final functions, which could in fact be considered as separate dimensions. Usually only the first segment in a turn has a turn-initial function and only the last one a turn-final one. Non-final utterances within a turn do not have a turn-final function, except when the speaker signals (for example by using a rising intonation or a filled pause) that the utterance is not intended to be the last one in the turn, i.e. that he wants to continue. In that case the utterance has a Turn Keeping function.

When a speaker accepts a turn that the addressee has assigned to him through a Turn Assign act, the relevant segment should be annotated as having the turn-initial function Turn Accept only when the speaker performs a separate act for the purpose of accepting the turn (such as nodding, or clearing his throat, or saying something like Yes or OK). The verbal and nonverbal activities that a speaker performs to seize the turn should be marked as Turn Grabbing; the segment that follows immediately after he has seized the turn should not be marked as having a turn-initial Turn Management function.

3.2.3 Time Management

Time management functions are concerned with the sender buying some time. DIT++ distinguishes two cases:

the speaker is unable to say immediately what he intended to say (Stalling);
the speaker suspends the dialogue for a while (Pausing).

In both cases there may be several reasons why the sender wants to buy some time. Stalling may occur, for instance, because the speaker is looking for the right words to express what he wants to convey, or because he hasn't quite made up his mind as to which information to provide, or because he needs a little time to look up or compute certain information. Pausing may occur, for example, because the speaker is aware that collecting/computing the relevant information requires more time than is commonly available in a fluent conversation, or because something more urgent came up that he has to deal with first.

Stalling acts take the form of filled pauses (ehm, let me see, well,..), often occurring together with slowing down and short silences. Pausing acts explicitly claim or request some time: Just a minute, Wait a second, I'll be right back, etc. Fully explicit requests like Please wait while I check the current status should not be marked as Pausing acts, but as requests in the Time Management dimension, using the general-purpose function Request.

3.2.4 Own and Partner Communication Management

In Own Communication Management (OCM) acts the speaker is editing his own speech. The speaker interrupts himself, being aware that he said something wrong, and retracts something that he just said (Oh sorry no,...; or No wait,..), or replaces something he just said by something else (I want to travel on Tuesday THURsday).

Partner Communication Management (PCM) acts similarly edit the speech of the addressee, who at that moment occupies the speaker role. Two important cases are the correction of what the addressee is saying (Correct Misspeaking) when a slip of the tongue occurs, and the completion of what the addressee/current speaker is struggling to say (Completion). In both cases the sender of the PCM act barges in and grabs the turn, or takes the turn which has become available because the addressee is hesitating.

3.2.5 Discourse Structuring

Discourse Structuring acts are concerned with the explicit structuring of the dialogue. Such acts occur frequently at the beginning and near the end of a dialogue. A dialogue needs to be opened in some way, and there are conventional ways of doing so. In multi-party dialogue an expression that is frequently used to open the dialogue is Okay! The same utterance is often used (though with a different intonation) to indicate that a dialogue can be closed, signaling positive feedback concerning the entire preceding dialogue. There do not seem to exist dialogue acts that have no other function than closing a dialogue; conventionally, a dialogue is considered closed when the participants have exchanged farewell greetings.

During a dialogue, the topic is often changed implicitly, simply by talking about a new topic. This happens especially if the new topic is closely related to the previous one, for instance by being a subtopic of the previous topic, or by being another subtopic of a more general topic. Implicit topic management should not be encoded; the fact that a new topic is addressed is a property of the semantic content of the Inform, the Question, or whatever dialogue act is performed which addresses this new topic. Only explicitly signaled topic shifting (actual or intended) should be annotated as such.

3.2.6 Social Obligations Management (SOM)

The kind of social obligations that should be annotated depends on the kind of dialogue. Welcome and farewell greetings, that play a role in starting and ending a dialogue, are domain-independent, however, as are apologies and their acceptances, acts for introducing oneself, and thanking acts and their acceptances. All of these types of acts have conventional forms (`formulas') in every language. They tend to come in pairs: an initial greeting puts pressure on the addressee to send a response greeting; introducing oneself puts pressure on the addressee to also introduce himself; an apology puts pressure on the addressee to accept the apology; a thanking puts pressure on the addressee to downplay what he is thanked for (like in It was nothing; It was my pleasure); and a farewell greeting puts pressure on the addressee to produce a response farewell greeting.

SOM acts can also be performed by means of general-purpose functions. For instance, I'm extremely grateful for your help and I hope to see you next year in Hong Kong are Informs in the SOM dimension.

It is worth noting that utterances which serve a "social" purpose such as greetings, thanks, and apologies are often used to serve other purposes as well. Greetings like Hello!, for example, can also be used for opening a dialogue (a Discourse Structuring function). Also, an expression of thanks can be used to signal that the speaker intends to terminate the dialogue, and can also be used for positive feedback.

3.3 Encoding communicative function qualifiers

Function qualifiers are available in DiAML for encoding various ways in which a speaker can specify certain conditions, qualifications, or feelings accompanying a dialogue act. For the encoding of (un-)certainty and conditionality, DiAML has binary-valued attributes, each with one value that is the default. For the encoding of feelings the 'sentiment' attribute is available, which has an open class of values and no default value; if no value of the attribute is specified in an annotation, this means that no such information is present.
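
The role of default values can be illustrated with a short Python sketch that resolves the qualifiers of an annotation. The default 'certain' is stated in section 3.3.1; the value name unconditional for the default of 'conditionality' and the helper name effective_qualifiers are assumptions made for this example.

```python
# Defaults for the binary-valued qualifiers. "certain" is given in these
# guidelines; "unconditional" is an assumed name for the other default.
DEFAULTS = {"certainty": "certain", "conditionality": "unconditional"}


def effective_qualifiers(attrs):
    """Return the qualifiers that hold for an annotation.

    Binary qualifiers fall back to their default when unspecified;
    'sentiment' has no default and is only present if it was annotated.
    """
    result = dict(DEFAULTS)
    for name in ("certainty", "conditionality", "sentiment"):
        if name in attrs:
            result[name] = attrs[name]
    return result


print(effective_qualifiers({"conditionality": "conditional"}))
```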

3.3.1 Certainty

The sender of a dialogue act can express certainty or uncertainty about the correctness of the information provided in an information-providing act. This is illustrated in the following example, where the expressions "I have a hunch that", "probably", "might", and "I'm not sure if" are indicators of the speaker's uncertainty.

1. A: Do you know who'll be coming tonight?
2. B: I have a hunch that Mary won't come.
3. B: Peter, Alice, and Bert will probably come.
4. B: I heard that Tom and Anne might come.
5. B: I'm not sure if Bill will come.

When no expressions of uncertainty are present, as in the following example, the resulting sentences no longer contain any suggestion that the speaker is uncertain about the correctness of what he says. This illustrates that the default value, corresponding to the unmarked case, is 'certain'.

1. A: Do you know who'll be coming tonight?
2. B: Mary won't come.
3. B: Peter, Alice, and Bert will come.
4. B: I heard that Tom and Anne [will] come.
5. B: Bill will come.

Speakers may also signal being very certain, as illustrated in the following examples. For such cases, the DiAML encoding with certainty=`certain' is recommended.

1. Mary will definitely not come.
2. Peter, Alice, and Bert will come for sure.
3. I certainly agree with that.

For the coding of a sender's certainty associated with the performance of an information-providing act, look for expressions of uncertainty and expressions of great certainty.

Warning: verbal expressions of uncertainty, in particular adverbs, should sometimes be interpreted as part of the semantic content of a dialogue act, rather than as a qualification of the communicative function. The following examples illustrate this:

1. I'll come around eight o'clock, probably.
2. I'll come definitely before nine.

In these examples, probably and definitely apply to the time that is mentioned, not to the sender's certainty about his commitment to come.

Certainty and the lack thereof can be indicated not only by verbal expressions, but also by prosody, direction of gaze, and several types of gestures. Prominent nonverbal expressions of uncertainty include gaze aversion, head waggles, lip pouting, lowering eyebrows, and self-touching (like head scratching).
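As a rough illustration of the verbal side of this guideline, the sketch below flags the cue expressions that occur in the examples of this section. It is a hypothetical aid for pointing an annotator at candidate segments, not a classifier: the cue lists are limited to the expressions quoted above, the function names are invented for the example, and a flagged cue may still belong to the semantic content rather than qualify the communicative function, as the warning above explains.

```python
import re
from typing import Dict, List

# Cue expressions taken from the examples in this section (not exhaustive).
UNCERTAINTY_CUES = ["i have a hunch that", "probably", "might", "i'm not sure if"]
CERTAINTY_CUES = ["definitely", "for sure", "certainly"]


def find_cues(utterance: str, cues: List[str]) -> List[str]:
    """Return the cues that occur as whole words or phrases in the utterance."""
    text = utterance.lower()
    return [cue for cue in cues
            if re.search(r"\b" + re.escape(cue) + r"\b", text)]


def flag_certainty_cues(utterance: str) -> Dict[str, List[str]]:
    """Collect candidate cues; the annotator decides what they qualify."""
    return {
        "uncertain": find_cues(utterance, UNCERTAINTY_CUES),
        "certain": find_cues(utterance, CERTAINTY_CUES),
    }


print(flag_certainty_cues("Peter, Alice, and Bert will probably come."))
# {'uncertain': ['probably'], 'certain': []}
print(flag_certainty_cues("I'll come around eight o'clock, probably."))
# {'uncertain': ['probably'], 'certain': []}  (here the cue is part of the content)
```

Both utterances are flagged in the same way, which is exactly why the final decision has to remain with the annotator.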

3.3.2 Conditionality

Conditionality refers to the possibility (with respect to ability and power), the necessity, or the willingness to perform an action; the qualifiers conditional and unconditional can therefore be attached to action-discussion functions. The following examples illustrate this phenomenon.

1. A: Would you like to have some coffee?
   B: Thanks, only if you have it ready.
2. A: Can you do the presentation, if you're ready?
   B: I can do that if you like.
3. A: I'll send you an email if you give me your address.
4. A: Can we just go over that again?
   B: Just very quickly. I have to hurry you on here.
   C: I don't think we have time for that, unless you make it very short.
5. A: I can make the buttons larger.
   B: No, only if we want basic things to be visible.

In the first example we see the conditional acceptance of an offer; in the second example a conditional request, with a conditional acceptance; in the third a conditional promise; in the fourth two conditional acceptances of a request; and in the fifth a conditional rejection of a suggestion. The absence of expressions indicating a condition leads to expressions that signal unconditional dialogue acts; hence the default value is 'unconditional', which does not need to be marked up. Explicit expressions of `unconditionality' are hard to find, other than the adverb unconditionally, which is hardly ever used in natural dialogue.

Conditional dialogue acts can often be recognised by the use of conditional expressions such as if ... or unless, but such expressions can also be part of the semantic content rather than qualifiers of the communicative function.

3.3.3 Sentiment

A particular sentiment associated with the performance of a dialogue act may be annotated if the sender indicates an emotion or an attitude concerning the semantic content or the addressee, verbally or nonverbally, or both. Nonverbal expressions of sentiment exist in abundance and in great variety, including for instance smiling (happiness), eyebrow raising (surprise), pressing lips together (angst), and sighing (sadness). Specific guidelines for sentiment annotation cannot be given here, since the class of sentiment qualifiers is not specified in DIT++.

3.4 Encoding functional dependences, feedback dependences, and rhetorical relations

3.4.1 Functional dependence

A dialogue act A1 is functionally dependent on a previous dialogue act A2 (its `functional antecedent'), if its communicative function by its very nature responds to another dialogue act. This is the case for the following core communicative functions:

- Answer, Confirm, Disconfirm;
- Agreement, Disagreement, Correction;
- Address Request, Accept Request, Decline Request;
- Address Suggestion, Accept Suggestion, Decline Suggestion;
- Address Offer, Accept Offer, Decline Offer;
- Turn Accept;
- Return Greeting, Return Self-introduction, Accept Apology, Accept Thanking, Return Goodbye

Encoding a functional dependence relation means identifying the functional antecedent and linking the two dialogue acts by means of a `functionalDependence' element.

The identification of a functional antecedent is not straightforward if (a) the current dialogue act does not respond to a single dialogue act but to a combination of dialogue acts, as in the following example, or (b) responds to an implicit dialogue act.

1. U: Can you tell me what time there are trains from Harwich to York?
2. S: What day would you like to travel?
3. U: Tomorrow morning.
4. S: On Tuesday morning there are trains at 6:45, 7:30,...(etc.)

Utterance 4 forms a functional segment with function Answer, which responds to the question formed by the dialogue acts expressed by utterances 1 and 3 together. In such a case it is recommended to mark functional dependence relations to both these dialogue acts.
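The bookkeeping behind this recommendation can be sketched as follows. This is a minimal illustration under stated assumptions, not a DiAML implementation: the class `DialogueAct`, the function `check_dependences`, the identifiers da1 to da4, and the function labels given to utterances 1 to 3 are all hypothetical; only the list of responsive functions and the two dependences of utterance 4 come from the text above. Because a dialogue act may also respond to an implicit dialogue act (case (b) above), a reported problem is a prompt for the annotator to look again rather than a definite error.

```python
from dataclasses import dataclass, field
from typing import List

# Core communicative functions that by their nature respond to another act.
RESPONSIVE_FUNCTIONS = {
    "Answer", "Confirm", "Disconfirm",
    "Agreement", "Disagreement", "Correction",
    "Address Request", "Accept Request", "Decline Request",
    "Address Suggestion", "Accept Suggestion", "Decline Suggestion",
    "Address Offer", "Accept Offer", "Decline Offer",
    "Turn Accept",
    "Return Greeting", "Return Self-introduction", "Accept Apology",
    "Accept Thanking", "Return Goodbye",
}


@dataclass
class DialogueAct:
    """Hypothetical record for one annotated dialogue act."""
    act_id: str
    function: str
    functional_dependences: List[str] = field(default_factory=list)


def check_dependences(acts: List[DialogueAct]) -> List[str]:
    """List responsive acts without an antecedent and links to non-previous acts."""
    seen = set()
    problems = []
    for act in acts:  # acts are assumed to be in dialogue order
        if act.function in RESPONSIVE_FUNCTIONS and not act.functional_dependences:
            problems.append(f"{act.act_id}: {act.function} has no functional antecedent")
        for dep in act.functional_dependences:
            if dep not in seen:
                problems.append(f"{act.act_id}: {dep} is not a previous dialogue act")
        seen.add(act.act_id)
    return problems


# The train example; the labels of da1-da3 are assumptions for illustration.
train_dialogue = [
    DialogueAct("da1", "Set Question"),
    DialogueAct("da2", "Set Question"),
    DialogueAct("da3", "Answer", ["da2"]),
    DialogueAct("da4", "Answer", ["da1", "da3"]),  # linked to both acts
]
print(check_dependences(train_dialogue))  # []
```

Representing the dependences as a list rather than a single reference is what makes it possible to link utterance 4 to both utterance 1 and utterance 3.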

3.4.2 Feedback dependence

Every auto- or allo-feedback act is about the processing of one or more previous dialogue segments, and therefore has a feedback relation to these segments. This is the case both for feedback acts that have a dimension-specific communicative function and for feedback acts with a general-purpose function.

Encoding a feedback dependence relation means identifying the functional segment(s) that the feedback is about, and linking the dialogue act to these segment(s) by means of a `feedbackDependence' element. For feedback acts with an Auto-Positive or Allo-Positive function the feedback is usually about the previous utterance from another participant, but sometimes the feedback is more global, and can refer to everything that happened so far in the dialogue - in such a case it is best not to annotate a feedback dependence.
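The usual case described here, feedback about the previous utterance from another participant, can be sketched as a simple lookup. The function name `suggest_feedback_target`, the segment identifiers, and the two-line dialogue are hypothetical; the result is only a suggestion that the annotator should check, and for global feedback no dependence is annotated at all.

```python
from typing import List, Optional, Tuple

# A functional segment as (segment id, speaker, text); a hypothetical layout.
Segment = Tuple[str, str, str]


def suggest_feedback_target(segments: List[Segment],
                            feedback_index: int) -> Optional[str]:
    """Return the id of the most recent earlier segment by another speaker."""
    feedback_speaker = segments[feedback_index][1]
    for seg_id, speaker, _text in reversed(segments[:feedback_index]):
        if speaker != feedback_speaker:
            return seg_id
    return None  # nothing to link to


# Invented mini-dialogue for illustration only.
segments = [
    ("fs1", "A", "The meeting starts at ten."),
    ("fs2", "B", "Okay."),
]
print(suggest_feedback_target(segments, 1))  # fs1
```

Returning `None` when no earlier segment from another participant exists leaves room for the case in which no feedback dependence is annotated.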

3.4.3 Rhetorical relations

Many of the relations which may occur between units in discourse such as Justification, Explanation, Cause-Effect, or Summary, and which in the linguistic literature are often called `rhetorical relations' or `discourse relations', may also occur between dialogue acts. DIT++ does not specify any particular set of such relations, and therefore does not provide detailed guidelines for their encoding. So-called `discourse markers' like also, but, because, for example, in short, so often signal such relations as Elaboration, Motivation, Justification, Cause, Exemplification, Conclusion or Summary, and they are often multifunctional. For example, a protracted turn-initial A..n..d,... may be a functional segment with the functions `Turn Take', `Stalling', `Auto-Positive', and may also be the first part of a longer functional segment expressing a dialogue act which has an Elaboration relation to a previous dialogue act. (See Petukhova & Bunt, 2009a on the multifunctionality of discourse markers; Mann & Thompson, 1988, on rhetorical relations; Hovy & Maier, 1995 more generally on discourse relations, and Prevot et al., 2011 on discourse relations in dialogue.)


<harry.bunt@uvt.nl>

Last modified: Mon Nov 19 12:43:00 CET 2012