Dialogue act annotation is about indicating the kind of intention that the speaker had: what was he trying to achieve? This is what the participants in a dialogue are trying to establish when they interpret each other's communicative behaviour. The following three pieces of general advice for dialogue act annotators derive from this.
1. Do as an addressee would do.
When marking up a functional segment, put yourself in the position of the participant(s) to whom the utterance was addressed, and imagine that you are trying to understand what the
speaker is trying to achieve. Why does he say what he says? What are the speaker's purposes in using this utterance? What assumptions does the speaker express about the
addressee? Answering such questions should guide you in deciding which tags to assign, regardless of how exactly the speaker has expressed himself.
Use all the information that you could have if you were the actual addressee, and like the addressee, try to interpret the speaker's communicative behaviour as well as you can.
2. Think functionally, not formally.
The linguistic form of an utterance often provides vital clues for choosing an annotation tag, but such clues may also be misleading;
in choosing your annotation tags you should of course use the linguistic clues to your advantage, but don't let them fool you -
the true question is not what the speaker says but what he means.
For example, Set Questions are questions where the speaker wants to know which elements of a certain domain have a certain property.
In English, such questions often contain a word beginning with "wh", such as "which" in "Which books did you read on your holidays?" or
"where" in "Where do your parents live?".
But not all English sentences of this form express a Set Question:
"Why don't you go ahead", for instance, is typically a Suggestion rather than a question.
Similarly, Propositional Questions are questions where the speaker wants to know whether a certain statement is true or false.
Such questions are typically expressed by interrogative sentences such as "Is The Hague the capital of the Netherlands?" or "Do you like peanut butter?"
But not all sentences of this form express a Propositional Question; for example,
"Do you know what time it is?" most often functions as an indirect request to tell the time, "Would you like some coffee?" is an Offer, and
"Shall we go?" is a Suggestion.
3. Be specific.
Among the communicative functions that you can choose from, there are differences in specificity, corresponding with their relative positions in hierarchical subsystems.
For instance, a Check Question is more specific than a Propositional Question, in that it additionally carries the expectation that the answer will be positive.
Similarly, a Confirm is more specific than an Answer, in that it carries the additional assumption that the addressee expects the answer to be positive.
In general, always try to be as specific as you can. But if you are in serious doubt about whether to choose a more or a less specific function, and you don't really have evidence for choosing the more specific one, then use the less specific function tag that subsumes the more specific one.
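The fallback rule above can be pictured as a lookup in the function hierarchy. The following is a minimal Python sketch, assuming a small hypothetical fragment of the DIT++ general-purpose hierarchy and camelCase function names chosen for illustration; it is not the full taxonomy.

```python
# Hypothetical fragment of the DIT++ general-purpose function hierarchy:
# each key is a function, each value the less specific function that subsumes it.
PARENT = {
    "checkQuestion": "propositionalQuestion",
    "propositionalQuestion": "question",
    "setQuestion": "question",
    "confirm": "answer",
    "disconfirm": "answer",
    "answer": "inform",
}


def choose_tag(candidate: str, has_evidence: bool) -> str:
    """Return the specific candidate if there is evidence for it, else its parent."""
    if has_evidence or candidate not in PARENT:
        return candidate
    # In serious doubt: fall back to the function that subsumes the candidate.
    return PARENT[candidate]


print(choose_tag("checkQuestion", has_evidence=True))   # checkQuestion
print(choose_tag("checkQuestion", has_evidence=False))  # propositionalQuestion
```

Only one step up is taken here; an annotator who is still in doubt at the parent level would simply apply the same rule again.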
A functional segment has a communicative function for one of two reasons: 1) by virtue of having linguistic or nonverbal features which, in the context in which the segment occurs, are indicators of that function; or 2) by implication of having another function.
In the first case it is common to say that the segment has that communicative function
explicitly; in the second case that it has that function implicitly.
The following example illustrates this:
1. A: Would you like to have some coffee?
2. B: Some coffee would be great, thanks.
A's utterance is an Offer; B's response is an Accept Offer by virtue of its linguistic form and the fact that it occurs immediately after an Offer. Since an offer can only be accepted when it has been understood, B's response by implication also has an implicit positive auto-feedback function.
The following types of implicit communicative functions can be distinguished:
Should implicit communicative functions be annotated? Annotating logically entailed functions would be redundant, since by their very nature such functions can be inferred from explicit functions; there is no point in doing that. For conversationally implicated functions the situation is different, since these functions cannot be inferred from explicit functions without taking the context into account. It is therefore generally advisable to annotate them. An annotator who encounters a functional segment with an explicitly expressed communicative function and an implied function should decide whether the implied function is a logical consequence or a matter of what is plausible in the given context. In the first case the implied function should not be annotated; in the second case it should. For more details about types of implicit functions and strategies for how to deal with them, see Bunt (2011).
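The decision rule above amounts to a simple filter: keep explicit functions, drop logically entailed ones, keep conversationally implicated ones. The Python sketch below illustrates this with the coffee example; the `Function` record and its field names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    dimension: str
    explicit: bool = True
    # Only meaningful for implied functions: True if the function follows
    # logically from an explicit one, False if it depends on the context.
    logically_entailed: bool = False


def functions_to_annotate(functions):
    """Keep explicit functions and conversationally implicated ones only."""
    return [f for f in functions if f.explicit or not f.logically_entailed]


# B: "Some coffee would be great, thanks." (the coffee example above)
accept = Function("acceptOffer", "task")
understood = Function("autoPositive", "autoFeedback",
                      explicit=False, logically_entailed=True)
print([f.name for f in functions_to_annotate([accept, understood])])  # ['acceptOffer']
```

The implicit positive auto-feedback is dropped because accepting an offer logically requires having understood it.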
Standard speech act theory regards indirect speech acts, such as indirect requests, as just an indirect form of the same illocutionary acts. By contrast, DIT views indirect forms as signalling subtly different packages of beliefs and intentions than direct ones. For example, the direct request Tell me what time it is please carries the assumption that the addressee knows what time it is, whereas an indirect request like Do you know what time it is? or Can you tell me what time it is? does not carry that assumption (at least it does not express that assumption; in fact it questions it), and is best interpreted as Please tell me what time it is, if you know.
This example shows that an indirectly formulated request may have a conditional character: the speaker is expressing a request under the condition that the addressee is able to perform the requested action. In this case the annotator may therefore make use of the option to annotate the utterance as having a qualified Request function, with the attribute `conditionality' having the value `conditional'. This is represented in DiAML as follows:
<dialogueAct xml:id="da1" target="fs1" sender="s" addressee="a" dimension="task" communicativeFunction="request" conditionality="conditional"/>
This stretch of behaviour could be marked up as expressing an Instruct act as well as an Inform act which explains the term "moon lander". In order to do that accurately it is best to segment this stretch into two functional segments: fs1 = "and then go direction that moon lander" and fs2 ="that thing on those legs", and to assign the Instruct function to segment fs1 only, and the Inform function to fs2 only, rather than assigning both of them to the entire utterance. Such fine-grained segmentation also allows us to indicate the fact that the Inform in fs2 is an explanation of the Instruct in fs1, as in the following example, where a `rhetoricalLink' connects the dialogue act da2 with the dialogue act da1 through an `explanation' relation:
<dialogueAct xml:id="da1" target="fs1" sender="s" addressee="a" dimension="task" communicativeFunction="instruct"/>
<dialogueAct xml:id="da2" target="fs2" sender="s" addressee="a" communicativeFunction="inform" dimension="alloFeedback"/>
<rhetoricalLink dact="#da2" rhetoAntecedent="#da1" rhetoRel="explanation"/>
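DiAML fragments like the ones above can also be generated programmatically, which helps keep attribute names consistent across a corpus. The following is a minimal Python sketch using the standard `xml.etree.ElementTree` module; the helper names `dialogue_act` and `rhetorical_link` and the default sender/addressee values "s" and "a" are hypothetical conveniences, not part of DiAML.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespace with the reserved "xml" prefix.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dialogue_act(da_id, target, function, dimension,
                 sender="s", addressee="a", **qualifiers):
    """Build a dialogueAct element; qualifiers become extra attributes."""
    attrs = {
        XML_ID: da_id,
        "target": target,
        "sender": sender,
        "addressee": addressee,
        "dimension": dimension,
        "communicativeFunction": function,
    }
    attrs.update(qualifiers)  # e.g. conditionality="conditional"
    return ET.Element("dialogueAct", attrs)


def rhetorical_link(dact, antecedent, relation):
    """Link dialogue act `dact` to `antecedent` through a rhetorical relation."""
    return ET.Element("rhetoricalLink", {
        "dact": "#" + dact,
        "rhetoAntecedent": "#" + antecedent,
        "rhetoRel": relation,
    })


elements = [
    dialogue_act("da1", "fs1", "instruct", "task"),
    dialogue_act("da2", "fs2", "inform", "alloFeedback"),
    rhetorical_link("da2", "da1", "explanation"),
]
for el in elements:
    print(ET.tostring(el, encoding="unicode"))
```

Passing `conditionality="conditional"` to `dialogue_act` yields the conditional Request shown earlier in this section.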
There are cases where the identification of the minimal stretch of behaviour that corresponds to a functional segment is not obvious, in particular when a longer stretch could be said to express a particular function, but where it consists of smaller parts which could also be said to express that same function. The following example illustrates this (from a Map Task dialogue):
1. E: and then you go up and around that, a little to the right
E's utterance 3 as a whole could be said to constitute a Confirm act, but each of the two parts 'yeah' and 'slightly northeast' could also be said to constitute a separate Confirm. Larsson (1998) has recommended taking a maximal approach in such cases and choosing the larger stretch as the unit of annotation. Alternatively, the use of functional segments naturally suggests always taking a minimal approach. Which of these strategies is to be preferred may depend on the purpose of the annotation, but the minimal approach is clearly more fine-grained.
A functional segment is most often a part of what is contributed by the participant who occupies the speaker role, distinguished by the fact that this part has a separate communicative function. However, when working from a pre-segmented transcription of a spoken dialogue, the segmentation used in the transcript is not necessarily perfect, or not quite as one would like it to be.
First, there may be cases where one would prefer to split a given segment into smaller segments. In such a case it is best to assign to the segment as a whole all the tags that one would have assigned to its parts. This could lead to an inconsistent set of tags for the segment; in that case one has to either omit one or more tags, or temporarily accept the inconsistent set, and/or add a comment to the annotation to signal the problem. The best strategy in such cases depends on the purposes of the annotation and on the options offered by the annotation tool that is used.
Second, it may happen that a turn has been pre-segmented into certain parts where one would prefer to annotate a longer segment, formed by these parts. In such a case it is recommended to annotate all these parts with the same tags.
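For the first case above, where one would prefer a finer segmentation than the transcript offers, the following minimal Python sketch merges the tags of the notional parts onto the whole segment and reports conflicts for the annotator to resolve. The set of mutually exclusive function pairs is a hypothetical stand-in; in DIT++ exclusivity follows from the function hierarchy, where functions at the same level of a subtree exclude each other.

```python
# Hypothetical pairs of functions that cannot both hold for one segment.
EXCLUSIVE = {
    frozenset({"confirm", "disconfirm"}),
    frozenset({"acceptOffer", "declineOffer"}),
}


def merge_part_tags(parts):
    """Union the tags of sub-parts onto the whole segment and report conflicts."""
    merged = set()
    for tags in parts:
        merged |= set(tags)
    # A conflict is any exclusive pair whose members are both present.
    conflicts = [pair for pair in EXCLUSIVE if pair <= merged]
    return merged, conflicts


tags, conflicts = merge_part_tags([{"inform"}, {"stalling"}])
print(sorted(tags), conflicts)  # ['inform', 'stalling'] []
```

What to do with a reported conflict (drop a tag, keep it, or add a comment) is left to the annotator, as the guidelines above recommend.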
Third, a given segment may be `self-interrupted' by a part that has a different communicative function, as in the following example:
Can you tell me what time the train to ehm,... Viareggio leaves?
Here we see a question (an indirect, conditional Request for the departure time) interrupted by a Stalling segment (ehm). The preferred segmentation would in this case distinguish one functional segment in the Task dimension, viz. fs1 = "Can you tell me what time the train to Viareggio leaves?", and one in the Time Management dimension, viz. fs2 = "ehm,...", leading to the following representation in DiAML:
<dialogueAct xml:id="da1" target="fs1" sender="s" addressee="a" dimension="task" communicativeFunction="request" conditionality="conditional"/>
<dialogueAct xml:id="da2" target="fs2" sender="s" addressee="a" communicativeFunction="stalling" dimension="timeManagement"/>
If the segmentation has not distinguished the intervening part as a separate functional segment, then again it is best to assign the tags for the intervening part to the segment as a whole.
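A functional segment such as fs1 here is discontinuous: it skips the stall in the middle. One way to picture this is to represent a segment as a list of token ranges. The Python sketch below does so under a hypothetical representation (whitespace tokens, half-open index ranges); actual DiAML level-B segmentation uses its own pointing mechanism.

```python
# Tokens of the utterance as transcribed, including the intervening stall.
TOKENS = "Can you tell me what time the train to ehm,... Viareggio leaves?".split()

# Hypothetical representation: a functional segment is a list of half-open
# (start, end) token ranges, so it may be discontinuous.
FS1 = [(0, 9), (10, 12)]  # the question, minus the stall
FS2 = [(9, 10)]           # "ehm,..."


def segment_text(tokens, spans):
    """Concatenate the token ranges that make up one functional segment."""
    return " ".join(" ".join(tokens[start:end]) for start, end in spans)


print(segment_text(TOKENS, FS1))  # Can you tell me what time the train to Viareggio leaves?
print(segment_text(TOKENS, FS2))  # ehm,...
```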
Fourth, it may happen that a dialogue act corresponds to more than one turn, as in the following example, where the utterances in turns 1 and 3 together form an Answer:
1. A: There are two flights early in the morning, at 7.45 and at 8.15
The Dialogue Act Markup Language (DiAML) has been defined to have an abstract syntax,
which specifies the conceptual structures of DiAML annotations in set-theoretical terms
(see Bunt, 2010); a formal semantics, which interprets these structures in terms of information-state update operations (see Bunt, 2011a); and an XML-based concrete syntax for representing
DiAML annotations. Here we only consider the concrete syntax representations of DiAML
and how to use them.
A DiAML annotation structure consists of a functional segment and a set of annotations,
which contain information about sender, addressee(s), communicative function,
function qualifiers, dimension, functional and feedback dependence relations,
and rhetorical relations.
In order to be ISO-compliant, the representation of these structures assumes a three-level
architecture, consisting of:
DiAML annotations are situated at level C. They refer to functional segments, which can be identified at level B by means of the 'functionalSegment' element, regardless of whether the segment is verbal, nonverbal, or multimodal; at level C the 'target' attribute is used to point to a functional segment.
A dialogue act is characterized in DIT as having (1) a sender; (2) at least one addressee; (3) a communicative function, which may have (4) one or more qualifiers; (5) a dimension (or category of semantic content); and possibly (6) functional and feedback dependence relations as well as rhetorical relations. This structure is reflected in the DiAML annotation of a dialogue act in that a 'dialogueAct' element has the obligatory attributes 'sender', 'addressee', 'communicativeFunction', and 'dimension', and optional attributes to represent qualifiers. The optional functional dependence, feedback dependence, and rhetorical relations correspond to relational XML elements which may be added.
For a given functional segment in a dialogue, the sender and addressee roles are usually easy to assign. For assigning communicative functions, see below, sections 3.1 and 3.2. For assigning dimensions, the decision to be made is which kind of information or actions the dialogue act is about. Is it (1) concerning the underlying task/activity; or (2) concerning the speaker's processing of previous utterances; or (3) concerning the addressee's processing of previous utterances; or (4) concerning the allocation of the speaker role; or (5) concerning the time needed to continue the dialogue; or (6) concerning the editing of what the speaker is saying; or (7) concerning the editing of what the addressee is currently saying; or (8) concerning the structure of the dialogue; or (9) concerning social obligations?
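The nine questions above can be kept as an ordered checklist. The Python sketch below pairs each question with a DiAML dimension value; "task", "alloFeedback" and "timeManagement" appear in the examples in this section, while the other camelCase names are assumed by analogy. Since each dialogue act has exactly one dimension, the helper rejects answer sets with zero or several "yes" answers (a multifunctional segment is annotated as several dialogue acts).

```python
# The nine questions above, in order, with the DiAML dimension names they
# are assumed to correspond to (camelCase, as in the examples above).
DIMENSIONS = [
    ("the underlying task/activity", "task"),
    ("the speaker's processing of previous utterances", "autoFeedback"),
    ("the addressee's processing of previous utterances", "alloFeedback"),
    ("the allocation of the speaker role", "turnManagement"),
    ("the time needed to continue the dialogue", "timeManagement"),
    ("the editing of what the speaker is saying", "ownCommunicationManagement"),
    ("the editing of what the addressee is currently saying",
     "partnerCommunicationManagement"),
    ("the structure of the dialogue", "discourseStructuring"),
    ("social obligations", "socialObligationsManagement"),
]


def dimension_of(answers):
    """Map nine yes/no answers for one dialogue act to its single dimension."""
    if len(answers) != len(DIMENSIONS):
        raise ValueError("expected one answer per question")
    chosen = [dim for (_, dim), yes in zip(DIMENSIONS, answers) if yes]
    if len(chosen) != 1:
        raise ValueError(f"a dialogue act has exactly one dimension, got {chosen}")
    return chosen[0]


answers = [False] * 9
answers[4] = True  # "concerning the time needed to continue the dialogue"
print(dimension_of(answers))  # timeManagement
```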
All dialogue acts with an information transfer function have the main purpose of making certain information available to the addressee (acts with an Inform function or a function dominated by Inform in the hierarchy of general-purpose functions) or of the speaker obtaining certain information (the Information-seeking functions in the general-purpose functions hierarchy). The information to be obtained or made available can be of any kind, relating to the underlying task or activity, or relating to the interaction.
In order to decide whether a segment of dialogue has an information transfer function, an annotator should thus decide whether the segment has such a purpose. If so, the annotator can use the subtrees of Information-providing and Information-seeking functions as decision trees, going systematically from left to right through the functions at the next level down and checking the defining conditions that distinguish each of these functions from their ancestor and from each other. Since the functions at one level in a subtree are mutually exclusive, at most one of them applies. If one is found that applies, then go down one level to the functions dominated by this function, and repeat the process. Keep doing this until hitting a level where none of the functions apply. In that case choose the function that dominates the functions at that level.
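The top-down procedure above is a simple tree descent. The following Python sketch assumes a hypothetical fragment of the Information-providing subtree (children listed left to right) and models the annotator's judgement as a predicate `applies` that tests a function's defining conditions.

```python
from typing import Callable

# Hypothetical fragment of the Information-providing subtree,
# with children listed in left-to-right order.
CHILDREN = {
    "inform": ["agreement", "disagreement", "answer"],
    "disagreement": ["correction"],
    "answer": ["confirm", "disconfirm"],
}


def classify(root: str, applies: Callable[[str], bool]) -> str:
    """Descend from root while some child's defining conditions apply."""
    current = root
    while True:
        # Siblings are mutually exclusive, so the first match is the only one.
        match = next((c for c in CHILDREN.get(current, []) if applies(c)), None)
        if match is None:
            # No function at this level applies: keep the dominating function.
            return current
        current = match


print(classify("inform", lambda f: f in {"answer", "confirm"}))  # confirm
print(classify("inform", lambda f: False))                       # inform
```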
Feedback acts have the purpose of providing or eliciting information about the processing of utterances in dialogue. Both auto- and allo-feedback providing functions are divided into positive and negative ones. Positive ones signal that the processing was (believed to be) successful; negative ones signal that there is a processing problem. Positive feedback is very often expressed implicitly, and should in such a case most probably not be encoded. Negative feedback is virtually always explicit, and as such easy to recognise. Some of the frequently used fixed forms for negative auto-feedback are Huh?, What? and equivalent expressions in other languages, and nonverbal signals such as raising eyebrows, frowning, or cupping a hand behind an ear.
Repetitions and rephrases are common forms of auto-feedback. A distinction can be made between the case where the speaker literally repeats (part of) what was said before ('echoes') and the case where he rephrases (part of) what was said. For example:
1. A: I would like to travel next Saturday, in the afternoon.
In utterance 2, B literally repeats part of A's utterance, thereby displaying what he perceived A to have said. In utterance 3, by contrast, B paraphrases parts of A's utterance, which can be taken to indicate not only what B heard but also how B interpreted what A said (which in this example may be particularly relevant for the interpretation of 'next').
On the other hand, positive feedback is often expressed in a rather inarticulate fashion by fixed forms like OK or Yes, Sure, etc. which may be taken to express overall successful processing of what was said, and correspond to the communicative function AutoPositive.
It may be worth noting that there is a systematic relation between auto- and allo-feedback acts, for the following reason. A dialogue act in the Allo-Feedback dimension is concerned with the addressee's processing of a previous utterance, e.g. A: What do you think I said?; when the addressee responds to that, e.g. B: I thought you said Tuesday, the speaker of this response is speaking about his own processing of a previous utterance, hence the response is an act in that participant's Auto-Feedback dimension. This is more generally the case: the response to an Allo-Feedback act is usually an Auto-Feedback act.
The reverse is also true. When a participant encounters a processing problem and tries to resolve it, as in Do you mean this Saturday?, and the addressee responds That's right, then the speaker of the response is talking about the other's processing, hence this is an act in the Allo-Feedback dimension.
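The symmetry described in the last two paragraphs can be written as a small rule of thumb. The Python sketch below returns the usual dimension of a response to a feedback act; it encodes only the "usually" case described above and is not a substitute for inspecting the response itself.

```python
from typing import Optional

# Usual dimension of a response, given the dimension of the act it responds to.
_SWAP = {"alloFeedback": "autoFeedback", "autoFeedback": "alloFeedback"}


def expected_feedback_dimension(previous_dimension: str) -> Optional[str]:
    """Rule of thumb: auto- and allo-feedback responses mirror each other."""
    return _SWAP.get(previous_dimension)  # None for non-feedback dimensions


# A: "What do you think I said?" (alloFeedback) -> B: "I thought you said Tuesday"
print(expected_feedback_dimension("alloFeedback"))  # autoFeedback
# A: "Do you mean this Saturday?" (autoFeedback) -> B: "That's right"
print(expected_feedback_dimension("autoFeedback"))  # alloFeedback
```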
Turn management functions are characterised by the sender having the goal to obtain, to keep, or to give away the speaker role. For an annotator, the issue to decide is thus whether the sender's behaviour expresses such a goal. Consider, for example, the case of a question-answer pair:
1. A: Do you know what time it is?
Does B, in answering A's question, express the goal to occupy the speaker role? This is not obvious, but it should be noted that B's primary aim is to answer A's question, and that in order to do so he cannot avoid taking the speaker role; this suggests that B did not have a separate goal of occupying the speaker role.
Similarly, does A, by asking a question, express that he wants B to occupy the speaker role next? The answer to this question is clearly No, since A can continue for a while occupying the speaker role after asking the question, as in the following example:
1. A: Do you know what time it is? I need to catch the twelve seventeen train.
Note that in this example participant A continued in the speaker role after asking a question, simply by continuing to speak. This raises a rather troubling question: does continuing to speak indicate the speaker's goal to keep the turn? In that case, one should assign a turn-keeping function to nearly everything that a speaker says. A recommendation for how to go about assigning turn-management functions is to only assign such a function to those stretches of communicative behaviour which have the sole (or the main) purpose to obtain, to keep, or to get rid of the speaker role. Just starting to speak, continuing to speak, or ceasing to speak should not be annotated as expressions of Turn management functions.
A particularity of the Turn Management dimension is that the dimension-specific functions are divided into two subclasses, turn-initial and turn-final functions, which could in fact be considered as separate dimensions. Usually only the first segment in a turn has a turn-initial function and only the last one a turn-final one. Non-final utterances within a turn do not have a turn-final function, except when the speaker signals (for example by using a rising intonation or a filled pause) that the utterance is not intended to be the last one in the turn, i.e. that he wants to continue. In that case the utterance has a Turn Keeping function.
When a speaker accepts a turn that the addressee has assigned to him through a Turn Assign act, the relevant segment should be annotated as having the turn-initial function Turn Accept only when the speaker performs a separate act for the purpose of accepting the turn (such as nodding, or clearing his throat, or saying something like Yes or OK). The verbal and nonverbal activities that a speaker performs to seize the turn should be marked as Turn Grabbing; the segment that follows immediately after he has seized the turn should not be marked as having a turn-initial Turn Management function.
In both cases there may be several reasons why the sender wants to buy some time. Stalling may occur, for instance, because the speaker is looking for the right words to express what he wants to convey, because he hasn't quite made up his mind as to which information to provide, or because he needs a little time to look up or compute certain information. Pausing may occur, for example, because the speaker is aware that collecting/computing the relevant information requires more time than is commonly available in a fluent conversation, or because something more urgent came up that he has to deal with first.
Stalling acts take the form of filled pauses (ehm, let me see, well,..), often occurring together with slowing down and short silences. Pausing acts explicitly claim or request some time: Just a minute, Wait a second, I'll be right back, etc. Fully explicit requests like Please wait while I check the current status should not be marked as Pausing acts, but as requests in the Time Management dimension, using the general-purpose function Request.
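The fixed forms listed above suggest a first-pass lookup for Time Management tags. The Python sketch below is a toy heuristic under clear assumptions: it uses only the example forms from the text, ignores prosody and silences (which matter a great deal for Stalling), and treats anything starting with "please wait" as an explicit Request. The function name `time_management_tag` is hypothetical.

```python
from typing import Optional, Tuple

# Example forms taken from the text above; real inventories are much larger.
STALLING_FORMS = {"ehm", "let me see", "well"}
PAUSING_FORMS = {"just a minute", "wait a second", "i'll be right back"}


def time_management_tag(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (communicativeFunction, dimension) for a fixed form, else None."""
    text = utterance.lower().strip(" .,!?")
    if text.startswith("please wait"):
        # Fully explicit: a general-purpose Request in Time Management.
        return ("request", "timeManagement")
    if text in PAUSING_FORMS:
        return ("pausing", "timeManagement")
    if text in STALLING_FORMS:
        return ("stalling", "timeManagement")
    return None


print(time_management_tag("Just a minute!"))  # ('pausing', 'timeManagement')
print(time_management_tag("ehm,..."))         # ('stalling', 'timeManagement')
```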
In Own Communication Management (OCM) acts the speaker is editing his own speech. The speaker interrupts himself, being aware that he said something wrong, and retracts something that he just said (Oh sorry no,...; or No wait,..), or replaces something he just said by something else (I want to travel on Tuesday THURsday).
Partner Communication Management (PCM) acts similarly edit the addressee's speech, who at that moment occupies the speaker role. Two important cases are the correction of what the addressee is saying (Correct Misspeaking) when a slip of the tongue occurs, and the completion of what the addressee/current speaker is struggling to say (Completion). In both cases the sender of the PCM act barges in and grabs the turn, or takes the turn which has become available because the addressee is hesitating.
Discourse Structuring acts are concerned with the explicit structuring of the dialogue. Such acts occur frequently at the beginning and near the end of a dialogue. A dialogue needs to be opened in some way, and there are conventional ways of doing so. In multi-party dialogue an expression that is frequently used to open the dialogue is Okay! The same utterance is often used (though with a different intonation) to indicate that a dialogue can be closed, signaling positive feedback concerning the entire preceding dialogue. There do not seem to exist dialogue acts that have no other function than closing a dialogue; conventionally, a dialogue is considered closed when the participants have exchanged farewell greetings.
During a dialogue, the topic is often changed implicitly, simply by talking about a new topic. This happens especially if the new topic is closely related to the previous one, for instance by being a subtopic of the previous topic, or by being another subtopic of a more general topic. Implicit topic management should not be encoded; the fact that a new topic is addressed is a property of the semantic content of the Inform, the Question, or whatever dialogue act is performed which addresses this new topic. Only explicitly signaled topic shifting (actual or intended) should be annotated as such.
The kind of social obligations that should be annotated depends on the kind of dialogue. Welcome and farewell greetings, which play a role in starting and ending a dialogue, are domain-independent, however, as are apologies and their acceptances, acts for introducing oneself, and thanking acts and their acceptances. All of these types of acts have conventional forms ('formulas') in every language. They tend to come in pairs: an initial greeting puts pressure on the addressee to send a response greeting; introducing oneself puts pressure on the addressee to also introduce himself; an apology puts pressure on the addressee to accept the apology; a thanking puts pressure on the addressee to downplay what he is thanked for (as in It was nothing; It was my pleasure); and a farewell greeting puts pressure on the addressee to produce a response farewell greeting.
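The pairing of initiating and reactive social obligation acts can be used to spot pressures that remain open in an annotated dialogue. The Python sketch below assumes a two-party dialogue given as (sender, function) pairs; the camelCase function names are assumed for illustration and should be checked against the DiAML value list in use.

```python
# Initiating SOM function -> reactive function the addressee is pressed to perform.
# Function names are assumed, in the camelCase style of the DiAML examples.
REACTIVE_PRESSURE = {
    "initialGreeting": "returnGreeting",
    "initialSelfIntroduction": "returnSelfIntroduction",
    "apology": "acceptApology",
    "thanking": "acceptThanking",
    "initialGoodbye": "returnGoodbye",
}


def open_pressures(acts):
    """Return reactive functions still owed, given (sender, function) pairs in order."""
    pending = []  # (initiator, expected reactive function)
    for sender, function in acts:
        owed = [p for p in pending if p[1] == function and p[0] != sender]
        if owed:
            pending.remove(owed[0])  # another participant has responded
        elif function in REACTIVE_PRESSURE:
            pending.append((sender, REACTIVE_PRESSURE[function]))
    return [expected for _, expected in pending]


print(open_pressures([("A", "initialGreeting"), ("B", "returnGreeting"),
                      ("A", "thanking")]))  # ['acceptThanking']
```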
SOM acts can also be performed by means of general-purpose functions. For instance, I'm extremely grateful for your help and I hope to see you next year in Hong Kong are Informs in the SOM dimension.
It is worth noting that utterances which serve a "social" purpose such as greetings, thanks, and apologies are often used to serve other purposes as well. Greetings like Hello!, for example, can also be used for opening a dialogue (a Discourse Structuring function). Also, an expression of thanks can be used to signal that the speaker intends to terminate the dialogue, and can also be used for positive feedback.
Function qualifiers are available in DiAML for encoding various ways in which a speaker can specify certain conditions, qualifications, or feelings accompanying a dialogue act. For the encoding of (un-)certainty and conditionality, DiAML has binary-valued attributes one of which is the default value. For the encoding of feelings the 'sentiment' attribute is available which has an open class of values and no default value; if no value of the attribute is specified in an annotation this means that no such information is present.
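The default behaviour described above can be captured in a small helper that emits only the qualifier attributes that carry information. The Python sketch below leaves binary qualifiers at their default values implicit and omits 'sentiment' when absent; the function name and the choice to omit defaults are assumptions (writing a default value explicitly is also valid DiAML).

```python
from typing import Dict, Optional

# Binary qualifiers and their default (unmarked) values, as described above.
DEFAULTS = {"certainty": "certain", "conditionality": "unconditional"}
ALLOWED = {
    "certainty": {"certain", "uncertain"},
    "conditionality": {"conditional", "unconditional"},
}


def qualifier_attributes(certainty: str = "certain",
                         conditionality: str = "unconditional",
                         sentiment: Optional[str] = None) -> Dict[str, str]:
    """Return only the qualifier attributes that differ from the defaults."""
    attrs = {}
    for name, value in (("certainty", certainty), ("conditionality", conditionality)):
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
        if value != DEFAULTS[name]:  # default values are left implicit
            attrs[name] = value
    if sentiment is not None:  # open class, no default: omit when absent
        attrs["sentiment"] = sentiment
    return attrs


print(qualifier_attributes(conditionality="conditional"))  # {'conditionality': 'conditional'}
```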
3.3.1 Certainty
The sender of a dialogue act can express certainty or uncertainty about the correctness of the information provided in an information-providing act. This is illustrated in the following example, where the expressions "I have a hunch that", "probably", "might", and "I'm not sure if" are indicators of the speaker's uncertainty.
1. A: Do you know who'll be coming tonight?
When no expressions of uncertainty are present, as in the following example, the resulting sentences no longer contain any suggestion that the speaker is uncertain about the correctness of what he says. This illustrates that the default value, corresponding to the unmarked case, is 'certain'.
1. A: Do you know who'll be coming tonight?
Speakers may also signal being very certain, as illustrated in the following examples. For such cases, the DiAML encoding with certainty='certain' is recommended.
1. Mary will definitely not come.
For the coding of a sender's certainty associated with the performance of an information-providing act, look for expressions of uncertainty and expressions of great certainty. Certainty and uncertainty are indicated not only by verbal expressions, but also by prosody, gaze direction, and several types of gestures. Prominent nonverbal expressions of uncertainty include gaze aversion, head waggles, lip pouting, lowering eyebrows, and self-touching (like head scratching).
Warning: verbal expressions of uncertainty, in particular adverbs, should sometimes be interpreted as part of the semantic content of a dialogue act, rather than as a qualification of the communicative function. The following examples illustrate this:
1. I'll come around eight o'clock, probably.
In these examples, probably and definitively apply to the time that is mentioned, not to the sender's certainty about his commitment to come.
Conditionality refers to the possibility (with respect to ability and power), the necessity, or the willingness to perform an action; the qualifiers conditional and unconditional can therefore be attached to action-discussion functions. The following examples illustrate this phenomenon.
In the first example we see the conditional acceptance of an offer; in the second example a conditional request, with a conditional acceptance; in the third a conditional promise; in the fourth two conditional acceptances of a request; and in the fifth a conditional rejection of a suggestion. The absence of expressions indicating a condition leads to expressions that signal unconditional dialogue acts; hence the default value is 'unconditional', which does not need to be marked up. Explicit expressions of 'unconditionality' are hard to find, other than the adverb unconditionally, which is hardly ever used in natural dialogue.
Conditional dialogue acts can often be recognised by the use of conditional expressions such as if ... or unless, but, just like expressions of uncertainty, such expressions can also be part of the semantic content rather than qualifiers of the communicative function.
A particular sentiment associated with the performance of a dialogue act may be annotated if the sender indicates an emotion or an attitude concerning the semantic content or the addressee, verbally or nonverbally, or both. Nonverbal expressions of sentiment exist in abundance and in great variety, including for instance smiling (happiness), eyebrow raising (surprise), pressing lips together (angst), and sighing (sadness). Specific guidelines for sentiment annotation cannot be given here, in view of the class of sentiment qualifiers not being specified in DIT++.
A dialogue act A1 is functionally dependent on a previous dialogue act A2 (its `functional antecedent'), if its communicative function by its very nature responds to another dialogue act. This is the case for the following core communicative functions:
- Answer, Confirm, Disconfirm;
Encoding a functional dependence relation means identifying the functional antecedent and linking the two dialogue acts by means of a 'functionalDependence' element.
The identification of a functional antecedent is not straightforward if (a) the current dialogue act does not respond to a single dialogue act but to a combination of dialogue acts, as in the following example, or (b) responds to an implicit dialogue act.
1. U: Can you tell me what time there are trains from Harwich to York?
Utterance 4 forms a functional segment with function Answer, which responds to the question formed by the dialogue acts expressed by utterances 1 and 3 together. In such a case it is recommended to mark functional dependence relations to both these dialogue acts.
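Marking dependence on both antecedents simply means emitting one relation per antecedent. The Python sketch below builds 'functionalDependence' elements with `xml.etree.ElementTree`; the attribute names `dact` and `functAntecedent`, and the identifiers da1, da3 and da4 for the acts in utterances 1, 3 and 4, are assumptions modelled on the rhetoricalLink example in this section and should be checked against the DiAML schema in use.

```python
import xml.etree.ElementTree as ET


def functional_dependences(dact, antecedents):
    """One functionalDependence element per functional antecedent.

    The attribute names 'dact' and 'functAntecedent' are assumed here.
    """
    return [
        ET.Element("functionalDependence",
                   {"dact": "#" + dact, "functAntecedent": "#" + antecedent})
        for antecedent in antecedents
    ]


# Utterance 4 answers the question formed by utterances 1 and 3 together.
for el in functional_dependences("da4", ["da1", "da3"]):
    print(ET.tostring(el, encoding="unicode"))
```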
Every auto- or allo-feedback act is about the processing of one or more previous dialogue segments, and therefore has a feedback relation to these segments. This is the case both for feedback acts that have a dimension-specific communicative function and for feedback acts with a general-purpose function.
Encoding a feedback dependence relation means identifying the functional segment(s) that the feedback is about, and linking the dialogue act to these segment(s) by means of a `feedbackDependence' element. For feedback acts with an Auto-Positive or Allo-Positive function the feedback is usually about the previous utterance from another participant, but sometimes the feedback is more global, and can refer to everything that happened so far in the dialogue - in such a case it is best not to annotate a feedback dependence.
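The default described above (positive feedback is usually about the previous utterance from another participant, and global feedback gets no dependence) can be turned into a first guess for the annotator. The Python sketch below assumes a simple hypothetical `Segment` record holding a segment identifier and its sender.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    seg_id: str
    sender: str


def feedback_antecedent(history: List[Segment], feedback_sender: str,
                        is_global: bool = False) -> Optional[str]:
    """Guess which segment a positive feedback act is about.

    Default: the most recent segment by another participant.
    Global feedback about the whole dialogue so far gets no dependence (None).
    """
    if is_global:
        return None
    for segment in reversed(history):
        if segment.sender != feedback_sender:
            return segment.seg_id
    return None


history = [Segment("fs1", "A"), Segment("fs2", "B"), Segment("fs3", "A")]
print(feedback_antecedent(history, "B"))  # fs3
```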
Many of the relations which may occur between units in discourse, such as Justification, Explanation, Cause-Effect, or Summary, and which in the linguistic literature are often called 'rhetorical relations' or 'discourse relations', may also occur between dialogue acts. DIT++ does not specify any particular set of such relations, and therefore does not provide detailed guidelines for their encoding. So-called 'discourse markers' like also, but, because, for example, in short, and so often signal such relations as Elaboration, Motivation, Justification, Cause, Exemplification, Conclusion or Summary, and they are often multifunctional. For example, a protracted turn-initial A..n..d,... may be a functional segment with the functions 'Turn Take', 'Stalling', and 'Auto-Positive', and may also be the first part of a longer functional segment expressing a dialogue act which has an Elaboration relation to a previous dialogue act. (See Petukhova & Bunt, 2009a, on the multifunctionality of discourse markers; Mann & Thompson, 1988, on rhetorical relations; Hovy & Maier, 1995, more generally on discourse relations; and Prevot et al., 2011, on discourse relations in dialogue.)