Annotation guidelines for applying DIT dialogue act tags

Dialogue act annotation is about indicating the kind of intention that the speaker had: what kind of thing was he trying to achieve? This is what the participants in a dialogue are trying to establish when they interpret each other's communicative behaviour. The following general advice for dialogue act annotators derives from this.

1. Do as an addressee would do.
When marking up a functional segment, put yourself in the position of the participant(s) at whom the utterance was addressed, and imagine that you try to understand what the speaker is trying to achieve. Why does he say what he says? What are the speaker's purposes in using this utterance? What assumptions does the speaker express about the addressee? Answering such questions should guide you in deciding which tags to assign, regardless of how exactly the speaker has expressed himself. Use all the information that you could have if you were the actual addressee, and like the addressee, try to interpret the speaker's communicative behaviour.

2. Think functionally, not formally.
The linguistic form of an utterance often provides vital clues for choosing an annotation tag, but such clues may also be misleading. In choosing your annotation tags you should of course use the linguistic clues to your advantage, but don't let them fool you: the true question is not what the speaker says but what he means.
For example, Set Questions are questions where the speaker wants to know which elements of a certain domain have a certain property. In English, such questions often contain a word beginning with "wh", such as "which", as in "Which books did you read on your holidays?", or "where", as in "Where do your parents live?". But in other languages this is not the case. Moreover, not all English sentences of this form express a Set Question: "Why don't you go ahead" is, for instance, typically a Suggestion rather than a question.
Similarly, Propositional Questions are questions where the speaker wants to know whether a certain statement is true or false. Such questions are typically expressed by interrogative sentences such as "Is The Hague the capital of the Netherlands?" or "Do you like peanut butter?" But not all sentences of this form express a Propositional Question; for example, "Do you know what time it is?" most often functions as an indirect request to tell the time, "Would you like some coffee?" is an Offer, and "Shall we go?" is a Suggestion.

3. Be specific.
Among the communicative functions that you can choose from, there are differences in specificity, corresponding with their relative positions in hierarchical subsystems. For instance, a Check Question is more specific than a Propositional Question, in that it additionally carries the expectation that the answer will be positive. Similarly, a Confirm is more specific than an Answer, in that it carries the additional assumption that the addressee expects the answer to be positive.
In general, always try to be as specific as you can. But if you're in serious doubt about whether to choose a more or a less specific function, and you don't really have evidence for choosing the more specific one, then use the less specific function tag that subsumes the more specific one.
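The fallback rule above can be sketched in code. This is only an illustration, not part of the DIT++ tooling: the PARENT table contains just the two subsumption relations mentioned in this section, and the names choose_tag and subsumes are hypothetical.

```python
# Hypothetical fragment of the function hierarchy: specific tag -> subsuming tag.
# Only the two relations named in the text are listed here.
PARENT = {
    "Check Question": "Propositional Question",
    "Confirm": "Answer",
}


def subsumes(general, specific):
    """Return True if `general` is `specific` itself or one of its ancestors."""
    tag = specific
    while tag is not None:
        if tag == general:
            return True
        tag = PARENT.get(tag)
    return False


def choose_tag(candidate, have_evidence):
    """Keep the specific tag if there is evidence for it; otherwise
    fall back to the less specific tag that subsumes it (if any)."""
    if have_evidence:
        return candidate
    return PARENT.get(candidate, candidate)


print(choose_tag("Check Question", have_evidence=False))
print(subsumes("Answer", "Confirm"))
```

A tag without an entry in the table is returned unchanged, since nothing less specific is known for it in this sketch.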

4. On indirect speech acts: "Code indirect speech acts just like direct ones."
Standard speech act theory regards indirect speech acts, such as indirect questions, as just an indirect form of the same illocutionary acts. By contrast, the DIT++ taxonomy incorporates the idea that indirect dialogue acts signal subtly different packages of beliefs and intentions than direct ones. For example, the direct question "What time is it?" carries the assumption that the addressee knows what time it is, whereas the indirect question "Do you know what time it is?" does not carry that assumption (at least, it does not express that assumption; in fact it questions it).

5. On implicit functions: "Do not code implicit communicative functions that can be deduced from functions that you have already assigned."
Implicit communicative functions occur in particular for positive feedback. For example, someone answering a question may be assumed to (believe to) have understood the question. So any time you annotate an utterance as an answer (of some sort), you might consider annotating it also as providing positive feedback on the interpretation of the question that is answered. Don't! It would be redundant.
Notice also that the definition of a positive (auto-) feedback act concerning interpretation stipulates that the speaker wants the addressee to know that he (the speaker) has understood the question. A speaker who answers a question does not so much want to tell the addressee that his question was understood; that is just a side effect of giving an answer, one that no speaker can avoid.
Similarly for reacting to an offer, a request, a suggestion, etc.
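As an illustration of this guideline, the following sketch removes a redundant feedback tag from a list of tags assigned to one utterance. The tag label IMPLIED_FEEDBACK and the contents of IMPLIES_UNDERSTANDING are assumptions made for the example; a real tag set would list all answer-like and reaction functions.

```python
# Hypothetical label for positive auto-feedback on interpretation.
IMPLIED_FEEDBACK = "Positive Auto-Feedback (interpretation)"

# Assumed set of functions that already imply that the speaker understood
# the utterance being responded to. Only answer-type tags named in the
# text are listed; reactions to offers, requests, etc. would be added.
IMPLIES_UNDERSTANDING = {"Answer", "Confirm"}


def drop_implied_feedback(tags):
    """Return the tags without the feedback tag when another tag implies it."""
    if any(tag in IMPLIES_UNDERSTANDING for tag in tags):
        return [tag for tag in tags if tag != IMPLIED_FEEDBACK]
    return list(tags)


print(drop_implied_feedback(["Answer", IMPLIED_FEEDBACK]))
```

A feedback tag that stands on its own, with no implying function next to it, is left untouched.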

6. Guidelines for the annotation of feedback functions.
Negative feedback, where the speaker wants to indicate that there was a problem in processing a dialogue utterance, is always explicit and as such mostly easy to annotate.

6.1 Implicit and explicit positive feedback.
Positive feedback is sometimes given explicitly, and very often implicitly.
Examples of explicit positive auto-feedback are the following utterances by B, where he repeats part of the question by A:

A: What time does the KLM flight from Jakarta on Friday, October 13 arrive?

B: The KLM flight from Jakarta on Friday, October 13 has scheduled arrival time 08.50

B: The flight from Jakarta on Friday has scheduled arrival time 08.50

B: The KLM flight from Jakarta on October 13 has scheduled arrival time 08.50

B: The flight from Jakarta on October 13 has scheduled arrival time 08.50

B: At 08.50

Levels of feedback, from lowest to highest: pays attention, perceives, understands, evaluates, 'executes'.

For positive feedback functions, a higher-level function is more specific than the lower-level functions.

Negative feedback at a lower level implies negative feedback at higher levels.

Guideline 6: When assigning a feedback function, choose the most specific level that you feel to be appropriate in the case of positive feedback, and the least specific level in the case of negative feedback.
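The relation between the levels can be made concrete with a small sketch. It rests on two assumptions that go beyond the text: the five levels are ordered from "pays attention" (lowest) to "executes" (highest), and "more specific" for positive feedback is read as "implies positive feedback at all lower levels". The names Level and implied_levels are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five feedback levels, assumed to be ordered from low to high."""
    PAYS_ATTENTION = 1
    PERCEIVES = 2
    UNDERSTANDS = 3
    EVALUATES = 4
    EXECUTES = 5


def implied_levels(level, positive):
    """Levels covered by feedback at `level`.

    Positive feedback is taken to cover the given level and all lower ones
    (an assumption); negative feedback covers the given level and all
    higher ones, as stated in the text.
    """
    if positive:
        return [lv for lv in Level if lv <= level]
    return [lv for lv in Level if lv >= level]


print([lv.name for lv in implied_levels(Level.UNDERSTANDS, positive=True)])
print([lv.name for lv in implied_levels(Level.PERCEIVES, positive=False)])
```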

7. Guidelines for the annotation of Interaction Management functions.
7.1 Turn Management.
General guideline: "Code Turn Management functions only when these are not just implied."
In a spoken dialogue, the participants take turns to speak. (Their nonverbal behaviour is not organised in turns; both participants use facial expressions and gestures more or less all the time.) A turn, that is a stretch of speech by one of the participants, in general consists of smaller parts that have a meaning as a dialogue act; these parts we call "utterances". Turn Management acts are the actions that participants perform in order to manage the allocation of the speaker role. These acts are subdivided into acts for taking the turn (utterance-initial acts) and those for keeping the turn or giving it away (utterance-final acts). Usually only the first utterance in a turn has an utterance-initial function and only the last an utterance-final one. The non-final utterances in a turn do not have an utterance-final function, except when the speaker signals (for example by using a rising utterance-final intonation) that the utterance is not going to be the last one in the turn, that he wants to continue. In that case the utterance has a Turn Keeping function. Except for the first one, the utterances in the turn do not have an utterance-initial function; the speaker does not have to perform a separate act in order to continue; all he has to do is to continue speaking.

When a speaker accepts a turn that the addressee has assigned to him through a Turn Assign act, the utterance should be annotated as having the utterance-initial function Turn Accept only when the speaker performs a separate act for the purpose of accepting the turn, so don't code this when the turn is accepted implicitly by simply starting to speak.

Similarly, an utterance should be annotated as having the utterance-initial function Turn Take only if the speaker performs a separate act to that effect. If he just goes ahead and makes a contribution to the dialogue, without first signalling his intention to do so, then the utterance should not be marked with an utterance-initial Turn Management function.

The verbal as well as nonverbal activities that a speaker performs to seize the turn should be marked as Turn Grabbing, but the utterance that follows after he has seized the turn should not be marked as having an utterance-initial Turn Management function.

7.2 Time Management.
When a speaker is buying time, using fillers such as "Well, ..." or "Let's see, ...", the utterance should be annotated as having the Stalling function in the Time Management dimension. There may be several reasons why a speaker wants more time: he may have trouble completing his current utterance, he may be interrupted by some urgent event that requires his attention before he can continue the dialogue, or he may need some time to find some information (for instance, for answering a question). So when you encounter a Stalling act, you may well pay attention to the reason why the speaker is stalling. (For instance, Stalling often goes hand in hand with turn acceptance or turn keeping.) However, don't speculate; only code additional functions for which you have evidence.

7.3 Topic Management.
During a dialogue, the topic is often changed implicitly, simply by talking about a new topic. This happens especially if the new topic is closely related to the previous one, for instance by being a subtopic of the previous topic, or by both being a subtopic of a more general topic. Implicit topic management should not be encoded; it would be redundant. Topic Management functions should be annotated only if the speaker explicitly introduces or closes a topic, or signals his intention to do so.

7.4 Contact Management.
The management of contact, in the sense of both partners being ready to send and receive messages to and from each other, is especially important in situations other than face-to-face ones, such as telephone conversations, video-conferencing, and internet chatting.
Note that in many languages expressions used for establishing contact can often be used for other purposes as well, for example for greeting (Hello!). When annotating a dialogue where this happens, the utterance in question should be marked as having both a Contact Management function and a Social Obligation Management function.

7.5 Own Communication Management.
Own Communication Management (OCM) functions should be coded whenever a speaker signals that he made a speech error and/or wants to edit what he is saying. Since this typically requires some extra effort and time, OCM acts often go hand in hand with acts whose function is to win time, such as hesitations (Ehm...), which have a Stalling function. See also 7.2.

7.6 Partner Communication Management.
Partner Communication Management (PCM) functions should be coded whenever a speaker signals that he believes the addressee made a speech error or has difficulty in completing an utterance, for instance being unable to recall a name or to find the right words to express something. The use of dimension-specific PCM functions for this purpose is typically only possible by interrupting the dialogue partner or in immediate response to a partner utterance.

7.7 Dialogue Structuring.
These functions should be coded only when the speaker explicitly signals something about his intention to open or close the dialogue, or to continue in a particular way.

Across the board, the following guideline applies to the encoding of Interaction Management functions:
Guideline 7: "Code only explicit Interaction Management functions."

8. Guidelines for annotating Social Obligation Management (SOM) functions.
Utterances that serve a 'social' purpose such as greetings, thanks, and apologies can often be used for other purposes as well. Greetings like Hello!, for example, can be used also for establishing contact (Contact Management function) and/or for opening a dialogue (a Dialogue Structuring function). Also, an expression of thanks can be used to signal that the speaker wants to soon end the dialogue (Dialogue Structuring function PRE-CLOSING), and can also be used for overall positive feedback. In such cases, the utterances should be coded with the appropriate functions in all these dimensions.

Guideline 8: "When coding an utterance as having a SOM function, look out for additional functions in other dimensions."
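Guideline 8 can be pictured as a checklist over dimensions. The sketch below represents an annotation as a mapping from dimension name to assigned function, and lists the other dimensions that the text mentions for greetings and thanks. The ALSO_CHECK table, the dimension label "Feedback" and the function name dimensions_to_review are assumptions made for this example only.

```python
# Hypothetical lookup built from the two examples in the text: kind of
# social utterance -> other dimensions in which it may carry a function.
ALSO_CHECK = {
    "greeting": ["Contact Management", "Dialogue Structuring"],
    "thanks": ["Dialogue Structuring", "Feedback"],
}


def dimensions_to_review(som_kind, annotation):
    """List the dimensions worth a second look that are not yet annotated.

    `annotation` maps a dimension name to the function assigned in it.
    """
    return [dim for dim in ALSO_CHECK.get(som_kind, []) if dim not in annotation]


hello = {"Social Obligation Management": "greeting"}
print(dimensions_to_review("greeting", hello))
```

The function only suggests where to look; whether a function is actually present in those dimensions remains the annotator's decision.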

Note on segmentation.

The segmentation of a dialogue into utterances may present several difficulties.

First, if you're working from a transcription of a spoken dialogue, the segmentation in the transcription is not necessarily perfect. You may run into cases where you would prefer the utterance to be segmented as a sequence of parts that each have a functional interpretation. In such a case it is best to assign the various tags that you would prefer to assign to the parts to the utterance as a whole. Or conversely, it may also happen that a turn has been segmented into certain parts, where you would want to annotate the longer utterance formed by these parts together. In such a case it is recommended to annotate all these parts with the same tags.

Second, an utterance may be self-interrupted by a part that has a different communicative function, as in the following example: "When, I mean what time, does the train to ehm,... Viareggio leave?" Here the utterance as a whole is a Set Question; it includes a Self-Correction ("I mean what time") and a Stalling utterance ("ehm"). In such cases, again, it is best to assign the tags for the intervening parts of the utterance to the utterance as a whole.
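A minimal sketch of this advice, assuming tags are kept as plain lists of strings: the tags of the intervening parts are appended to those of the whole utterance, without duplicates. The function name merge_tags is hypothetical.

```python
def merge_tags(whole_tags, part_tags):
    """Assign the tags of intervening parts to the utterance as a whole.

    `whole_tags` is the list of tags for the full utterance; `part_tags`
    is a list of tag lists, one per intervening part. Order is preserved
    and duplicates are dropped.
    """
    merged = list(whole_tags)
    for tags in part_tags:
        for tag in tags:
            if tag not in merged:
                merged.append(tag)
    return merged


# "When, I mean what time, does the train to ehm,... Viareggio leave?"
print(merge_tags(["Set Question"], [["Self-Correction"], ["Stalling"]]))
```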

Third, it may happen that a dialogue act corresponds to (parts of) more than one turn, as in the following example, where the utterances in turns 1 and 3 together form a SetAnswer:

1. A: There are two flights early in the morning, at 7.45 and at 8.15,..

2. B: Yes

3. A: and two more in the evening, at 7.15 and at 8.30.

Harry Bunt, November, 2007

<harry.bunt@uvt.nl>

Harry Bunt, November 10, 2006