DIT++ is a semantically based framework for the analysis of human and human-machine dialogue, and for annotating dialogue with information about the communicative acts ('dialogue acts') that are expressed by dialogue segments. DIT++ consists of (1) a comprehensive, application-independent multidimensional taxonomy of communicative functions which are semantically defined in terms of their information-state changing potential, (2) the definition of a set of 10 orthogonal dimensions to which a dialogue act may belong, which offers a basis for understanding the multifunctionality of utterances in dialogue, (3) the definition of various kinds of semantic and pragmatic relations between dialogue acts, and (4) the specification of a small set of 'qualifiers' that may be used to indicate a speaker's uncertainty, reservations ('conditionality'), or sentiment. The Dialogue Act Markup Language (DiAML) was designed to use the concepts of DIT++ in dialogue act annotation and in the specification of dialogue acts in online recognition, interpretation, or generation of spoken, written, or multimodal dialogue.
The DIT++ taxonomy was constructed by extending the taxonomy of Dynamic Interpretation Theory (DIT), originally developed for information dialogues (Bunt, 1989; 1994), with a number of dialogue act types from DAMSL (Allen & Core, 1997) and other annotation schemes and dialogue studies.
Release 5.1 was developed in tandem with the definition of ISO standard 24617-2:2012 (September 2012) for dialogue act annotation. This concerns in particular (1) the definitions of the communicative functions in the DIT++ taxonomy and those included in the ISO standard, which have been made identical, and (2) the definition of the DiAML markup language, which can be used both with the concepts defined in DIT++ and with those defined in the ISO standard (see the annotated examples). The DIT++ release 5.1 annotation scheme is thus fully compatible with ISO 24617-2:2012. It is in some respects more fine-grained than the ISO scheme; where the latter includes 56 communicative functions, the DIT++ scheme (release 5.1) contains 88 functions, including notably more detailed feedback functions, more functions for discourse structuring and for social aspects of interacting, and functions for contact management (a dimension that is not included in the ISO 24617-2:2012 standard).
Experiences in the use of DIT++ Release 5.1 and of the ISO 24617-2 annotation scheme have inspired some improvements and extensions to DIT++ release 5.1 which are included in Release 5.2 and which are the basis of the second edition of the ISO 24617-2 standard, established in December 2020. Release 5.2 is an upward-compatible revision of Release 5.1 in the sense that all annotations made according to Release 5.1 are also valid according to Release 5.2. The taxonomy of communicative functions is only extended; some other aspects have been improved. Release 5.2 includes a number of new elements that allow more accurate annotation of relations among dialogue acts. Moreover, the concept of a 'plug-in annotation scheme' has been introduced (Bunt, 2019), which allows various ways of enriching and customizing dialogue act annotation/specification.
In particular, plug-in schemes are defined for (1) enriching DIT++ descriptions of dialogue acts with semantic content information; (2) introducing task- or domain-specific communicative functions; (3) annotating casual talk, for example in the opening and closing phases of a dialogue; and (4) indicating speaker emotions, importing elements from EmotionML. See 'New in Release 5.2' for a summary description of what's new in this release.
The concepts of DIT++ have been applied and evaluated in a number of annotation efforts and in the design of the ISO 24617-2 standard for dialogue act annotation. For some of its applications to annotation, see Geertzen and Bunt (2006), Petukhova and Bunt (2007), Geertzen et al. (2007), Petukhova (2011), Fang et al. (2012), Petukhova and Bunt (2012), and Bunt et al. (2019).
Another application is in the design of a dialogue manager module that is capable of generating multifunctional contributions to a dialogue; see Keizer and Bunt (2006), Keizer and Bunt (2007), Keizer et al. (2011), Malchanau et al. (2015), Malchanau et al. (2018), and Malchanau (2019).
For the use of the DIT++ taxonomy, and of DIT more generally, in other studies of dialogue, see Geertzen (2009), Morante (2007), Bunt (2011), Petukhova (2011), and the publications listed in Part 7 of this document.
(N1) < dialogueAct xml:id="da1" target="#m1" speaker="#s" addressee="#a" dimension="task" communicativeFunction="inform"/>
< contentLink dialAct="#da1" content="#z"/>
The annotation of semantic content is optional; the use of such a plug-in is an option, not an obligation. The use of an explicit link between the functional aspects of a dialogue act and its semantic content allows the use of alternative plug-ins for content annotation, and offers the possibility to customize the content annotation to a specific application. It also enables the specification of additional information attached to the content link, such as (un-)certainty scores and alternatives, and gives support to the management of ambiguities in the semantic content. See below for more about plug-in annotation schemes and interfaces.
Feedback by means of expressions such as "OK", "Uh-huh", or "Really?" says something about a previous dialogue act, while feedback by means of "Tuesday?" or "John WHO?" is about a particular word or dialogue segment. DIT++ therefore allows both dialogue acts and functional segments as antecedents for feedback dependence relations. This is not always accurate, since segment-related feedback is not necessarily about a functional segment; it may be about any previous segment, functional or not, such as a single word. Reference segments are introduced in release 5.2 for more accurate markup of feedback dependences, which can also be used for OCM and PCM acts.
(N2) a. A: Have you seen Pete today? B: He didn't come in; he has the flu.
b. < dialogueAct xml:id="da1" target="#fs1" sender="#a" addressee="#b" dimension="task" communicativeFunction="propositionalQuestion"/>
< dialogueAct xml:id="da2" target="#fs2" sender="#b" addressee="#a" dimension="task" communicativeFunction="answer" functionalDependence="#da1"/>
< dialogueAct xml:id="da3" target="#fs3" sender="#b" addressee="#a" dimension="task" communicativeFunction="inform"/>
< rhetoricalLink dact="#da3" rhetoAntecedent="#da2" rhetoRel="cause"/>
One important limitation of annotating rhetorical relations in this way is that it is not possible to indicate which argument of a relation has which role. For example, the annotation in (N2) merely says that a causal relation exists between the two dialogue acts, but it cannot indicate that the second argument causes the first, rather than the other way round. To overcome this limitation, the < drLink> construct has been introduced in Release 5.2, inspired by the way semantic relations in discourse are annotated in ISO 24617-8:2016 ('Semantic relations in discourse'), which allows the bottom line in (N2)b to be replaced by (N3):
(N3) < drLink arg1="#da2" arg2="#da3" rel="cause">
< argRole arg="#da2" role="result"/>
< argRole arg="#da3" role="reason"/>
< /drLink>
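As a rough illustration of how a tool might consume such a link, the following Python sketch reads a < drLink> element and collects the role of each argument. The function name `read_dr_link` is hypothetical, the element is written as well-formed XML without the space after the opening bracket, and `arg1` is assumed to point at the answer act `da2`, since the roles in (N3) mention `da2` and `da3`.

```python
import xml.etree.ElementTree as ET

# The <drLink> fragment of (N3). Assumption: arg1 refers to the answer act da2,
# because the <argRole> elements assign roles to da2 and da3.
DRLINK_XML = """
<drLink arg1="#da2" arg2="#da3" rel="cause">
  <argRole arg="#da2" role="result"/>
  <argRole arg="#da3" role="reason"/>
</drLink>
"""


def read_dr_link(xml_text):
    """Return (relation, {argument id: role}) for one <drLink> element."""
    link = ET.fromstring(xml_text)
    roles = {r.get("arg"): r.get("role") for r in link.findall("argRole")}
    # Every argument named on the link should have a role assigned to it.
    for arg in (link.get("arg1"), link.get("arg2")):
        if arg not in roles:
            raise ValueError(f"no role given for argument {arg}")
    return link.get("rel"), roles


relation, roles = read_dr_link(DRLINK_XML)
print(relation, roles)
```

Unlike the < rhetoricalLink> element in (N2)b, the result makes explicit that `da3` supplies the reason and `da2` the result.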
Everyday conversations such as a chat with a neighbour or with a colleague at the coffee machine often do not have such a well-delineated task as their motivation, but are aimed at a social purpose, such as establishing a pleasant atmosphere or maintaining a good relationship. Task-related dialogues often have an initial phase in which the participants exchange small talk before getting to a specific task, and such initial phases have often been omitted in dialogue corpora, where the initial small talk is viewed as occurring ‘before’ the ‘actual’ dialogue. An exception is the ADELE corpus of casual conversations (Gilmartin et al., 2018) in the form of textual chat dialogues. The dialogues in this corpus often have rather elaborate initial phases with sequences of greetings and discussions of each other’s health, and sometimes also an extended leavetaking phase with various kinds of greetings and well-wishing. In order to annotate the communicative functions in such phases in a satisfactory way, Release 5.2 includes several additional communicative functions in the Social Obligations Management dimension.
In any case, the use of a semantic content plug-in PLc for the host annotation scheme La requires a plug-in interface aYc, which can be defined as shown in (P1): the abstract syntax introduces the content link structure as a pair consisting of a dialogue act entity structure (‘a’) and a content entity structure (‘c’); the concrete syntax specifies its XML encoding, and the semantics specifies its meaning as the application of the function Ia(a), defined by the semantics of the host annotation scheme, to the argument Ic(c), defined by the plug-in semantics. This semantics reflects the dialogue act theory underlying DIT++, according to which the semantics of a full-blown dialogue act is an update operation on information states, defined by applying the semantics of the functional part of the dialogue act to its semantic content (which is computed as the semantic interpretation of the content annotation).
(P1) aYc = 〈aASc, aCSc, aSmc〉, with:
The union of these components forms a useful annotation scheme only if two important properties of the conceptual inventory of the host annotation scheme are preserved: the orthogonality of the set of dimensions and the taxonomic structure of the set of communicative functions. In other words, if a plug-in introduces a new dimension, then this should be orthogonal to the dimensions defined in the host scheme, and if it introduces additional communicative functions, then these should fit in the taxonomy of the host scheme. For semantic content plug-ins no such issues arise, but for other plug-ins they may.
Three plug-ins for semantic content are defined below with increasing richness, one where dialogue act content is described as a set of attribute values, one where events, participants, and semantic roles are distinguished, and one where natural language quantification is additionally taken into account.
(P3) a. I'd like to leave around ten in the morning (= markable m1)
b. < avContent xml:id="c1" target="#m1" attribute="departureTime" value="10:00"/>
According to the annotation theory that underlies the DIT++ scheme (Bunt, 2010; 2013; 2015; Pustejovsky et al., 2017), semantic annotations must have, besides a concrete representation format, a format-independent abstract syntax and a semantics. Underlying the representation in (P3b) is a conceptual inventory that lists the attributes and their possible values, and the definition of an entity structure containing one or more attribute-value pairs 〈Ai, vij〉. The semantics of such an entity structure can be defined as a feature structure [Ai': vij'] or, equivalently, as the property λx. Ai'(x) = vij'. The variable 'x' in the lambda abstraction can in this domain be thought of as ranging over journeys. The syntax and semantics of such AV-entity structures define a very simple annotation language LAV, the semantics of which is defined by:
(P4) IAV(〈Ai, vij〉) = [IAV(Ai): IAV(vij)] = [Ai': vij']
To link an AV-content annotation to dialogue act annotations, the XML element < contentLink >, defined in the interface specified in (P1), can be used to obtain representations of the form (P5).
(P5) < dialogueAct xml:id="da1" target="#m1" speaker="#s" addressee="#a" dimension="task" communicativeFunction="inform"/>
< avContent xml:id="c1" target="#m1" attribute="departureTime" value="10:00"/>
< contentLink dialAct="#da1" content="#c1"/>
The formal specification of the tripartite attribute-value content plug-in PLAV is as follows:
Note that the interface aYAV for connecting the AV plug-in with DiAML, defined in (P1), introduces in the abstract syntax ‘content link structures’ which are just pairs 〈a, c〉 consisting of a dialogue act entity structure and a content entity structure. The semantic component of the interface combines the interpretation functions of the host annotation scheme and the plug-in by (P6), which says that the interpretation of the dialogue act annotation is applied (as a function) to the argument formed by the interpretation of the content annotation.
(P6) aIAV(〈a, c〉) = Ia(a)(IAV(c))
This combination of the two interpretation functions is possible only if the interpretation function Ia of the host language is applicable to the output of the plug-in interpretation function. The interpretation function Ia makes use of elementary context update operators (see Bunt, 2014 for details) which are defined in a representation-neutral way, just stipulating that the given semantic content should be added to that part of the addressee's information state which contains information about the task that still has to be verified for consistency with other available information (the addressee's ‘pending semantic context’). To apply this approach in a dialogue system, the elementary update operators must be instantiated for the representation formalism of the system's information state. The semantic content of dialogue acts has to be represented in a form that fits in with that formalism, and if necessary has to be converted to it. For content expressed in the form of feature structures, as is the case for IAV, this is not an obstacle. Existing DiAML implementations in dialogue systems, such as Keizer et al. (2011), Malchanau et al. (2019), and Malchanau (2019), use typed feature structures for information state representation, making the implementation of (P6) a straightforward matter.
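The function application in (P6) can be made concrete with a small Python sketch. It is only an illustration under strong simplifying assumptions: feature structures are plain dictionaries, an information state is a dictionary mapping a participant to a list of pending content items, and `interpret_inform` is a hypothetical stand-in for Ia restricted to Inform acts. None of these names come from DIT++ or from an existing DiAML implementation.

```python
from typing import Callable, Dict, List

FeatureStructure = Dict[str, str]
InfoState = Dict[str, List[FeatureStructure]]
Update = Callable[[InfoState], InfoState]


def interpret_av(attribute: str, value: str) -> FeatureStructure:
    """I_AV(<A, v>) = [A': v'] as in (P4); priming is the identity here."""
    return {attribute: value}


def interpret_inform(addressee: str) -> Callable[[FeatureStructure], Update]:
    """Hypothetical I_a for an Inform act: maps a content to a state update."""
    def with_content(content: FeatureStructure) -> Update:
        def update(state: InfoState) -> InfoState:
            # Add the content to the addressee's pending context,
            # returning a new state instead of mutating the old one.
            pending = state.get(addressee, []) + [content]
            return {**state, addressee: pending}
        return update
    return with_content


# (P6): the act interpretation is applied to the content interpretation.
content = interpret_av("departureTime", "10:00")
update = interpret_inform("a")(content)
new_state = update({"a": []})
print(new_state)
```

The curried shape of `interpret_inform` mirrors the notation Ia(a)(IAV(c)): interpreting the functional part yields a function that still expects a semantic content.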
The following more general content plug-in is based on ISO standard 24617-4 for the annotation of semantic roles. The annotation scheme of this standard, a.k.a. 'SemAF-SR', marks up semantic information related to the question "Who did what to whom?", assigning semantic roles to the participants in an event. For instance, the example sentence "The soprano sang an aria" is analysed as mentioning a singing event and would be annotated as shown in (P7b), where "sing.01" refers to a verb sense in VerbNet:
(P7) a. "The soprano sang an aria"
Markables: m1="The soprano", m2="sang", m3="an aria"
b. < entity xml:id="x1" target="#m1" pred="soprano"/>
< event xml:id="e1" target="#m2" eventFrame="sing.01" eventualityType="accomplishment"/>
< entity xml:id="x2" target="#m3" pred="aria"/>
< srLink event="#e1" participant="#x1" semRole="theme"/>
SemAF-SR interprets such annotations as expressing the existence (or denied existence, in case of a clause with negative polarity) of certain states or events and participants in certain roles. For the example in (P7) the semantics can be expressed by the following DRS:
When defining a content plug-in for information about semantic roles, the question arises whether all the information encoded in SemAF-SR annotations should be taken along in the plug-in. This issue regards in particular the reference to event frames for VerbNet verb senses. While this seems appropriate for the purposes of SemAF-SR, it would bring a level of detail to the interpretation of verbs and deverbal nouns which is not pursued for other content words; it may therefore be more appropriate to make this optional in a plug-in, allowing users to choose whether they want to use a conceptual inventory with that level of granularity or a less fine-grained one. The annotation of time and events also needs to be considered: ISO-TimeML (ISO 24617-1) uses a classification of event types that differs from that of SemAF-SR, and includes other detailed information about events that is not considered in SemAF-SR (like tense and aspect). Again, it is not obvious how much of that information would seem appropriate to take along in a plug-in for DiAML.
The simplest content plug-in for semantic roles is one that takes a minimalist approach to event classifications, and uses a simple form like < event xml:id="e2" target="#m3" pred="sing"/> rather than the more fine-grained representations of SemAF-SR or ISO-TimeML. This plug-in (`PLSR') is informally characterized by the following schema:
No. | Role | Definition |
---|---|---|
1. | Agent | Participant in an event who intentionally or consciously initiates an event, and who exists independently of the event. |
2. | Beneficiary | Participant in an eventuality that is advantaged or disadvantaged by the eventuality, and that exists independently of the event. |
3. | Cause | Participant in an event that initiates the event, but does not act with any intentionality or consciousness; the participant exists independently of the event. |
4. | Goal | Participant in an event that is the (non-locative, non-temporal) end point of an action; the participant exists independently of the event. |
5. | Instrument | Participant in an event that is manipulated by an agent, and with which an intentional act is performed; it exists independently of the event. |
6. | Partner | Participant in an event that is intentionally or consciously involved in carrying out the event. Participant is not the principal agent of the event, and exists independently of the event. |
7. | Patient | Participant in an event that undergoes a change of state, location or condition, is causally involved or directly affected by other participants, and exists independently of the event. |
8. | Pivot | Participant in a state that is characterised as being in a certain position or condition throughout that state, and has a major or central role or effect in that state. A pivot is more central to the state than a participant in a theme role, and exists independently of the state. |
9. | Purpose | Set of facts or circumstances that an agent wishes or intends to accomplish by performing some intentional action. |
10. | Reason | Set of facts or circumstances explaining why a state exists or an event occurs. |
11. | Result | Participant in an event that comes into existence through the event; it indicates a terminal point for the event: when that is reached, the event does not continue. |
12. | Setting | Set of (non-locative and non-temporal) facts or circumstances of the occurrence of an event or a state. |
13. | Source | Non-locative, non-temporal starting point of an event. The source exists independently of the event. |
14. | Theme | Participant in a state or an event that (i) in the case of an event, is essential to the event taking place, but does not have control over the way the event occurs and is not structurally changed by the event, and (ii) in the case of a state, is characterised as being in a certain position or condition throughout the state, and is essential to the state being in effect but not as central to the state as a participant in a pivot role. The theme of a state or event exists independently of the state or event. |
15. | Manner | The way or style of performing an action or the degree/strength of a cognitive or emotional state. |
16. | Medium | The physical setting, device or channel that allows an event to take place. |
17. | Means | Procedure for performing an action in terms of component steps, or a methodology by which an intentional act is performed by an agent. A means does not necessarily exist independently of the event. |
18. | Location | Place where an event occurs, or a state is true, or a thing exists. |
19. | Initial Location | Participant in an event that indicates the location where an event begins or a state becomes true; initial-location exists independently of the event. |
20. | Final Location | Location where an event ends or a state becomes false; final-location exists independently of the event. |
21. | Path | Intermediate location or trajectory between two locations, or in a designated space, where an event occurs. |
22. | Distance | Length or extent of space that plays a role in an eventuality. |
23. | Time | Participant that indicates an instant or a time interval during which a state exists or an event takes place. |
24. | Duration | Length or extent of time during which an event occurs or a state is true. |
25. | Initial Time | Indication of the point in time when an event begins or a state becomes true. |
26. | Final Time | Indication of a point in time when an event ends or a state ceases to be true. |
27. | Amount | Quantity of something other than time or space, or number of objects of a certain kind, which plays a role in an event or a state. |
28. | Attribute | Property that an event or state associates with one of the other participants. |
The interface for this plug-in is the same as the one defined above in (P1).
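A minimal Python sketch of how annotations in this simple style could be produced and checked against the role inventory is given below. The helper `sr_link` and the constant `ROLES` are hypothetical, only a subset of the 28 roles is listed, and the lower-case spelling of role names in the `semRole` attribute follows (P7b) but is otherwise an assumption.

```python
import xml.etree.ElementTree as ET

# A subset of the role inventory in the table above, lower-cased as in (P7b).
# Assumption: role names are used verbatim as semRole attribute values.
ROLES = {"agent", "beneficiary", "cause", "goal", "instrument", "partner",
         "patient", "pivot", "theme", "result", "source"}


def sr_link(event_id, participant_id, role):
    """Build an <srLink> element, rejecting role names outside ROLES."""
    if role not in ROLES:
        raise ValueError(f"unknown semantic role: {role}")
    return ET.Element("srLink", {"event": f"#{event_id}",
                                 "participant": f"#{participant_id}",
                                 "semRole": role})


# "The soprano sang an aria" with the minimalist event representation.
event = ET.Element("event", {"xml:id": "e1", "target": "#m2", "pred": "sing"})
links = [sr_link("e1", "x1", "agent"), sr_link("e1", "x2", "theme")]
for element in [event, *links]:
    print(ET.tostring(element, encoding="unicode"))
```

Checking role names at construction time is one way to keep annotations within the conceptual inventory of the plug-in; a schema-based validation would serve the same purpose.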
A plug-in for the semantic content of dialogue acts is more general and more powerful as it takes into account more aspects of the meanings of phrases, clauses, sentences, and other natural language structures that may express semantic content. On top of the identification of events with their time and place and participants with their respective roles, the interpretation of quantifier and modifier structures forms the most important source of semantic information. The ISO standard 24617-12 under development can be the basis of a powerful plug-in for this type of information. See Bunt et al. (2018) and Bunt (2019) for the design of an annotation scheme for quantification and modification, and Bunt (2018) for a preliminary version of a standard annotation scheme.
DIT++ release 5.2 supports the marking up of rhetorical relations in dialogue in a more fine-grained way than previous releases, but does not specify any particular set of relations to be used. A plug-in for such relations does not require the introduction of any entity structures or link structures, since these have been defined in this release of DiAML (the < drLink> element in the concrete syntax and the corresponding link structure in the abstract syntax). No specification of an interface is thus required, only the specification of a set of rhetorical relations and their argument roles. Such a plug-in still has the general tripartite structure, but has a very simple form: the specification of a rhetorical relation appears in three places. For example, ‘Cause’ occurs as the causal relational concept in the conceptual inventory of the abstract syntax; the string “cause” occurs as the value of an XML attribute in the vocabulary of the concrete syntax; and ‘Cause’ occurs as a binary predicate constant in the semantics.
The tripartite plug-in specified here takes the set of relations in ISO 24617-8:2016 (DR-Core) as its point of departure. The DR-Core set contains 18 core relations, to which the relation ‘Evaluation’ has been added. Table 2 lists the resulting 19 relations with their definitions, which describe their semantics in an informal way. Many other relations that are distinguished in other annotation schemes can be seen as special cases of these relations, for example ‘Explanation’ as a case of ‘Cause’, ‘Juxtaposition’ as a case of ‘Contrast’, and ‘Specification’ as a case of ‘Elaboration’.
The ISO 24617-8:2016 standard for annotating semantic relations in discourse distinguishes between ‘semantic’ and ‘pragmatic’ variants of discourse relations. This distinction is illustrated by the difference between (P11a) on the one hand and (P11b) and (P11c) on the other. Where in (P11a) having the flu is the reason for not coming in, in (P11b) beating his wife is not a reason for Jim to be an idiot, but for the speaker to say that Jim is an idiot, and in (P11c) “this” water being from yesterday is the reason for the request to give fresh water. In (P11a) the causal relation holds between the semantic contents of B’s two Inform acts (‘semantic cause’); in (P11b,c) there is a causal relation between the semantic content of the second dialogue act and the performance of the first (‘pragmatic cause’).
(P11) a. John did not come in today. He's still struggling with the flu.
b. B: Jim is an idiot. He beats his wife.
c. A: Could you give me a glass of fresh water please? This is from yesterday.
This distinction could be expressed in DiAML by introducing an attribute whose values represent the 'semantic'/'pragmatic' distinction, but such an extension would not be semantically interpretable unless the semantic content of dialogue acts were available for interpreting the 'semantic' case. In the presence of a semantic content plug-in, the distinction can be expressed directly by allowing the arguments of a rhetorical relation to be either dialogue act structures or semantic content structures.
In any case, a tripartite plug-in annotation scheme PLDR for discourse relations in dialogue can be simple, as follows:
Abstract syntax:
Relation | First argument | Second argument |
---|---|---|
Cause | reason | result |
Condition | antecedent | consequent |
NegativeCondition | negatedAntecedent | consequent |
Purpose | enablement | goal |
Manner | means | achievement |
Concession | expectationRaiser | expectationDenier |
Exception | regular | exclusion |
Substitution | disfavoredAlternative | favoredAlternative |
Exemplification | set | instance |
Elaboration | broad | specific |
Asynchrony | before | after |
Expansion | narrative | expander |
Evaluation | situation | judgement |
contrast, similarity | conjunction, disjunction, | restatement, synchrony |
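The role table above lends itself to a simple lookup. The Python sketch below copies the role names from the table and checks whether the roles assigned in a link match those of its relation. The relations for which the table gives no distinct role names are left out, and both the lower-case keys and the function `check_roles` are illustrative assumptions rather than part of the plug-in definition.

```python
# Argument role names per relation, copied from the table above.
# Assumption: relation names appear in lower camel case as attribute values.
ARG_ROLES = {
    "cause": ("reason", "result"),
    "condition": ("antecedent", "consequent"),
    "negativeCondition": ("negatedAntecedent", "consequent"),
    "purpose": ("enablement", "goal"),
    "manner": ("means", "achievement"),
    "concession": ("expectationRaiser", "expectationDenier"),
    "exception": ("regular", "exclusion"),
    "substitution": ("disfavoredAlternative", "favoredAlternative"),
    "exemplification": ("set", "instance"),
    "elaboration": ("broad", "specific"),
    "asynchrony": ("before", "after"),
    "expansion": ("narrative", "expander"),
    "evaluation": ("situation", "judgement"),
}


def check_roles(relation, roles):
    """Return True if `roles` are exactly the two role names of `relation`."""
    if relation not in ARG_ROLES:
        raise KeyError(f"no role names listed for relation: {relation}")
    return sorted(roles) == sorted(ARG_ROLES[relation])


print(check_roles("cause", ["result", "reason"]))
print(check_roles("cause", ["result", "goal"]))
```

The check ignores the order in which roles are assigned, since a link states explicitly which argument plays which role.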
No. | Relation | Definition |
---|---|---|
1. | Cause | The second argument provides a reason why the first argument occurs or holds true. |
2. | Condition | The first argument is an unrealized situation which, when realized, would lead to the situation that forms the second argument. |
3. | Negative Condition | The first argument is an unrealized situation which, when not realized, would lead to the situation that forms the second argument. |
4. | Purpose | The second argument is the goal or purpose of the situation that forms the first argument. |
5. | Manner | The second argument describes how the first argument comes about or occurs. |
6. | Concession | The second argument cancels or denies an expected causal relation between the first argument and the negation of the second. |
7. | Contrast | One or more differences between the two arguments are highlighted with respect to what each predicates as a whole or about some entities they mention. |
8. | Exception | The second argument indicates one or more circumstances in which the situation that forms the first argument does not hold. |
9. | Similarity | One or more similarities between the two arguments are highlighted with respect to what each predicates as a whole or about some entities they mention. |
10. | Substitution | The two arguments are alternatives, the situation of the second argument being the favored or chosen alternative. |
11. | Conjunction | The two arguments bear the same relation to some other situation evoked in the discourse. Their conjunction indicates that they both hold with respect to that situation. |
12. | Disjunction | The two arguments bear the same relation to some other situation evoked in the discourse. Their disjunction indicates that they are non-exclusive alternatives with respect to that situation. |
13. | Exemplification | The second argument is a situation that is an element of the set of situations described by the first argument. Arg1 describes a set of situations. |
14. | Elaboration | The two arguments are the same situation, but the second argument is specified in more detail. |
15. | Restatement | The two arguments are the same situation, but viewed from different perspectives. |
16. | Synchrony | The two arguments form two temporally overlapping situations. All forms of overlap are included. |
17. | Asynchrony | The first argument temporally precedes the second. |
18. | Expansion | The two arguments are distinct situations that involve some shared entities; the second argument expands a narrative of which the first argument is a part, or expands on the setting relevant for interpreting the first argument. |
19. | Evaluation | The second argument provides an opinion on the social, esthetic, economic, or other qualities of the first argument. |
Note that this plug-in is especially powerful in combination with a plug-in for semantic content, but it can of course also be used without.
DIT++ was designed to be domain-independent, and for this reason does not include communicative functions that would be specific for a certain application domain. All its communicative functions are either general-purpose or belong to one of the dialogue control dimensions. The general-purpose functions of DiAML form a powerful battery of functions for use in any application, but still many applications could benefit from the availability of additional, domain-specific communicative functions. This is another area where plug-ins can be useful.
One important question that arises when designing a plug-in for domain-specific types of dialogue acts is how these communicative functions relate to the general-purpose functions of the host annotation scheme. In a negotiation domain, for example, one finds offers, counter-offers, accepts and rejects of counter-offers, and so on. Such offers and their various kinds of responses and continuations can be viewed as special cases of the general-purpose functions Offer and AddressOffer, and they would thus fit well within the taxonomy of the ISO standard.
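The requirement that domain-specific functions fit into the host taxonomy can be illustrated with a small Python sketch. The taxonomy fragment, the function name `counterOffer`, and the helpers `add_function` and `specializes` are all hypothetical; the sketch only shows the idea of attaching a new function as a special case of an existing general-purpose one.

```python
# Illustrative fragment of a communicative-function taxonomy (child -> parent).
# Assumption: this is not the full DIT++ taxonomy, only a toy excerpt.
TAXONOMY = {
    "offer": None,
    "addressOffer": None,
    "acceptOffer": "addressOffer",
}


def add_function(taxonomy, name, parent):
    """Return a new taxonomy with `name` added as a special case of `parent`."""
    if parent not in taxonomy:
        raise ValueError(f"parent function not in host taxonomy: {parent}")
    if name in taxonomy:
        raise ValueError(f"function already defined: {name}")
    return {**taxonomy, name: parent}


def specializes(taxonomy, name, ancestor):
    """Return True if `name` equals `ancestor` or lies below it."""
    while name is not None:
        if name == ancestor:
            return True
        name = taxonomy[name]
    return False


# A hypothetical negotiation plug-in adds a counter-offer below Offer.
extended = add_function(TAXONOMY, "counterOffer", "offer")
print(specializes(extended, "counterOffer", "offer"))
```

Because a new function can only be added below an existing one, the extended inventory keeps the taxonomic structure of the host scheme.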
According to the general structure of a plug-in, PLa = 〈ASa, CSa, Sma〉, with ASa = 〈CIa, ACa〉, CSa = 〈Va, CCa, Fa〉, and Sma = 〈M, Ia〉, a plug-in PLCF for adding certain communicative functions would have a very simple specification since no new entity structures or link structures are needed, but only the following components:
The sender of a dialogue act may express a certain emotion associated with the performance of the dialogue act, such as amusement, irritation, or disappointment. DIT++ in previous releases used qualifiers for this purpose, in particular as values of the @sentiment attribute, but this assumes that an emotional state can be characterized in a one-dimensional way, through a single predicate. That may be reasonable for some use cases, but is in general too simple.
The W3C recommendation EmotionML is a flexible scheme, designed with the aim of being combined with other annotation schemes. It characterizes emotions as complex entities, including ‘emotion categories’ such as “anger”, “happiness”, or “surprise”, an intensity value (called ‘valence’), and a confidence value, as well as various alternative other ways of describing emotions, notably in terms of ‘action tendencies’, ‘appraisals’, and multiple ‘dimensions’. An emotion in EmotionML may have components of various categories; for instance, in the snippet (P13), taken from the document https://www.w3.org/TR/emotionml/, an emotion is annotated as being a form of anger with elements of sadness and fear.
(P13) < emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
< category name="sadness" value="0.3"/>
< category name="anger" value="0.8"/>
< category name="fear" value="0.3"/>
< /emotion>
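To show how such a multi-category emotion might be read by a program, the Python sketch below parses the snippet in (P13) and picks the category with the highest value. It assumes the snippet is written as well-formed XML without a namespace declaration, as in (P13); complete EmotionML documents declare a namespace, which would require qualified element names. The function `category_values` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The annotation of (P13), written as well-formed XML.
# Assumption: no XML namespace is declared, as in the snippet itself.
EMOTION_XML = """
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <category name="sadness" value="0.3"/>
  <category name="anger" value="0.8"/>
  <category name="fear" value="0.3"/>
</emotion>
"""


def category_values(xml_text):
    """Map each emotion category name to its numeric value."""
    emotion = ET.fromstring(xml_text)
    return {c.get("name"): float(c.get("value"))
            for c in emotion.findall("category")}


values = category_values(EMOTION_XML)
dominant = max(values, key=values.get)  # category with the highest value
print(dominant, values)
```

Keeping all category values, rather than only the dominant one, preserves the information that the emotion has components of several categories.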
Observing that there is no general agreement in the community, EmotionML does not provide a single repository of emotion descriptors, but gives users a choice to select a suitable emotion vocabulary in their annotations. In order to promote interoperability, EmotionML provides a number of emotion vocabularies that can be used for this purpose. The guiding principle for selecting emotion vocabularies has been to list vocabularies that are either commonly used in technological contexts, or represent current emotion models from the scientific literature. One of the best known repositories is Ekman’s ‘big six’ (Ekman, 1972), a set of basic emotions with universal facial expressions; emotions that are recognized and produced in all human cultures. Example (P13) shows how this repository or one of the others listed by EmotionML is referenced in an annotation.
EmotionML is defined only at the level of concrete syntax, so it cannot directly be used as a tripartite plug-in. However, an abstract syntax for EmotionML can be developed using the CASCADES methodology in reverse engineering mode (Bunt, 2016), and a semantics can be added for those parts of EmotionML markups that are truly semantic in nature (in contrast with e.g. confidence values).
An emotion has an experiencer and an object that the emotion is about. The emotional aspect associated with a dialogue act is a relation between the speaker, as the experiencer of the emotion, and (the semantic content of) the dialogue act as the object of the emotion. For example, in (P14) the experiencer of the emotion associated with the acceptance of the preceding offer is participant P2 and the object is the semantic content of this offer and its acceptance, viz. P2 having a cup of coffee.
(P14)
a. P1: Would you like to have a cup of coffee? (= markable m1)
   P2: That would be wonderful! (= markable m2)
b. <dialogueAct xml:id="da1" target="#m1" speaker="#p1" addressee="#p2" dimension="social" communicativeFunction="offer"/>
   <contentLink dialAct="#da1" content="#e1"/>
   <dialogueAct xml:id="da2" target="#m2" speaker="#p2" addressee="#p1" dimension="social" communicativeFunction="acceptOffer" funcDep="#da1"/>
   <event xml:id="e1" target="#m2" pred="have-coffee"/>
   <srLink event="#e1" participant="#p2" semRole="agent"/>
   <contentLink dialAct="#da2" content="#e1"/>
   <emotion xml:id="em1" target="#m2" category="happiness" value="0.8"/>
   <emoLink holder="#p2" object="#e1" emotion="#em1"/>
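The way an emotion is tied to its experiencer and its object in (P14b) can be sketched in Python as follows. This is only an illustration under stated assumptions: the wrapper element `<annotation>` and the function `emotion_links` are hypothetical, and only the elements of (P14b) that are needed to resolve the emotion link are included.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml:id attribute under this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A subset of (P14b), wrapped in a hypothetical <annotation> root element.
P14B = """
<annotation>
  <dialogueAct xml:id="da2" target="#m2" speaker="#p2" addressee="#p1"
               dimension="social" communicativeFunction="acceptOffer" funcDep="#da1"/>
  <event xml:id="e1" target="#m2" pred="have-coffee"/>
  <emotion xml:id="em1" target="#m2" category="happiness" value="0.8"/>
  <emoLink holder="#p2" object="#e1" emotion="#em1"/>
</annotation>
"""


def emotion_links(xml_text):
    """Resolve each emoLink into (experiencer, content predicate, category, value)."""
    root = ET.fromstring(xml_text.strip())
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for link in root.findall("emoLink"):
        holder = link.get("holder").lstrip("#")
        content = by_id[link.get("object").lstrip("#")]
        emotion = by_id[link.get("emotion").lstrip("#")]
        resolved.append(
            (holder, content.get("pred"), emotion.get("category"), float(emotion.get("value")))
        )
    return resolved


print(emotion_links(P14B))
```

Resolving the references yields participant p2 as the experiencer, the have-coffee event as the object, and happiness as the emotion category.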
For linking emotion specifications to dialogue act annotations, a plug-in interface is needed that defines the < emoLink> element used in (P14b) with its underlying abstract syntax and semantics. In the abstract syntax, an emotion link structure is a triple 〈p, s, e〉 formed by a dialogue participant ‘p’ who is the sender of a dialogue act, the semantic content ‘s’ of this dialogue act, and an emotion ‘e’. These components correspond in the concrete syntax to the values of the attributes @holder, @object, and @emotion in an < emoLink> element, as illustrated in (P14). The semantics of the emotion link structures is defined by (P15).
(P15) I_{a+c+e}(〈p, s, e〉) = I_e(e)(I_a(p), I_c(s))
The DIT++ taxonomy forms a multidimensional system not only in the sense that it supports the assignment of multiple communicative functions to dialogue segments, but also in the sense that dimensions have a well-defined conceptual status in dialogue analysis, as different aspects of communication that may be addressed independently of each other (see Bunt, 2006). For annotation, the multidimensionality of the schema means that a functionally relevant segment of dialogue behaviour may be tagged as having more than one communicative function -- maximally one in each dimension if the tagging of implied functions is avoided. Dimensions are represented in the presentation of the taxonomy in boldface italic.
The term ‘dialogue act' is often used rather loosely in the sense of speech act used in dialogue. Indeed, the idea of interpreting communicative behaviour in terms of actions, such as questions, promises, and requests goes back to speech act theory (Austin, 1962; Searle, 1969). But where speech act theory is primarily an action-based approach to meaning within the philosophy of language, dialogue act theory is an empirically-based approach to the computational modeling of linguistic and nonverbal communicative behaviour in dialogue. Dialogue acts offer a way of characterizing the meaning of communicative behaviour in terms of update operations, to be applied to the information states of participants in the dialogue; this approach is commonly known as the ‘information-state update’ or ‘context-change’ approach -- see e.g. Bunt (1989; 2000a); Traum and Larsson (2003). For instance, when an addressee understands the utterance “Do you know what time it is?” as a question about the time, then the addressee’s information state is updated to contain (among other things) the information that the speaker does not know what time it is and would like to know that. If, by contrast, it is understood that the speaker is reproaching the addressee for being late, then the addressee’s information state is updated to include (among other things) the information that the speaker does know what time it is. Distinctions such as that between a question and a reproach concern the communicative function of a dialogue act, which is one of its two main components. The other main component is its semantic content, which describes the objects, properties, relations, situations, actions or events that the dialogue act is about. The communicative function of a dialogue act specifies how an addressee updates his information state with the information expressed in the semantic content when he understands the dialogue act.
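The difference between the two readings of “Do you know what time it is?” can be made concrete with a toy information-state update. The sketch below is hypothetical and drastically simplified: the class `InformationState`, the function `understand`, and the belief predicates are assumptions made for illustration only and are not part of the DIT++ or DiAML definitions.

```python
from dataclasses import dataclass, field


@dataclass
class InformationState:
    """A toy information state: a set of (agent, attitude, content) beliefs."""
    beliefs: set = field(default_factory=set)


def understand(state, function, speaker, content):
    """Return the addressee's state after understanding a dialogue act.

    The communicative function determines which update is applied to the
    same semantic content.
    """
    updated = set(state.beliefs)
    if function == "question":
        updated.add((speaker, "does-not-know", content))
        updated.add((speaker, "wants-to-know", content))
    elif function == "reproach":
        updated.add((speaker, "knows", content))
    else:
        raise ValueError(f"unsupported communicative function: {function}")
    return InformationState(updated)


start = InformationState()
as_question = understand(start, "question", "S", "current-time")
as_reproach = understand(start, "reproach", "S", "current-time")
print(sorted(as_question.beliefs))
print(sorted(as_reproach.beliefs))
```

The same content leads to different resulting states, which is the sense in which a communicative function specifies an update operation.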
This approach to the definition of communicative functions is strictly semantic, in contrast to approaches based on linguistic form. For example, the behaviour of a speaker who repeats something that was said by someone else may be characterised as a ‘repetition’ (which is a communicative function in some annotation schemes); however, this only says something about the form of the behaviour compared to the repeated behaviour, not about its function. A repetition often has a feedback function, as in (D1).a, but it can also have other functions, as in (D1).b, where it is used as a confirmation in response to a check question:
(D1) S: There are evening flights at seven-fifteen and eight-thirty
a. C: Seven-fifteen and eight-thirty
b. C: And that’s on Sunday too
   S: And that’s on Sunday too
A form-related requirement for introducing a communicative function is, however, that there are observable features of communicative (linguistic and/or nonverbal) behaviour which are indicative of that function in the context in which the behaviour occurs. This requirement puts all communicative functions on an empirical basis.
Dialogue act annotation is the marking up of stretches of dialogue with information about the dialogue acts they contain. Spoken dialogues are traditionally segmented into turns, defined as stretches of communicative behaviour produced by one speaker, bounded by periods of inactivity of that speaker. Turns can be quite long and complex, and are therefore not the most useful units of behaviour to assign communicative functions to. Communicative functions can be assigned more accurately to smaller units, which are called functional segments, and which are defined as the minimal stretches of communicative behaviour that are functionally relevant.
Inherent to the notion of a dialogue act is that there is an agent who produces the dialogue act, called the ‘sender’, and one or more agents who are addressed, called the ‘addressee(s)’. Dialogue studies often focus on two-person dialogues, in which case the dialogue acts have only one addressee. Besides sender and addressee(s), there may be various types of side-participants who are present but do not or only marginally participate (see Clark, 1996).
Dialogue act annotation is often limited to assigning communicative functions to dialogue segments, which corresponds intuitively to indicating the type of communicative action that is performed. A semantically more complete characterization additionally provides information about the category of semantic content. The DAMSL annotation scheme distinguishes three categories of semantic content: Task, Task Management, and Communication, which indicate whether the semantic content of the dialogue act advances the task which underlies the dialogue, or discusses how to perform the task, or concerns the communication process. DIT++ distinguishes 10 subcategories of communication-related information, such as feedback information, turn allocation information, and speech management information. These categories of semantic content are also called ‘dimensions’.
Example (D2) illustrates the use of the key attributes of a dialogue act in the DiAML annotation of a task-related yes-no question addressed by speaker ‘a’ to addressee ‘b’, expressed by the functional segment ‘m1’:
(D2) <dialogueAct xml:id="da1" target="#m1" sender="#a" addressee="#b" dimension="task" communicativeFunction="propositionalQuestion"/>
Some types of dialogue acts are inherently dependent for their full meaning on one or more dialogue acts that occurred earlier in the dialogue. This is for example the case for answers, whose meaning is partly determined by the question that is being answered, and also for the acceptance or rejection of offers, suggestions, requests, and apologies. This is illustrated in example (D3), where the meaning of the answer in turn 3 depends on whether it is an answer to the question in turn 1 or to the one in turn 2.
(D3)
1. B: Do you know who’s coming tonight?
2. B: Which of the project members do you think will be there?
3. A: I’m expecting Jan, Alex, Claudia, and David, and maybe Olga and Andrei.
As an answer to the question in 1, A’s answer says that nobody else is expected to come than the people that are mentioned, but as an answer to the question in 2 it leaves open the possibility that other people will come, who are not members of ‘the project’. This kind of semantic dependence, which is due to the responsive character of some communicative functions, is called a functional dependence relation. Marking up this relation between a dialogue act with a responsive communicative function and its ‘antecedent’ dialogue acts allows the annotation to not just indicate e.g. that an utterance has the function of an answer, but also to indicate to which question it is an answer, as illustrated in (D4).
(D4)
a. B: Which of the project members do you think will be there?
   A: I’m expecting Jan, Alex, Claudia, and David, and maybe Olga and Andrei.
b. <dialogueAct xml:id="da1" target="#m1" sender="#b" addressee="#a" dimension="task" communicativeFunction="setQuestion"/>
   <dialogueAct xml:id="da2" target="#m2" sender="#a" addressee="#b" dimension="task" communicativeFunction="answer" functionalDependence="#da1"/>
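A functional dependence relation such as the one in (D4b) can be followed mechanically from a responsive act to its antecedent. The Python sketch below assumes a wrapper element `<diaml>` without a namespace declaration and a hypothetical helper `antecedents`; it also assumes that multiple antecedents, if present, would be given as a space-separated list of references, which the text above does not specify.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The two dialogue acts of (D4b), wrapped in a root element for parsing.
D4B = """
<diaml>
  <dialogueAct xml:id="da1" target="#m1" sender="#b" addressee="#a"
               dimension="task" communicativeFunction="setQuestion"/>
  <dialogueAct xml:id="da2" target="#m2" sender="#a" addressee="#b"
               dimension="task" communicativeFunction="answer"
               functionalDependence="#da1"/>
</diaml>
"""


def antecedents(xml_text, attribute="functionalDependence"):
    """Map each dependent act id to the communicative functions of its antecedents."""
    root = ET.fromstring(xml_text.strip())
    acts = {act.get(XML_ID): act for act in root.findall("dialogueAct")}
    result = {}
    for act_id, act in acts.items():
        references = act.get(attribute)
        if references:
            # Assumption: several antecedents would be space-separated "#id" values.
            result[act_id] = [
                acts[ref.lstrip("#")].get("communicativeFunction")
                for ref in references.split()
            ]
    return result


print(antecedents(D4B))
```

The same helper could be pointed at a feedback dependence attribute by passing a different `attribute` name.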
Positive and negative feedback-providing acts depend for their interpretation also on what happened earlier in the dialogue, but in a different way. They are concerned with the processing of what was said before - such as its perception or its interpretation. This is illustrated by the examples in (D5).
(D5)
1. A: The flight on Tuesday would suit me really well.
   B: Okay.
2. A: The flight on Tuesday would suit me really well.
   B: On Tuesday?
In the first example B indicates that he has correctly understood A’s remark; in the second he checks whether he heard (or remembers) correctly what A said. This relation between a positive or negative feedback act and its ‘antecedent’ is called a feedback dependence relation.
A feedback dependence relation indicates one or more preceding dialogue acts if the feedback concerns high-level processing, such as understanding, and it indicates a dialogue segment in the case of low-level processing, such as hearing what was said. In the latter case, the feedback dependence relation was annotated according to Release 5.1 as referring to the smallest functional segment containing the segment that the feedback act is about. This way of annotating feedback dependence relations is not quite accurate, since feedback about a stretch of communicative behaviour smaller than a functional segment is not about the entire segment. For example, negative feedback that signals a problem in hearing certain words may imply positive feedback about the rest of the segment. The same holds for feedback-eliciting acts and for dialogue acts in the Own Communication Management (OCM) dimension or in the Partner Communication Management (PCM) dimension. In particular, Self-Corrections and Partner Corrections frequently refer to a single word or phrase which does not form a functional segment. To make more accurate annotation possible, Release 5.2 introduces the ‘reference segment’: a stretch of communicative behaviour that is the object of a feedback dependence relation and that is not a functional segment.
Rhetorical relations
Dialogue acts may also be semantically and pragmatically related through rhetorical relations. These have been studied extensively for their occurrence in written discourse, and are also known as 'discourse relations'. They also occur in (spoken and multimodal) dialogue, as in the example shown in (D6).
(D6)
1. A: It ties you on in terms of the technology and the complexity that you want
2. A: like for example voice recognition
3. A: because you might need to power a microphone and other things
4. A: so that’s one constraint there
In this example we see a sequence of four functional segments contributed by the same participant. The segments in lines 2-4 are all related to the dialogue act expressed in the first segment. Segment 2 is related to the initial statement through an Exemplification relation, segment 3 through a Cause relation, and segment 4 through a Restatement relation.
A wide diversity of sets of rhetorical relations has been proposed (see e.g. Hobbs, 1979; Mann and Thompson, 1988; Lascarides and Asher, 1991; Hovy and Maier, 1993; Prasad et al., 2008; Sanders et al., 1992), which has inspired a great deal of discussion, comparisons, and attempts to specify mappings between various sets (Benamara and Taboada, 2015; Bunt and Prasad, 2016; Scheffler and Stede, 2016; Demberg et al., 2017; Sanders et al., 2018). In view of this situation, DIT++ does not propose any specific set of relations to be used, but only provides a conceptual category for which a set of relations may be specified. In Release 5.1, this provision plays out at the level of concrete DiAML syntax in the definition of an XML element called ‘<rhetoricalLink>’, which has attributes referring to two dialogue acts and an attribute whose value specifies a rhetorical relation. Example (N@) above illustrates the use of this provision for indicating a causal relation between two dialogue acts.
In 2015, Prasad & Bunt defined a set of 18 ‘core’ rhetorical relations which occur in some form in most annotation schemes for rhetorical relations (see Prasad and Bunt, 2015), and proposed using this set for defining an ISO annotation standard. This has become ISO 24617-8:2016, a.k.a. DR-Core. The DR-Core relations have been used in DiAML annotations as values of the @rhetoRel attribute in several annotation efforts (see e.g. Petukhova et al., 2014 and Bunt et al., 2019). The <rhetoricalLink> element was found to be rather coarse-grained, however, because of the two limitations already mentioned: (1) it is not possible to indicate the roles of the arguments; and (2) it is not possible to distinguish between ‘semantic’ and ‘pragmatic’ variants of a relation. The distinction is illustrated in (P11) above.
In Release 5.2 the constructs < drLink > and < argRole > are introduced in the DiAML-XML concrete syntax, and the conceptual structures that they encode are added to the DiAML abstract syntax with their semantics. Semantically the
The examples in (D7) illustrate another phenomenon that is frequently found in dialogue, namely that speakers may be uncertain about the information they provide, as in B’s utterance in (D7a), or about their commitment to the performance of an action, as in (D7b1). Speakers may also express a certain sentiment about the information or event that is being discussed, as in (D7b3), or express a reservation in the form of a condition, as in (D7b2), where an offer is conditionally accepted. For the annotation of conditions, uncertainty, and sentiment, DIT++ makes use of so-called qualifiers. Example (D7c), annotating (D7b2), illustrates their use.
(D7)
a. A: Do you know what time the meeting starts?
   B: At 4 p.m. I think.
b. A: Would you like to have some coffee?
   1. B: Maybe later.
   2. B: Only if you have it ready.
   3. B: Yes please!
c. <dialogueAct xml:id="da2" target="#m2" sender="#b" addressee="#a" dimension="task" communicativeFunction="acceptOffer" functionalDependence="#da1" conditionality="conditional"/>
According to the annotation theory that underlies dialogue act annotation with the DIT++ scheme (Bunt, 2010; 2013; 2015; Pustejovsky et al., 2017) semantic annotations must have besides a concrete representation format also a format-independent abstract syntax and a semantics. The annotation theory implements the distinction made in the ISO Linguistic Annotation Framework (LAF, ISO 24612:2009) between annotations and representations. The term ‘annotation' refers to the linguistic information that is added to segments of language data, independent of the format in which the information is presented; ‘representation' refers to the format in which an annotation is rendered. This distinction is implemented in the DiAML definition by a syntax specification that defines, besides a class of XML-based representation structures, also a class of more abstract annotation structures. These specifications are called the concrete syntax and the abstract syntax, respectively. Annotation structures are set-theoretical structures. The concrete syntax defines a reference format for rendering annotation structures in XML. Alternative representation formats for DiAML annotation structures are discussed in Bunt et al. (2019). For a detailed specification of the semantics of DiAML annotation structures see Bunt (2014).
DiAML Abstract syntax
The abstract syntax of DiAML consists of: (a) a specification of the elements from which annotation structures are built up, called a ‘conceptual inventory’, and (b) a specification of the possible ways of combining these elements to form annotation structures.
The conceptual inventory of DiAML consists of sets of dialogue participants, dimensions, communicative functions, functional segments, and qualifiers.
An annotation structure is a collection of entity structures and link structures. Entity structures contain semantic information about a dialogue segment; link structures describe semantic relations between entity structures. Entity structures are always of the general form 〈m,z〉, where ‘m’ is a markable and ‘z’ designates a structure that describes some linguistic information. Link structures are typically of the form 〈e1, e2, R〉, consisting of two entity structures and a relation.
The entity structure of central interest in DiAML is a pair 〈m,α〉 of which the linguistic information ‘α’ is a so-called ‘dialogue act structure’. A dialogue act structure contains the information that characterizes a single dialogue act. This includes minimally a specification of the sender, the addressee(s), and the communicative function. For dialogue acts with a general-purpose communicative function, the dimension of the semantic content is another component; for dialogue acts with a dimension-specific function the dimension does not need to be specified, since it is inherent in the definition of the function. General-purpose functions may additionally have one or more qualifiers. For a dialogue act which depends semantically on (the interpretation of) one or more previous dialogue segments, a sixth component is a set E of elements that the act depends on through functional or feedback dependence relations. In a setting in which other participants than the sender and the addressees should be taken into account, an additional element is a set H of ‘other participants’. A dialogue act structure is therefore in the simplest case a triple 〈S, A, fd〉, consisting of a sender S, a (set of) addressee(s) A, and a dimension-specific function fd, and in the most complex case a 7-tuple as in (D8), with additionally a set H of other participants, a general-purpose function f, a dimension d, a set q of one or more qualifiers, and a set E of one or more dialogue units that the act depends on.
(D8) α = 〈S, A, H, f, d, q, E〉
A link structure in DiAML is a triple 〈ε, E, ρ〉 consisting of an entity structure ε, a set E of one or more entity structures, and a rhetorical relation ρ, which relates the dialogue act in ε to those in E.
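The set-theoretical structures described above can be mirrored by simple immutable data types. The following Python sketch is an illustration only: the class and field names are hypothetical, and optional components of the 7-tuple are modelled as defaults so that the simplest case, a triple of sender, addressee(s), and dimension-specific function, needs only three arguments.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class DialogueActStructure:
    """A dialogue act structure; optional components default to empty."""
    sender: str                                      # S
    addressees: FrozenSet[str]                       # A
    function: str                                    # f, or fd if dimension-specific
    dimension: Optional[str] = None                  # d, only for general-purpose functions
    qualifiers: FrozenSet[str] = frozenset()         # q
    dependences: Tuple["EntityStructure", ...] = ()  # E
    others: FrozenSet[str] = frozenset()             # H


@dataclass(frozen=True)
class EntityStructure:
    """A pair of a markable and a dialogue act structure."""
    markable: str
    act: DialogueActStructure


@dataclass(frozen=True)
class LinkStructure:
    """A triple of an entity structure, related entity structures, and a relation."""
    source: EntityStructure
    targets: Tuple[EntityStructure, ...]
    relation: str


question = EntityStructure(
    "m1", DialogueActStructure("p1", frozenset({"p2"}), "setQuestion", dimension="Task")
)
answer = EntityStructure(
    "m3",
    DialogueActStructure(
        "p2", frozenset({"p1"}), "answer", dimension="Task",
        qualifiers=frozenset({"uncertain"}), dependences=(question,),
    ),
)
print(answer.act.dependences[0].act.function)
```

Embedding the antecedent entity structure in `dependences` follows the description of E as a set of elements that the act depends on; a tuple is used here only to keep the example hashable and ordered.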
Concrete syntax
The DiAML concrete syntax is defined in accordance with the CASCADES methodology for developing semantic annotation languages, described in Bunt (2013) and Pustejovsky et al. (2017). This methodology includes the notion of an ideal representation format, defined as one which is (1) ‘complete’ in the sense that every annotation structure defined by the abstract syntax can be represented, and (2) ‘unambiguous’ in the sense that every representation defined by the concrete syntax represents one and only one annotation structure defined by the abstract syntax. Since the semantics of DiAML is defined for the structures defined by the abstract syntax, any two representation formats which are ‘ideal’ in this sense are semantically equivalent, and every representation in one such format can be converted by a meaning-preserving mapping into any other such format. The DiAML concrete syntax specifies a reference representation format based on XML, often referred to as 'DiAML-XML'. This specification lists names of XML tags, attributes, and values corresponding to the various ingredients in the conceptual inventory, and defines the possible ways of combining these elements in XML structures. In particular, XML elements are defined for entity structures and link structures.
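The two properties of an ideal representation format can be pictured as a lossless round trip between an abstract structure and its XML rendering. The Python sketch below is a simplified illustration, not the DiAML specification: the functions `encode` and `decode` are hypothetical, an abstract dialogue act is reduced to a plain dictionary, and only a single addressee and an optional dimension are covered.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def encode(act_id, markable, act):
    """Render a (simplified) abstract dialogue act as a dialogueAct element."""
    attributes = {
        XML_ID: act_id,
        "target": "#" + markable,
        "sender": "#" + act["sender"],
        "addressee": "#" + act["addressee"],
        "communicativeFunction": act["function"],
    }
    if "dimension" in act:
        attributes["dimension"] = act["dimension"]
    return ET.Element("dialogueAct", attributes)


def decode(element):
    """Recover the identifier, markable, and abstract act from an element."""
    act = {
        "sender": element.get("sender").lstrip("#"),
        "addressee": element.get("addressee").lstrip("#"),
        "function": element.get("communicativeFunction"),
    }
    if element.get("dimension") is not None:
        act["dimension"] = element.get("dimension")
    return element.get(XML_ID), element.get("target").lstrip("#"), act


abstract = {"sender": "a", "addressee": "b",
            "function": "propositionalQuestion", "dimension": "task"}
serialized = ET.tostring(encode("da1", "m1", abstract), encoding="unicode")
print(serialized)
print(decode(ET.fromstring(serialized)) == ("da1", "m1", abstract))
```

If encoding followed by decoding returns the original structure for every abstract structure, and every representation decodes to exactly one structure, the format has the completeness and unambiguity properties described above, at least for the fragment covered by this sketch.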
Entity structures for dialogue acts are represented by a DiAML-XML element called <dialogueAct>, which has the following attributes:
Link structures are represented either by the DiAML-XML element < rhetoricalLink> or by the element < drLink> (annotators can choose either). The < rhetoricalLink> element has the following attributes:
Example (D9c-d) shows the abstract annotation structure and its DiAML-XML representation for the dialogue fragment in (D9a), segmented as shown in (D9b).
(D9a) P1: What time does the next train to Utrecht leave?
      P2: The next train to Utrecht leaves I think at 8:32.
Annotations may be attached to primary dialogue data in a variety of ways; they may be attached directly to stretches of speech, defined by temporal begin and end points, or to structures at lower levels of description, such as the output of a tokenizer. Here it is assumed that functional segments are identified at another level of XML representation. P2's utterance is segmented into two overlapping functional segments: m2 in the Auto-Feedback dimension (reflecting that the repetition of a large part of an utterance signals positive feedback on understanding it) and m3 in the Task dimension. Following the guidelines of the Text Encoding Initiative (TEI P5, 2010), the prefix '#' is used to indicate that the prefixed value is identified either in the metadata of the primary data or in another layer of annotation, or elsewhere within the same representation. Note that the abstract annotation structure in (D9c) is a set of three elements, corresponding to the three dialogue acts in this fragment, where the second and the third element both have the first element embedded, indicating their dependence on the first dialogue act.
(D9b) Segmentation of the exchange in (D9a):
m1 = What time does the next train to Utrecht leave? (Task dimension)
m2 = The next train to Utrecht leaves (Auto-Feedback dimension)
m3 = The next train to Utrecht leaves I think at 8:32. (Task dimension)
(D9c) Annotation structure according to DiAML abstract syntax:
{〈m1,〈p1,p2,setQuestion,Task〉〉,
 〈m2,〈p2,p1,autoPositive,{〈m1,〈p1,p2,setQuestion,Task〉〉}〉〉,
 〈m3,〈p2,p1,answer,Task,{uncertain},{〈m1,〈p1,p2,setQuestion,Task〉〉}〉〉}
(D9d) DiAML-XML annotation representation:
<diaml xmlns="http://www.iso.org/diaml/">
  <dialogueAct xml:id="da1" target="#m1" sender="#p1" addressee="#p2" communicativeFunction="setQuestion" dimension="task"/>
  <dialogueAct xml:id="da2" target="#m2" sender="#p2" addressee="#p1" communicativeFunction="autoPositive" feedbackDependence="#da1"/>
  <dialogueAct xml:id="da3" target="#m3" sender="#p2" addressee="#p1" communicativeFunction="answer" certainty="uncertain" dimension="task" functionalDependence="#da1"/>
</diaml>
Semantics
DiAML annotation structures have a semantics in terms of information-state updates. The most important kind of structure defined by the DiAML abstract syntax, the dialogue act structure, is a functional characterization of a dialogue act. It does not correspond to a complete dialogue act, since it does not include the semantic content (but only a semantic content category, a ‘dimension’). The semantics of a complete dialogue act is obtained by combining the interpretation of a dialogue act structure with a semantic content. This is accomplished by applying the interpretation Ia(〈s,α〉) of an entity structure which contains a dialogue act structure α, to the semantic content κ(s) of the functional segment that expresses the dialogue act. The result is an information state update operation as shown in (D10) for a dialogue act that has no functional dependences to other dialogue acts.
(D10) Ia(〈s,α〉) = Ia(α)(κ(s))
Following the ISO practice of reviewing its standards every five years, the first edition, ISO 24617-2:2012, was examined in 2017-2018 to determine whether a revision was needed. At a meeting in September 2017 it was concluded that some minor revisions would be desirable, as well as some extensions. These were discussed in a meeting of users of the first edition in Tilburg, April 2018, including Pierere Andre, Shammur Chowdhury, Emer Gilmartin, Simon Keizer, Andrei Malchanau, Catherine Pelachaud, Volha Petukhova, Laurent Prévot, Mariet Theune, Kars Wijnhoven and Harry Bunt, and at the ISA-14 workshop in Santa Fe, New Mexico, August 2018.
Additionally, possibilities for extending the standard were discussed at the ISA-15 workshop in Gothenburg, May 2019, where the notion of 3-part layered plug-ins, introduced in DIT++ release 5.2, was discussed. A proposal for a revised, second edition of the standard was submitted to the ISO organization in 2019 and was approved in an international ballot in February 2020 -- see the ISO/DIS document. The following papers are about the revision of the first edition of the standard, discussing limitations and proposing improvements and extensions: