Information Structure: Linguistic, Cognitive, and Processing Approaches

doi:10.1002/wcs.1234

Author manuscript; available in PMC 2015 Jul 4.

Published in final edited form as:

Wiley Interdiscip Rev Cogn Sci. 2013 July/August; 4(4): 403–413.

PMCID: PMC4491328

NIHMSID: NIHMS447770

Jennifer E. Arnold, Elsi Kaiser, Jason M. Kahn, and Lucy Kyoungsook Kim

The publisher's final edited version of this article is available at Wiley Interdiscip Rev Cogn Sci

Abstract

Language form varies as a result of the information being communicated. Some of the ways in which it varies include word order, referential form, morphological marking, and prosody. The relevant categories of information include the way a word or its referent has been used in context, for example whether a particular referent has been previously mentioned or not, and whether it plays a topical role in the current utterance or discourse. We first provide a broad review of linguistic phenomena that are sensitive to information structure. We then discuss several theoretical approaches to explaining information structure: information status as a part of the grammar; information status as a representation of the speaker’s and listener’s knowledge of common ground and/or the knowledge state of other discourse participants; and the optimal systems approach. These disparate approaches reflect the fact that there is little consensus in the field about precisely which information status categories are relevant, or how they should be represented. We consider possibilities for future work to bring these lines of work together in explicit psycholinguistic models of how people encode information status and use it for language production and comprehension.

WHAT IS INFORMATION STRUCTURE?

People talk for a reason. They want to share news, connect with others, inform, amuse, or cause things to happen. Human languages are organized in ways that reflect the content and purpose of utterances – that is, the information that is contained in the words and structures that make up sentences. This organization is called information structure[1,2] or information packaging.[3] This article reviews how information structure constrains linguistic form, that is, the way people say things.

Information structure helps explain why people say things in different ways. Speakers constantly make choices about how to phrase their utterances. For example, a speaker might say The aardvark chased the squirrel, The squirrel was chased by the aardvark, or What was chased by the aardvark was the squirrel. The squirrel may be referred to with a lengthy phrase (The furry-tailed creature who stole my crackers) or simply the pronoun it. While these variations could describe the same event, they are pragmatically felicitous (i.e. appropriate) in different contexts.

Language scholars agree that linguistic form varies as a function of informational considerations, including what the speaker is attending to, what the speaker wishes the addressee(s) to focus on, what is assumed to be already known, what is considered most important, or what is treated as background information. Yet the definition of information structure is notoriously variable across researchers and topics. Our review reflects this heterogeneity, and reports the definitions of information structure that are important for each phenomenon that we discuss. Nevertheless, two general approaches to information structure emerge. Many linguistic choices reflect a distinction between information that is given (i.e. previously known or discussed), and that which is new.[4] Other choices seem to reflect the distinction between the topic (i.e. information that is backgrounded or assumed) and the focus (i.e. that which is highlighted or focused). These distinctions establish the information status of a word or referent in the discourse.

In the first section of this paper, we provide an overview of what information structure is, and how it relates to four linguistic phenomena: 1) referential form, 2) morphology, 3) word order, and 4) prosody. In the second section, we consider how it relates to major theories about language structure, use, and processing. We then consider potential psychological mechanisms for representing information structure.

HOW INFORMATION STRUCTURE SHAPES LANGUAGE

Reference

Information structure has a strong effect on how people refer to entities in the world, including both introducing new entities into a discourse and referring back to already-mentioned entities. This can affect multiple dimensions, including definiteness, pronoun use, and modification.

Many languages, including English, use different expressions for definite and indefinite information. For example, if the speaker has just been talking to someone about a particular dog, the speaker can refer to it with the definite expression the dog or perhaps even the pronoun it. However, if the dog is mentioned in the conversation for the first time, the speaker may use the indefinite expression a dog. In English, the definite article ‘the’ is traditionally regarded as indicating that the noun is specific and familiar to both the speaker and the hearer, by virtue of having already been mentioned in the discourse.[5,6]

However, the effects of information structure on reference are modulated by real-world knowledge and inferences. For example, definites are not restricted to cases where the referent is given. Consider a sentence such as I went to a wedding and the bride wore white, but unfortunately a guest spilled wine on her.[7] Here, a wedding is indefinite and being mentioned for the first time, the bride is definite although being mentioned for the first time (i.e. a novel definite), and a guest is indefinite and being mentioned for the first time. Novel definites occur with entities that are familiar to all and known to be unique (e.g. the moon, the sky), as well as with unique entities whose existence can be inferred from mentioned entities (e.g., we can infer the bride from the wedding; Prince[4] uses the term ‘inferrable’). Due to this inference process, some novel definites can result in slowed comprehension.[8]

Information structure also guides the speaker’s selection of nouns, pronouns and other referring expressions. After mentioning a great new book, the speaker will probably use the pronoun ‘it’ to refer to the book in the immediately subsequent utterances. Use of pronouns provides an efficient, shorthand way of referring to already-mentioned, prominent referents and allows speakers to avoid excessive repetition. In fact, using a name when a pronoun would be sufficient has been shown to result in processing difficulties, at least under certain circumstances.[9] By contrast, introduction of a new referent requires fuller expressions, like a description (the donkey), possibly a modification (the scared donkey), or a name (Sylvester).

Many researchers agree that the choice of expression is determined by the salience, or accessibility, of a referent in context. The more-reduced referring expressions (e.g. pronouns) are used to refer to more prominent/salient entities – i.e. those that are more activated or accessible in people’s minds at that point in the discourse – and fuller referring expressions (e.g. nouns) are used for entities that are less salient.[10,11] However, the definition of salience/accessibility is complex. Intuitively, referents become accessible when they are topical in the recent discourse – for example, when they have been recently mentioned, especially when they have been mentioned in syntactically prominent positions like the subject position.[5,12,13] Yet this effect is modulated by the grammatical position of the referring expression: pronouns in subject and object position tend to be interpreted as referring to previously-mentioned entities in the parallel syntactic position (e.g. preceding subject or object).[14] The interpretation of pronouns is also guided by the plausibility of potential referents, which is often connected to their thematic roles. One such effect is the implicit causality of an event, e.g. in The parrot blamed the tiger, because he…, comprehenders expect the pronoun to refer to the tiger.[15] Implicit causality and other effects of verb semantics are modulated by the coherence relation between the two clauses, for example whether the second clause communicates the cause (“…because…”) or something else.[52] It also appears that accessibility is not a single dimension, in that different kinds of referring expressions seem to be sensitive to different kinds of information. For example, Finnish is a language with flexible word order where humans can be referred to with demonstrative pronouns (‘this’) or personal pronouns (‘s/he’). In Finnish, demonstrative pronouns tend to be coreferential with post-verbal arguments (subjects or objects), which tend to be discourse-new. By contrast, personal pronouns exhibit a strong preference to be coreferential with syntactic subjects, regardless of their given/new status or sentence position.[16,17]

Morphological marking of information structure

Noun morphology

Many languages use morphological marking on nouns to indicate grammatical role. For example, in Japanese and Korean, subjects have the nominative marker –ga (Japanese) and –i/–ka (Korean), and direct objects are marked with accusative (–o in Japanese and –(l)ul in Korean). These languages also have an information-status marker that indicates the topic, namely –wa in Japanese and –(n)un in Korean,[18,19] which can occur on either subjects or objects. Both Japanese and Korean topic markers can occur on the topical entity, i.e. what the sentence is about. When the whole sentence is new information (e.g. the answer to “What happened?”), only the nominative marker is felicitous on the subject in both languages (ex. 1a, # denotes infelicity). In contrast, in a context where one entity is topical, use of the topic marker on that entity is more natural (ex. 1b, ? indicates that the usage is awkward). Both –wa and –(n)un can also have more nuanced interpretations (e.g. can be used to mark contrastive topics),[20] depending on their position and the information-structural properties of the rest of the sentence.[21]

(1a)
What happened?

{Sumi-ka / #Sumi-nun} apa [Korean]
{Sumi-NOM / #Sumi-TOP} sick
‘Sumi is sick.’ (with NOM) or ‘As for Sumi, she’s sick.’ (with TOP)
(1b)
What happened to Sumi? / Why didn’t Sumi come?

{Sumi-nun / ?Sumi-ka} apa [Korean]
{Sumi-TOP / ?Sumi-NOM} sick

Other languages, such as the Mayan language Tzotzil, also use morphological means to mark topics. Topic phrases in Tzotzil begin with the particle a and end with the enclitic –e.[22] There are also languages that use morphology to mark the focused, new-information elements rather than topical elements (e.g. in West African language groups such as Gur, Kwa, and Chadic).[23] Even languages that are not commonly thought of as having morphological topic markers show indications of morphology being sensitive to information structure. For example, in Russian, the object of a negative sentence (e.g. ‘letter’) is marked with accusative case when the letter is known to exist (‘He did not receive letter-ACC’ means he did not receive the letter), and with genitive case when the existence of the letter is not known/not presupposed (‘He did not receive letter-GEN’ means he did not receive any letter).[24] Related patterns exist in Finnish.[25]

Verb morphology

Some languages mark the information status of arguments with verbal morphology, in what is called an Inverse system. On transitive verbs, verbal inflections indicate which argument is ‘proximate’ (i.e., topical and/or given), and which is ‘obviative’ (less topical). This is similar to the function of the passive in English, in that the inverse verbal morphology indicates that the Actor argument is less topical than the other argument. For example, in the language of the Mapuche people of Chile, ‘I’ and ‘you’ are assumed to be more topical than 3rd person arguments, so the sentence meaning She saw me is required to take the Inverse morphology, resulting in a sentence more like I was seen by her.[26] Inverse systems have also been reported for Algonquian, Cherokee, and other languages.[27]

Other focus markers

The effects of information structure are also evident in the use of focus particles such as only and even.[28] While some focus particles are separate words (e.g. English only, too, German nur ‘only’, auch ‘too’), they can also be clitics that attach to other constituents (e.g. Finnish –kin ‘too’, Japanese -mo ‘too’). The effect that these expressions have on the meaning of a sentence crucially depends on the information-structural properties of the sentence: Consider a sentence with only, such as John only saw the dog. If saw is the new information, then the sentence means that John only SAW the dog, but didn’t pet it or walk it. But if the dog is the new information, the sentence means that John only saw the DOG, and not anything else (see also the section on prosody, below).

Word order variation

The effects of information structure extend to variation in word order or constituent order. For example, a single event might be described in numerous ways, as shown in (2). (See [31] for more discussion of different constructions in English.)

(2)
- a
  Active: The cat swiped the dog on the nose.
- b
  Passive: The dog was swiped on the nose by the cat.
- c
  Heavy-NP-shifted: The cat swiped on the nose the dog that had frightened it.
- d
  Topicalization: The DOG the cat swiped on the nose, while the ferret got away.
- e
  Prepositional Dative: The cat gave a warning to the dog.
- f
  Double object Dative: The cat gave the dog a warning.
- g
  Clefting: It was the dog that the cat swiped on the nose.

It is widely argued that a function of word order variation is to mark information structure, following the broad generalization that given or more accessible information precedes new or less accessible information.[1,6,29] A related effect is the tendency to put long and complex phrases later in the utterance, and relatively shorter ones earlier (ex. 2c).[30] These two patterns are not independent, because short phrases tend to refer to given and topical information. Nevertheless, there is evidence that phrase complexity and information structure have independent effects on word order.[47]

The examples in (2) come from English, which has relatively limited word order variation, and most of the variation comes from non-canonical word orders.[31] Many other languages – including Finnish, Japanese, Korean, German, Turkish, Mayan languages and West African languages – allow even freer variation of word order. As in English, word order in these languages reflects information packaging, generally following a given-new order. For example, in Finnish, subjects canonically precede objects, but objects can occur before subjects when they have already been mentioned in the preceding discourse (ex. 3).[32] Word order can also be used to mark focus, similar to clefts in English (e.g. Finnish OSV and SOV order, ex. 3(c)).

(3)
- a
  What did Esa read?

  Esa luki kirjan (Finnish: SVO)
  Esa-NOM read book-ACC
  ‘Esa read a book.’
- b
  Who read the book?

  Kirjan luki Esa (Finnish: OVS)
  Book-ACC read Esa-NOM
  ‘Esa read the book.’ / ‘The book was read by Esa.’
- c
  Kirjan Esa luki (Finnish: OSV)
  Book-ACC Esa-NOM read
  ‘It was the book that Esa read.’

Prosody and Intonation

In languages like English, information structure is reflected in the prosody of speech. Prosody includes syllable stress and intonational phrasing, and the rhythmic structure of an utterance. A subpart of prosody, intonation, operates independently of rhythmic prosody, and marks the information or focus structure of a sentence. The intonational structure of an utterance determines which words receive accents, i.e. which words sound more acoustically prominent. Accents are commonly realized with pitch excursions, and greater duration and amplitude. Intonational accent makes the difference between the two otherwise identical sentences shown in (4), where capitals denote accented words.

(4a)
She had a pet RAT.
(4b)
She had a PET rat.

Accenting signifies information status categories such as focus, contrastiveness, and givenness/newness.[33] Focus refers to the marking of constituents in an utterance that constitute news, or contribute to the speaker’s conversational goals.[34] Example (4) shows how focus can differentially highlight information in an utterance. (4a) appropriately answers a question like “Did she have any pets?” while (4b) might occur after “Did she have any rats?” Focus need not refer to explicit questions.

Accents can also mark the contrastive element of a set of entities,[35] as in (5):

(5a)
No, not the GREEN lizard, the BROWN one.
(5b)
No, not the green LIZARD, the green SNAKE.

In 5a, the speaker refers to the brown lizard particularly as contrasted with the green, focusing on the color. In 5b, the accent instead marks the contrast between two comparable animals, focusing on the different reptiles. Focus and contrast often overlap, as in this example, where the accent reflects informational focus on one piece of contrastive information.

Finally, accent often marks the difference between given and new information, which tend to be deaccented and accented respectively, as in (6):

(6)
We had a FERRET before we bought our CORGI. The ferret (GIVEN) was surprisingly friendly. Then we bought a CAT (NEW).

As with contrast, focus plays a core role in the relationship between accent and givenness/newness.[36]

There is substantial empirical evidence that speakers and listeners are sensitive to the functions of intonation. Speakers modulate prosody based on the information status of their words, using acoustic reduction (i.e. shorter, unaccented, and less intelligible pronunciations) for previously-mentioned words or entities.[37] Similarly, listeners are faster to interpret references to given information if the word is unaccented.[38] However, the precise acoustics of accenting and deaccenting are not yet fully understood, and further research is needed to understand how speech reflects the linguistic categories of information status.

THEORETICAL APPROACHES TO EXPLAINING INFORMATION STRUCTURAL EFFECTS IN LANGUAGE

As described above, linguistic form is highly sensitive to information status. Why does this occur? Researchers have offered numerous types of explanations, from pragmatic rules about linguistic form, to cognitive mechanisms underlying language use. Here we review a number of these approaches.

The first section (referred to here as the linguistic system approach) describes theories that focus on the nature of the information itself. One subset of this research tradition focuses on the role that information plays within a particular discourse or sentence. Is it topical? Is the speaker choosing to highlight that information? Another subset focuses on whether the knowledge is known or familiar. This knowledge state could be construed as the speaker’s knowledge, listener’s knowledge, or both.

The second section (termed here the social/communicative approach) describes two very different theoretical approaches that consider how linguistic form variation serves the goal of communication. The common ground approach examines how discourse participants keep track of both their own knowledge state and that of their interlocutors, and how this guides language use. The optimal systems approach also focuses on effective communication, but instead accounts for a very different dimension of information – i.e., the ability for linguistic forms to carry more or less information, quantitatively speaking.

Information status as part of the linguistic system

The most widely accepted notion of information structure is as a set of rules or constraints that determine linguistic form, in whole or part. Under this view, a key aim of research in this area is to identify which information-structural categories are relevant and how they influence choice of linguistic form. This encompasses formal approaches, in which information structure is either part of the grammar, or at the grammar/pragmatics interface, as well as functional approaches in which cognitive representations guide linguistic form.

Categorical approaches

One tradition has described how linguistic systems are constrained by binary (or sometimes three-way) distinctions in information status. Broadly speaking, this approach has received considerable attention within theoretical linguistics, where researchers argue for different types of information-structural divisions (e.g. topic-comment;[39,40] topic-focus;[41] focus-presupposition;[42,43] rheme-theme;[1] open proposition-focus;[44] see also [45] for a tripartite division). These approaches are united by the insight that one part of every utterance connects to something that the listener already knows, and another part provides new information about this known entity or event. For example, consider the discourse The little brown worm wiggled under the lettuce. A few moments later, he emerged from the ground for his dinner. The pronoun he refers to the worm, which has been previously mentioned and can be considered the topic of this discourse fragment. The new information [emerged from the ground] can be considered the focus. The information-structural divisions in sentences can also be seen with clefts, e.g. It was arugula that he ate. Here, ‘arugula’ is new, focused information while [he ate something] is what the sentence is about, which here can be regarded as a presupposition or open proposition. Thus, a key part of comprehension is that hearers need to identify what the topic is (i.e. intuitively, what the sentence is about), and to add the new information about the topic to their mental discourse model.

Gradient representations of information status

While theoretical linguistics has mostly favored categorical descriptions of information status categories (see section below on future directions), the functional linguistics literature has proposed gradient descriptions of information status. This view seems to be required by phenomena like variations in referential expressions, which fall along a hierarchy of specificity, ranging from unstressed pronouns to highly specific expressions like that squirrel that got into the screen porch and stole our crackers last year (see section above on reference). This variation has been proposed to result from the referent’s status along a continuum such as salience, prominence, or accessibility.[3, 10, 11,12,13] Givon[46] characterizes information along a continuum of topicality that is very similar to what other researchers call accessibility. Note that some researchers use the term focus or focus of attention to identify the element that is most salient in the discourse; but this use of the term has more in common with the linguistic term “topic” than “focus”. Recency of mention is assumed to make information salient, as well as prominent syntactic positions, like subject or topic position. Representations of accessibility also account for preferences in word order[47] and accenting.[38]

Givenness and topicality are also related to predictability.[12] Discourses tend to be thematically organized, so information that has already been mentioned is likely to be mentioned again. This makes the words themselves predictable, but it also makes references to the information itself predictable, regardless of what words are used.[48] Prince[4] even offers one definition of givenness based on the speaker’s assumption of what is predictable to the listener. However, recent work has yielded mixed results about whether speakers’ choices of referring expressions, e.g. pronouns vs. names, are related to referential predictability. While some studies find that they are,[12,49] others find that they are not[50,51] (see also [52]).

Information status reflects the social/communicative function of language

The common ground approach

There is broad consensus that information status is relevant to language use because it serves a social function. As Grice[53] famously argued, speakers design their utterances to meet the demands of successful communication. For example, he suggests that speakers follow the maxim of Quantity, which is to say as much as required but not more. The information already established in context helps determine how much is required. This generalization fits well with the cross-linguistic tendency for speakers to produce more specific referential expressions when information is not predictable from the context. On this view, information packaging results from a language convention: speakers may order given information before new as a result of a social “contract”[8] with listeners.

The social/communicative view of information packaging critically depends on the idea that speakers and listeners keep track of what information is mutually known, or in common ground.[54] Common ground can include a social or cultural background, a linguistic or environmental domain, and expectations about the course of the conversation. For example, common ground helps conversational partners understand that when a speaker says The capybara is adorable, he means the capybara known in the conversation and not some other. Similarly, Prince[55] distinguishes discourse-givenness from hearer-givenness.

Speakers draw on common ground for the process of audience design, whereby they choose linguistic forms that fit the knowledge and attentional state of their interlocutors. For example, conversation partners tend to establish shared terms of reference, and switching partners can lead to decreased efficiency.[56] Speakers include necessary modifiers (e.g. ‘small square’ in the context of squares of different sizes), assuming they have noticed the referential contrast.[57] Listeners also keep track of what information is shared with the speaker, and are faster to understand references like the red triangle when there is a single red triangle in common ground,[58] and watch the speaker’s visual focus of attention to help interpret ambiguous definite descriptions like the circle in the presence of multiple circles.[59]

On the other hand, the effects of common ground on language production and comprehension are somewhat limited. Some researchers have suggested that, at least in some situations, speakers default to their own perspective when planning or executing utterances.[60] Listeners, too, sometimes consider objects in privileged ground as potential referents, even though the speaker is unaware of the object’s presence.[61]

Thus, representations of information status must include the speaker and listener’s knowledge about what information is in common ground, and what is not. However, given the mixed findings in this area, common ground does not offer a simple mechanism for defining information structure. Speakers track the knowledge and intentional state of their interlocutor, but it appears to influence only some linguistic processes, and not others.

The optimal system approach

Another approach to understanding how information affects speech relies on the argument that in a noisy communication channel like natural speech, the most efficient way to maximize information transfer is to maintain uniform information transmission over time.[62] This means that when the context makes information expected or predictable, speakers should tend to reduce the linguistic forms that encode this information. By contrast, they should encode less predictable information with more explicit pronunciations, lexical, and/or syntactic choices. Recent information density proposals hold that speech reflects this rational organization,[63] and that comprehenders use an analogous information-based method, although neither speakers nor comprehenders necessarily make these calculations consciously. On this view, information status has a mathematical definition. A word’s information is defined as the negative logarithm-transformed frequency of that word out of context: $I(\text{word}) = \log \frac{1}{p(\text{word})} = -\log p(\text{word})$. This measures how much uncertainty reduction (i.e. information) a word carries.
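The definition above can be made concrete with a small numerical sketch. The Python fragment below is illustrative only and is not taken from the studies cited: it assumes that p(word) is estimated as a relative frequency in a toy corpus, and that the logarithm is taken to base 2, so that information is measured in bits. The names `surprisal`, `unigram_information`, and `toy_corpus` are hypothetical.

```python
import math
from collections import Counter


def surprisal(p):
    """Return -log2(p), the information in bits carried by an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def unigram_information(tokens):
    """Estimate each word's out-of-context information from relative frequency.

    Assumption: p(word) = count(word) / total number of tokens.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: surprisal(n / total) for word, n in counts.items()}


# Hypothetical toy corpus; a real estimate would need a large sample of text.
toy_corpus = "the dog chased the cat and the cat chased a squirrel".split()
info = unigram_information(toy_corpus)

# Print words from least to most informative.
for word, bits in sorted(info.items(), key=lambda item: item[1]):
    print(f"{word:10s} {bits:.2f} bits")
```

In this toy sample the frequent word the carries fewer bits than cat, which in turn carries fewer bits than the single-occurrence word squirrel, mirroring the idea that frequent or expected words carry less information.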

This approach accounts for a wide variety of empirical phenomena. For example, speakers lengthen words that are relatively unpredictable in context (i.e. that carry more information), and reduce the overall clarity of predictable words.[64] At the word level, optional that (e.g., She knew that her horse won vs. She knew her horse won) occurs more often when information density is high, i.e. when the word is less predictable in context.[65] More abstractly, speakers increase the informational contribution of sentences as a discourse unfolds.[66] In comprehension, too, words that create less surprisal (the inverse of uncertainty reduction) cause less processing difficulty, suggesting that readers and listeners track information density.[67,68]

FUTURE DIRECTIONS: REPRESENTING INFORMATION STATUS PSYCHOLOGICALLY

There is broad agreement amongst researchers that language form varies as a function of information status and information structure. But the theoretical views outlined above demonstrate that there is still substantial variation in the dimension of information that is deemed relevant. There has also been little systematic research on the relation between these different approaches. We argue that a promising next step for the field is to examine the psychological mechanisms by which information status is represented and used. This is the focus of ongoing research. Here we identify some of the assumptions that stem from the theoretical approaches outlined above, review existing proposals about processing mechanisms, and identify questions that require further attention.

The prototypical treatment of information status carves up utterances into categories like given/new or topic/focus. These categories are used to explain why speakers use one linguistic form over another, and how comprehenders interpret them. This view seems to assume that the processing system includes an explicit representation of information status that is available to the linguistic processes governing word order, intonation, reference form choices, and other information-structure phenomena, as well as a mechanism for selecting the right form. Yet there are many open questions about both 1) the representation of information status and 2) the mechanisms by which information status affects language form.

How is information status represented?

Information status as a nonlinguistic representation

One possibility is that information status is represented nonlinguistically in the minds of the discourse participants. Many models assume that discourse participants maintain a mental model of the current situation, which also includes information-status tags[69], or gradient representations of discourse salience.[70]

Another possibility is that our mental representations do not specifically assign information status tags, but rather that information status emerges naturally out of human memory and attentional systems (see also [71] on the relation between linguistic and non-linguistic representations). Theories of memory distinguish between working memory, which stores information currently under use, and longer-term memory, which stores conceptual and procedural knowledge for later use. We might define “given” information as information in working memory, while “new” is information that has not yet been retrieved from long-term memory (see [5] for discussion).

Likewise, gradient distinctions in salience or accessibility may correspond to the amount of attention paid. This intuition underlies Gundel et al.’s Givenness Hierarchy,[11] which suggests that linguistic forms are chosen based on the speaker’s assumption about the listener’s attentional state. The relation to attention is similarly embodied in the linguistic term “focus of attention” (related to Centering Theory’s Backward-looking Center),[72] which has been used to explain referential form, word order, and other phenomena.

This emergent view of information representation is consistent with Horton and Gerrig’s[73] proposal that common ground representations are a natural consequence of the way memory works. They propose that interlocutors automatically and implicitly store associative information about the conversation (for example, who introduced a particular referent), in the same way they store information outside of linguistic contexts. On this account, speakers then draw on these associations to make their conversational contributions appropriate. This allows them to rely on their own knowledge and ease of processing, even in non-communicative tasks.

More research is needed to understand how representations of information change dynamically over time and as a function of context. The optimal systems approach (described above) defines information in terms of the probability of a word, referent, or structure. This probability, in principle, must be calculated within a particular context. This account leads to important questions about how detailed the calculation of probability is, what information is included and at what level of granularity, and how quickly it is calculated and updated.
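
One common way to make this probabilistic notion concrete, following Shannon's definition of information,[62] is the information content (surprisal) of a unit given its context. The symbols u (a word, referent, or structure) and c (the context it occurs in) are notation assumed here for illustration; they are not taken from the accounts reviewed above.

```latex
I(u \mid c) = -\log_2 P(u \mid c)
```

On this definition, a unit that is certain in its context carries 0 bits, and each halving of its contextual probability adds one bit. The open questions raised above then amount to asking what is included in c, and how often P(u | c) is re-estimated as the discourse unfolds.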

Information status as a linguistic representation

Another question is whether information status is represented explicitly in the syntactic structure (see [2] for discussion). This question is posed by the theoretical approach to syntax known as the Cartographic approach[74, 75]. One of the basic aims of this work is to investigate what is encoded in the grammar: “Of the properties which enter into human thought and belief systems, which ones are represented as grammatical features?” ([74], p. 424). If sentences have topics and foci, are these properties also represented as grammatical features in the syntactic representation of sentences? According to Rizzi[76] and others, the answer is yes. On this view, particular linguistic elements are ‘flagged’ with different information structural features, and need to find their way to the correct syntactic position in order to ‘satisfy’ the information-structural feature that they carry. A large body of fine-grained, crosslinguistic work has been conducted within the Cartographic approach, allowing researchers to identify crosslinguistic generalizations as well as differences.

Mechanisms for using information status

Information as selectional criteria

Information structure is typically used as an explanation for why one linguistic form is preferred over another. As such, information structure represents the conditioning context, and linguistic forms are chosen on the basis of grammatical or pragmatic rules that specify this relation. This is the view represented by the Cartographic approach, above, as well as other pragmatic and grammatical approaches to information structure. More work is needed to develop psycholinguistic models of these processes.

Information effects via processing facilitation

Another possibility is that information structure is relevant because it has consequences for the psycholinguistic processes underlying language production and comprehension. Information that is given, attended, predictable and topical is typically easier to process than information that is new and unattended. For example, when information has a strong memory representation and/or is attended to, it should be easy to retrieve. It has been proposed that this facilitates the reactivation of this information during both language production and comprehension, and that this facilitation affects linguistic behavior directly.

One such proposal is that syntactic choices reflect the ease of producing different elements in the utterance.[77] Speakers tend to choose syntactic constructions that allow them to put accessible information early in an utterance, possibly because it allows them to postpone the difficult part of the utterance, which needs further planning time.[47] This view is supported by evidence that speakers choose word orders as a function of their own visual attention.[78] A related proposal is that speakers produce acoustically reduced forms for repeated words because the production of given information is facilitated.[79] Related to this, comprehenders’ recall and ease of processing linguistic stimuli is influenced by focus-marking devices such as clefts[80] and pitch accents.[81]

Information distribution

Information-theoretic accounts of linguistic variation, described in the section on optimal systems, suggest that linguistic forms are chosen so as to create a smooth distribution of information. A transparent application of this literature would suggest that the language production system has a monitoring system that identifies the amount of information in a particular idea, and selects linguistic forms that distribute it evenly. Such a mechanism might draw directly on the representations proposed by other research traditions to quantify information (e.g., topic/focus, given/new, addressee knowledge). Alternatively, these effects may be the result of numerous separate mechanisms such as the ones described above.
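
The idea of a monitor that prefers evenly distributed information can be illustrated with a minimal sketch. It is not a model proposed in the literature reviewed here: the probabilities are hypothetical, the function names (surprisal, profile, smoother) are invented for illustration, and variance of per-word surprisal is only one assumed way to quantify "smoothness". The sketch compares two renderings of the same message, one compact and one with an extra, fairly predictable word, and selects the one with the flatter information profile.

```python
import math
from statistics import pvariance


def surprisal(p: float) -> float:
    """Information content, in bits, of a unit with in-context probability p."""
    return -math.log2(p)


def profile(probs):
    """Return per-unit surprisals, their total, and their population variance."""
    bits = [surprisal(p) for p in probs]
    return bits, sum(bits), pvariance(bits)


def smoother(a, b):
    """Pick whichever probability sequence has the flatter surprisal profile."""
    return a if profile(a)[2] <= profile(b)[2] else b


# Hypothetical probabilities (assumed for illustration, not measured data).
# Both variants carry the same total information, because 0.25 * 0.08 == 0.02;
# the expanded variant spreads it over one extra, fairly predictable word.
compact = [0.5, 0.02, 0.5]
expanded = [0.5, 0.25, 0.08, 0.5]

for name, probs in (("compact", compact), ("expanded", expanded)):
    bits, total, var = profile(probs)
    print(f"{name}: peak={max(bits):.2f} bits, total={total:.2f} bits, variance={var:.2f}")

chosen = smoother(compact, expanded)
```

With these assumed numbers the expanded variant is selected, because it lowers the peak information carried by any single word without changing the total. Whether the production system performs anything like this explicit comparison, or whether similar outcomes fall out of separate facilitation mechanisms, is exactly the open question raised above.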

Conclusion

The relationship between information status and linguistic form is complex. There are many linguistic phenomena that are sensitive to informational considerations, yet there is no single theory of information status representations that captures them all. One reason for this is that different linguistic phenomena seem to pick out different categories of information. As we described above, different languages encode information status with reference, word order, morphological marking, prosody, and other linguistic phenomena. Some of these seem to require categorical divisions between topic and focus, or given and new. Yet others seem to require gradient representations of topicality or predictability.

Part of the heterogeneity in how languages encode information status may stem from the breadth of what we might call “information”. We can think of information as a property of things in the world, such that some information is either inherently more salient, or more salient than other information in the context. Alternatively, we could think of information as a property of how entities and events in the world are represented psychologically, such that some information has stronger, more detailed, or more accessible representations. Other approaches view information structure as an emergent part of social interaction, for example the notion that “given” and “new” are defined with respect to common ground and/or communication over a noisy signal. A goal of future work is to identify how these characterizations relate to each other and to variation in linguistic form.

Acknowledgments

J. Arnold was partially funded by National Science Foundation Grant BCS-0745627. E. Kaiser was partially funded by National Institutes of Health Grant R01 HD061457.

Contributor Information

Jennifer E. Arnold, University of North Carolina at Chapel Hill.

Elsi Kaiser, University of Southern California.

Jason M. Kahn, University of North Carolina at Chapel Hill.

Lucy Kyoungsook Kim, University of Southern California.

References

1. Halliday MAK. Notes on Transitivity and Theme in English, Part 2. Journal of Linguistics. 1967;3:199–244.

2. Lambrecht K. Information structure and sentence form. Cambridge, UK: Cambridge University Press; 1994.

3. Chafe WL. Language and consciousness. Language. 1974;50:111–133.

4. Prince EF. Toward a taxonomy of given/new information. In: Cole P, editor. Radical Pragmatics. New York, NY: Academic Press; 1981. pp. 223–256.

5. Chafe WL. Discourse, consciousness, and time. Chicago: Chicago University Press; 1994.

6. Halliday MAK, Hasan R. Cohesion in English. London: Longman; 1976.

7. Hawkins J. On (in)definite articles: implicatures and (un)grammaticality prediction. Journal of Linguistics. 1991;27:405–442.

8. Clark HH, Haviland SE. Comprehension and the given-new contract. In: Freedle RO, editor. Discourse production and comprehension. Hillsdale, NJ: Erlbaum; 1977. pp. 1–40.

9. Gordon P, Grosz B, Gilliom L. Pronouns, names, and the centering of attention in discourse. Cognitive Science. 1993;17:311–347.

10. Ariel M. Accessing noun-phrase antecedents. London: Routledge; 1990.

11. Gundel J, Hedberg N, Zacharski R. Cognitive status and the form of referring expressions in discourse. Language. 1993;69:274–307.

12. Arnold JE. PhD Dissertation. Stanford University; 1998. Reference form and discourse patterns.

13. Kaiser E. Focusing on pronouns: Consequences of subjecthood, pronominalisation, and contrastive focus. Language and Cognitive Processes. 2011;26:1625–1666.

14. Chambers C, Smyth R. Structural parallelism and attentional focus: A test of centering theory. Journal of Memory and Language. 1998;39:593–608.

15. Garvey C, Caramazza A. Implicit causality in verbs. Linguistic Inquiry. 1974;5:459–464.

16. Kaiser E, Trueswell JC. Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference resolution. Language and Cognitive Processes. 2008;23(5):709–748.

17. Järvikivi J, van Gompel RPG, Hyönä J, Bertram R. Ambiguous pronoun resolution: Contrasting the first-mention and subject-preference accounts. Psychological Science. 2005;16(4):260–264.

18. Kuno S. The structure of the Japanese language. MIT Press; Cambridge: 1973.

19. Lee C. Contrastive topic: A locus of the interface: Evidence from Korean and English. In: Turner K, et al., editors. The Semantics/Pragmatics Interface from Different Points of View. 1999.

20. Kim LK, Kaiser E. Asymmetrical effects of topic-marking on discourse processing and memory retention. MIT Working Papers in Linguistics (MITWPL): Proceedings of the Eighth Workshop on Altaic Formal Linguistics (WAFL 8). (to appear)

21. Han C. Asymmetry in the interpretation of -(n)un in Korean. Japanese/Korean Linguistics. 1998;7:1–15.

22. Aissen JL. Topic and focus in Mayan. Language. 1992;68(1):43–80.

23. Fiedler I, Hartmann K, Reineke B, Schwarz A, Zimmermann M. Subject focus in West African languages. In: Zimmermann M, Féry C, editors. Information Structure. Theoretical, Typological, and Experimental Perspectives. Oxford University Press; Oxford: 2009. pp. 234–257.

24. Partee BH, Borschev V. The semantics of Russian Genitive of Negation: The nature and role of Perspectival Structure. In: Young RB, editor. Proceedings from SALT XIV. Ithaca: CLC Publications; 2004. pp. 212–234.

25. Kaiser E. Case Alternations and NPIs in Questions in Finnish. WCCFL 21: Proceedings of the 21st West Coast Conference on Formal Linguistics. Somerville, MA: Cascadilla Press; 2002. pp. 194–207.

26. Grimes JE. Topic Inflection in Mapudungun Verbs. International Journal of American Linguistics. 1985;51:141–163.

27. Arnold JE. The inverse system in Mapudungun and other languages. Revista de Lingüística Teórica y Aplicada. 1997;34:9–48.

28. König E. The Meaning of Focus Particles: A Comparative Perspective. Routledge; 2001.

29. Firbas J. Non-Thematic Subjects in Contemporary English. Travaux linguistiques de Prague. 1966;2:239–256.

30. Wasow T. Postverbal Behavior. CSLI Publications; 2002.

31. Birner BJ, Ward G. Information Status and Noncanonical Word Order in English. Amsterdam/Philadelphia: John Benjamins; 1998.

32. Karttunen L, Kay M. Parsing in a Free Word Order Language. In: Dowty D, Karttunen L, Zwicky A, editors. Natural Language Parsing: Psychological, computational, and theoretical perspectives. New York: Cambridge University Press; 1985. pp. 279–306.

33. Pierrehumbert JB, Hirschberg J. The meaning of intonational contours in the interpretation of discourse. In: Cohen PR, et al., editors. Intentions in Communication. Cambridge, MA: MIT Press; 1990. pp. 271–311.

34. Selkirk E. Sentence prosody: Intonation, stress, and phrasing. In: Goldsmith J, editor. The Handbook of Phonological Theory. Vol. 1. Hoboken, NJ: Blackwell; 1995. pp. 550–569.

35. Rooth M. A theory of focus interpretation. Natural Language Semantics. 1992;1:75–116.

36. Schwarzschild R. GIVENness, AvoidF and other constraints on the placement of accent. Natural Language Semantics. 1999;7:141–177.

37. Fowler CA, Housum J. Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language. 1987;26:489–504.

38. Dahan D, Tanenhaus MK, Chambers CG. Accent and reference resolution in spoken-language comprehension. Journal of Memory and Language. 2002;47:292–314.

39. Gundel JK. PhD dissertation. University of Texas; Austin: 1974. The role of topic and comment in linguistic theory.

40. Reinhart T. Pragmatics and linguistics: an analysis of sentence topics. University of Indiana Linguistics Club. 1982 (also Philosophica 1981, 27, 53–94).

41. Sgall P, Hajicova WE. Focus on focus. The Prague Bulletin of Mathematical Linguistics. 1977;28:5–54.

42. Chomsky N. Deep structure, surface structure, and semantic interpretation. In: Steinberg D, Jacobovits L, editors. Semantics. Cambridge: Cambridge University Press; 1971. pp. 183–216.

43. Jackendoff R. Semantic interpretation in generative grammar. Cambridge, MA: MIT Press; 1972.

44. Ward G. PhD dissertation. University of Pennsylvania; 1985. The semantics and pragmatics of preposing.

45. Vallduví E. PhD dissertation. University of Pennsylvania; 1990. The information component.

46. Givón T. Topic continuity in discourse: A quantitative cross language study. Amsterdam: John Benjamins Publishing Company; 1983.

47. Arnold JE, Wasow T, Losongco T, Ginstrom R. Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language. 2000;76:28–55.

48. Arnold JE. How speakers refer: The role of accessibility. Language and Linguistics Compass. 2010;4(4):187–203. doi: 10.1111/j.1749-818X.2010.00193.x.

49. Tily H, Piantadosi S. Refer efficiently: Use less informative expressions for more predictable meanings. Proceedings of the workshop on the production of referring expressions: Bridging the gap between computational and empirical approaches to reference; 2009.

50. Fukumura K, Van Gompel RPG. Choosing anaphoric expressions: Do people take into account likelihood of reference? Journal of Memory and Language. 2010;62:52–66.

51. Kaiser E. Investigating the Consequences of Focus on the Production and Comprehension of Referring Expressions. International Review of Pragmatics. 2010;2(2):266–297.

52. Kehler A, Kertz L, Rohde H, Elman JL. Coherence and coreference revisited. Journal of Semantics. 2008;25:1–44.

53. Grice HP. Logic and conversation. Syntax and semantics III. In: Cole P, Morgan J, editors. Speech acts. New York: Academic Press; 1975. pp. 41–58.

54. Clark HH, Marshall CR. Definite reference and mutual knowledge. In: Joshi AK, Webber BL, Sag IS, editors. Elements of Discourse Understanding. New York: Cambridge University Press; 1981. pp. 10–63.

55. Prince E. The ZPG letter: subjects, definiteness, and information-status. In: Thompson S, Mann W, editors. Discourse description: diverse analyses of a fund raising text. Philadelphia/Amsterdam: John Benjamins B.V; 1992. pp. 295–325.

56. Brennan SE, Clark HH. Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition. 1996;22(6):1482–1493. doi: 10.1.1.121.3930.

57. Brown-Schmidt S, Tanenhaus MK. Watching the eyes when talking about size: An investigation of message formulation and utterance planning. Journal of Memory and Language. 2006;54:592–609.

58. Hanna JE, Tanenhaus MK, Trueswell JC. The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language. 2003;49:43–61.

59. Hanna JE, Brennan SE. Speakers’ eye gaze disambiguates referring expressions early during face-to-face conversation. Journal of Memory and Language. 2007;57:596–615.

60. Ferreira VS, Slevc LR, Rogers ES. How do speakers avoid ambiguous linguistic expressions? Cognition. 2005;96:263–284.

61. Keysar B, Barr DJ, Balin JA, Brauner JS. Taking perspective in conversation: The role of mutual knowledge in comprehension. Psychological Science. 2000;11:32–38.

62. Shannon CE. A mathematical theory of communication. Bell System Technical Journal. 1948;27(4):623–656.

63. Aylett M, Turk A. The Smooth Signal Redundancy Hypothesis: A Functional Explanation for Relationships between Redundancy, Prosodic Prominence, and Duration in Spontaneous Speech. Language and Speech. 2004;47(1):31–56.

64. Bell A, Brenier JM, Gregory M, Girand C, Jurafsky D. Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language. 2009;60:92–111.

65. Levy R, Jaeger TF. Speakers optimize information density through syntactic reduction. In: Schölkopf B, Platt J, Hoffman T, editors. Advances in neural information processing systems (NIPS). Vol. 19. Cambridge, MA: MIT Press; 2007. pp. 849–856.

66. Qian T, Jaeger TF. Cue Effectiveness in Communicatively Efficient Discourse Production. Cognitive Science. 2012;36(7):1312–1336.

67. Levy R. Expectation-based syntactic comprehension. Cognition. 2008;106(3):1126–1177.

68. Jaeger TF, Tily H. On language ‘utility’: Processing complexity and communicative efficiency. Wiley Interdisciplinary Reviews: Cognitive Science. 2011;2(3):323–335.

69. Schmitt BM, Meyer AS, Levelt WJ. Lexical access in the production of pronouns. Cognition. 1999;69(3):313–335. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10193050.

70. Arnold JE, Griffin Z. The effect of additional characters on choice of referring expression: Everyone counts. Journal of Memory and Language. 2007;56:521–536.

71. Kaiser E. Taking action: a cross-modal investigation of discourse-level representations. Frontiers in Psychology. 2012;3:156. doi: 10.3389/fpsyg.2012.00156.

72. Grosz B, Joshi A, Weinstein S. Centering: A Framework for Modelling the Local Coherence of Discourse. Computational Linguistics. 1995;21(2):203–225.

73. Horton WS, Gerrig RJ. Conversational common ground and memory processes in language production. Discourse Processes. 2005;40:1–35.

74. Cinque G, Rizzi L. The Cartography of Syntactic Structures. In: Heine B, Narrog H, editors. The Oxford Handbook of Linguistic Analysis. Oxford University Press; Oxford, New York: 2010. pp. 51–65.

75. Shlonsky U. The cartographic enterprise in syntax. Language and Linguistics Compass. 2010;4(6):417–429.

76. Rizzi L. The fine structure of the left periphery. In: Haegeman L, editor. Elements of grammar: a handbook of generative syntax. Dordrecht: Kluwer; 1997. pp. 281–337.

77. Ferreira VS. The persistence of optional complementizer production: Why saying “that” is not saying “that” at all. Journal of Memory and Language. 2003;48:379–398.

78. Gleitman LR, January D, Nappa R, Trueswell JC. On the give and take between event apprehension and utterance formulation. Journal of Memory and Language. 2007;57:544–569.

79. Kahn J, Arnold JE. A Processing-Centered Look at the Contribution of Givenness to Durational Reduction. Journal of Memory and Language (in press)

80. Singer M. Thematic structure and the integration of linguistic information. Journal of Verbal Learning and Verbal Behavior. 1976;15(5):549–558.

81. Terken J, Nooteboom S. Opposite effects of accentuation and deaccentuation on verification latencies for Given and New information. Language and Cognitive Processes. 1987;2:145–163.
