Discourse - Text segmentation

Text segmentation is the process of dividing written text into meaningful units such as topics, sentences or words.

 

A sentence is the part of spoken or written discourse that has a complete and independent meaning. Sentence segmentation refers to identifying sentences in unstructured text. It is a basic step in discourse analysis, because any text stream needs to be separated into coherent sentences to enable effective analysis for tasks such as information retrieval, summarization, understanding and translation. It is therefore important to first define what is meant by a complete and independent sentence. Some researchers have defined a sentence as a finite clause that has a complete and independent meaning [13]. The Cambridge Encyclopedia of Language defines a sentence as the largest unit to which syntactic rules apply [8].

Text segmentation by topic assignment rests on the observation that semantically coherent chunks of text are also topically similar.

Sentence segmentation is the problem of dividing a string of written language into its component sentences.
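As a minimal sketch of sentence segmentation, the Python function below splits text at a period, question mark or exclamation point that is followed by whitespace and an uppercase letter. The function name split_sentences and the small ABBREVIATIONS set are assumptions made for illustration; they do not come from the sources cited here, and a realistic system would need a much larger abbreviation list or a trained model.

```python
import re

# Hypothetical, deliberately small abbreviation list (an assumption for this sketch).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e.", "etc.", "vs."}


def split_sentences(text):
    """Split text into sentences using end punctuation plus a simple abbreviation check."""
    sentences = []
    start = 0
    # Candidate boundary: '.', '?' or '!' followed by whitespace and an uppercase letter.
    for match in re.finditer(r"[.?!](?=\s+[A-Z])", text):
        end = match.end()
        words = text[start:end].split()
        last_word = words[-1].lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not to a sentence end
        sentences.append(text[start:end].strip())
        start = end
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For the input "Dr. Smith arrived late. Did he apologize? No! He left." this sketch returns four sentences and keeps "Dr. Smith arrived late." together. It will still fail on cases such as a sentence that genuinely ends with an abbreviation, which is one reason sentence segmentation is treated as a problem in its own right.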

Discourse usually refers to a form of written text or spoken language used to communicate ideas or beliefs to be recognized by the hearer or reader.

Discourse is not just a random sequence of sentences and clauses; rather, it is a coherent, understandable text for the reader or the hearer.

Each utterance of a discourse contributes to the communicative import of preceding utterances, or constitutes the onset of a new unit of meaning or action that subsequent utterances may add to.

Discourse relations are semantic relations, such as causality, contrast and temporality, that connect two textual units, typically clauses or sentences. The textual units connected should express abstract objects such as events, actions, facts or beliefs. They are also called arguments. There are two types of discourse relations: (i) relations that are signaled explicitly via so-called discourse connectives (explicit relations), and (ii) relations that can be inferred from the context without any explicit signaling (implicit relations).

The structure of expository texts can be characterized as a sequence of subtopical discussions that occur in the context of a few main topic discussions.

Term repetition is a strong cohesion indicator. Repetition alone is a very useful indicator of subtopic structure when analyzed in terms of multiple simultaneous information threads.

The more similar two blocks of text are, the more likely it is that the current subtopic continues, and, conversely, if two adjacent blocks of text are dissimilar, this implies a change in subtopic flow.
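The idea of comparing adjacent blocks can be sketched in Python as follows. This is only an illustration under stated assumptions: blocks are fixed-size groups of sentences, similarity is the cosine of raw term counts, and a boundary is proposed wherever the similarity falls below a fixed threshold. The function names, the block_size parameter and the threshold value are hypothetical choices, not taken from any cited method.

```python
import math
import re
from collections import Counter


def term_counts(sentences):
    """Count lowercase word tokens across a block of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(sentences, block_size=2):
    """Similarity between the blocks on either side of each gap between sentences."""
    sims = []
    for gap in range(1, len(sentences)):
        left = term_counts(sentences[max(0, gap - block_size):gap])
        right = term_counts(sentences[gap:gap + block_size])
        sims.append(cosine(left, right))
    return sims


def boundaries(sentences, block_size=2, threshold=0.1):
    """Indices of sentences that start a new subtopic (assumed: similarity below threshold)."""
    sims = gap_similarities(sentences, block_size)
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```

With the four sentences "The cat sat on the mat.", "The cat chased the mouse.", "Stock prices fell sharply." and "Investors sold stock quickly.", and block_size=1, the only gap with no shared terms is the one before the third sentence, so boundaries returns [2]. The sketch does not remove function words such as "the", which inflate similarity; a more careful version would filter them and would look for relative dips in similarity rather than use a fixed threshold.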

Readers of all ages must be aware of text structures if they are to be most successful (Meyer, 2003). The structure or organization of the text is the arrangement of ideas and the relationships among the ideas (Armbruster, 2004). Readers who are unaware of the text structures are at a disadvantage because they do not approach reading with any type of reading plan (Meyer, Brandt, & Bluth, 1980). However, readers who are familiar with text structures expect the information to unfold in certain ways (RAND Reading Study Group, 2002).

Most expository texts are structured to facilitate the study process for prospective readers. These texts contain structural elements that help guide students through their reading. Authors of expository texts use these structures to arrange and connect ideas.

Carrell (1985) argued that instruction on text structure has a positive effect on students' recall protocols. Meyer (1985) stated that knowledge of the rhetorical relationship of the ideas (main idea, major ideas, and supporting details) helps readers with their comprehension of expository texts. Reading researchers have argued that knowledge of text organization or structure is an important factor for text comprehension (see Aebersold & Field, 1997; Fletcher, 2006; Grabe, 1991, 2004, 2008; Hall, Sabey, & McClellan, 2005; Horiba, 2000; Kendeou & van den Broek, 2007; Meyer, 2003; Meyer & Poon, 2001; Snyder, 2010).

Text features can help readers locate and organize information in the text. For example, headings help introduce students to specific bits of information. Presenting information in this manner helps students hold each bit of information in their short-term memory. Students can then process it or connect it to background knowledge and store it in their long-term memory. Without headings, the information would be overwhelming and difficult to process effectively.

Structural elements in expository texts vary; therefore, it is important to introduce students to the components of various texts throughout the school year. It is also important to teach and model the use of these components properly at the beginning of the school year. The recognition and use of text organization are essential processes underlying comprehension and retention. As early as the third grade, students are expected to recognize expository text structures. Meyer (1985) classified these text structures as follows:

The ability to identify and analyze these text structures in expository texts helps readers to comprehend the text more easily and retain it longer. To achieve better results, it is highly recommended to introduce and work on text structures in the order prescribed in what follows.

Text segmentation is concerned with breaking down documents into smaller semantically coherent chunks.

Topic segmentation can be divided into two sub-fields: (i) linear topic segmentation and (ii) hierarchical topic segmentation. Whereas linear topic segmentation deals with the sequential analysis of topical changes, hierarchical segmentation is concerned with finding more fine-grained subtopic structures in texts.

The Arabic text of the Grand Qur'aan is fully diacritized, which leaves no room for ambiguity. By contrast, non-diacritized texts are highly ambiguous: the proportion of ambiguous words exceeds 90%. For example, the word Kataba can be diacritized in 21 different ways.

 

Properties of discourse

Discourse Cohesion
The concept of discourse structure answers the question: what makes a discourse cohesive or coherent? In the late 20th century, linguists began to describe cohesion through the lexicogrammatical system of the language (grammar and vocabulary). There are five types of cohesion associated with grammatical and lexical elements: (i) reference cohesion, where elements express referential identity via anaphora (referring back to a word or phrase used earlier, typically to avoid repeating it by replacing it with something else such as a pronoun), as with the pronoun in Ex. 2-1(a); (ii) substitution cohesion, the replacement of one element by another, such as the word one standing in for axe in Ex. 2-1(b); (iii) ellipsis cohesion, the replacement of an element by nothing, where the text remains understandable from prior elements, as in the nominal ellipsis in Ex. 2-1(c); (iv) lexical cohesion, the reiteration of an element, either by repeating it or via a synonym or hyponym; and (v) conjunction cohesion, where propositions in the discourse are systematically related to prior propositions using lexical items (e.g. coordinating and subordinating conjunctions such as or and but, adverbials such as besides, and prepositional phrases such as in contrast; see Ex. 2-1(d)). The fifth type of cohesion is the source of discourse relations, the concern of the present study.

Explicit vs. implicit relations
Discourse relations are often signaled explicitly, for readability, using lexical elements called cue phrases, discourse markers, or discourse connectives. Discourse connectives include conjunctions (and, or, but, because, since), adverbs (instead) and prepositional phrases (in contrast).

Discourse connectives have two distinct functions as distinguished by Cohen (1984): (i) enabling faster recognition of discourse relations by the reader (the hearer), and (ii) allowing the recognition of discourse relations which could not be inferred in the absence of a connective.

There are three main syntactic categories of discourse connectives in English: (i) coordinating or subordinating conjunctions, (ii) adverbials, and (iii) prepositional phrases. However, not every conjunction, adverbial or prepositional phrase functions as a discourse connective: to count as one, it must also relate abstract entities in the discourse.

Coordinating conjunctions. Two clauses can be joined by a coordinating conjunction such as and, or and but. Frequent functions of these connectives are the discourse relations Conjunction, Alternative and Contrast, respectively.

Subordinating conjunctions. These conjunctions introduce clauses that are syntactically dependent on the main clause. Examples are because, although, and if, which express the discourse relations Causal, Contrast and Condition respectively.

Adverbial connectives. Sentence-modifying adverbs can express a discourse relation between two abstract entities. Examples are therefore and then, which express the discourse relations Causal and Conditional respectively.

Prepositional phrases. Phrases such as in contrast and as a result can also express discourse relations, here Contrast and Consequence respectively.

Discourse connectives can consist of two parts. These are called paired connectives, where each part introduces one argument of the connective, as in if ... then.

A discourse connective is defined as a lexical expression that relates two text segments expressing abstract objects such as events, beliefs, facts, or propositions. The text segments are called the arguments of the connective. The connective should indicate one or more discourse relations such as Elaboration, Exemplification, Contrast, Temporal, Exception, Causal or simply Conjunction.
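A first step toward recognizing explicit relations can be sketched as a lexicon lookup in Python. The CONNECTIVES table below uses only the connectives and relation labels named in these notes; the function name find_connectives is hypothetical. Each match is only a candidate: as noted above, a word such as and counts as a discourse connective only when it relates abstract entities, and this sketch does not check that.

```python
import re

# Connective -> relation label, using the examples given in these notes.
CONNECTIVES = {
    "and": "Conjunction",
    "or": "Alternative",
    "but": "Contrast",
    "because": "Causal",
    "although": "Contrast",
    "if": "Condition",
    "therefore": "Causal",
    "then": "Conditional",
    "in contrast": "Contrast",
    "as a result": "Consequence",
}


def find_connectives(text):
    """Return (connective, relation) candidates in order of appearance in the text."""
    lowered = text.lower()
    hits = []
    for connective, relation in CONNECTIVES.items():
        # Word boundaries prevent matching inside longer words (e.g. "or" in "word").
        pattern = r"\b" + re.escape(connective) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), connective, relation))
    # Sort by position so results follow reading order.
    return [(connective, relation) for _, connective, relation in sorted(hits)]
```

For example, find_connectives("She stayed home because it rained. In contrast, he went out.") yields the candidates because (Causal) and in contrast (Contrast). Implicit relations, which have no connective, are by definition invisible to this kind of lookup.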

Types of sentences:
 

Four Types of Sentences and the Effect of Punctuation

When students learn to write, they begin by learning about the four types of sentences and the role punctuation plays in determining and creating those different sentence types.

The four types of sentences in the English language include:

  • Declarative sentence
  • Imperative sentence
  • Interrogative sentence
  • Exclamatory sentence

And there are only three punctuation marks with which to end a sentence:

  • Period
  • Question mark
  • Exclamation point

Using different types of sentences and punctuation, students can vary the tone of their writing assignments and express a variety of thoughts and emotions.

A declarative sentence simply makes a statement or expresses an opinion. In other words, it makes a declaration. This kind of sentence ends with a period.

Examples of this sentence type:

“I want to be a good writer.”  (makes a statement)

“My friend is a really good writer.” (expresses an opinion)

An imperative sentence gives a command or makes a request. It usually ends with a period but can, under certain circumstances, end with an exclamation point.

Examples of this sentence type:

“Please sit down.”

“Sit down now!”

An interrogative sentence asks a question. This type of sentence often begins with who, what, where, when, why, how, or do, and it ends with a question mark.

Examples of this sentence type:

“When are you going to turn in your writing assignment?”

“Do you know what the weather will be tomorrow?”

An exclamatory sentence expresses strong emotion, such as excitement, surprise, happiness or anger, and ends with an exclamation point.

Examples of this sentence type:

“It is too dangerous to climb that mountain!”

“I got an A on my book report!”

Learning about the different types of sentences and punctuation will help students become better writers by enabling them to convey various types of information and emotion in their writing.
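The link between end punctuation and sentence type described above can be sketched in Python. Because a period can close either a declarative or an imperative sentence, and an exclamation point can close either an exclamatory or an imperative one, the hypothetical function classify_sentence below returns a list of candidate types rather than a single answer. This is an illustration only; telling the candidates apart requires looking at the wording, not just the punctuation.

```python
def classify_sentence(sentence):
    """Return the sentence types compatible with the sentence's end punctuation."""
    # Drop surrounding whitespace and straight or curly double quotes.
    stripped = sentence.strip(' \t\n"“”')
    if not stripped:
        return []
    last = stripped[-1]
    if last == "?":
        return ["interrogative"]
    if last == "!":
        # Exclamatory by default, but commands may also end this way.
        return ["exclamatory", "imperative"]
    if last == ".":
        # A period closes statements and most commands or requests.
        return ["declarative", "imperative"]
    return []  # no recognized end punctuation
```

For instance, classify_sentence("Please sit down.") returns both declarative and imperative, while a sentence ending in a question mark is always reported as interrogative.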