Discourse - Text segmentation
Text segmentation is the process of dividing written text into
meaningful units such as topics, sentences or words.
A sentence is the part of a speech or a written discourse that
has a complete and independent meaning. Sentence segmentation refers to
identifying sentences in an unstructured text. The process of sentence
segmentation is a basic step in discourse analysis, because any text
stream needs to be separated into coherent sentences to enable
effective analysis for tasks such as information retrieval, summarization, understanding
and translation. It is therefore important to first define what is meant by a
complete and independent sentence. Some researchers define a sentence as a
finite clause that has a complete and independent meaning [13]. The Cambridge
Encyclopedia of Language defines a sentence as the largest unit to which
syntactic rules apply [8].
Text segmentation by topic assignment rests on the observation that
semantically coherent chunks of text are also similar in a topical sense.
Sentence segmentation is the problem of dividing a string of
written language into its component sentences.
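As a rough illustration of sentence segmentation, the sketch below splits a string at sentence-final punctuation. It is a minimal rule-based example, not a method taken from the cited literature: the function name `split_sentences` and the regular expression are assumptions made for illustration, and the rule deliberately ignores hard cases such as abbreviations ("Dr."), which it will split incorrectly.

```python
import re

# One sentence = a run of non-terminal characters, one or more end marks,
# and any closing quotes or brackets. The second alternative catches a
# trailing fragment that has no end mark at all.
_SENTENCE = re.compile(r'[^.!?]*[.!?]+["\')\]]*|[^.!?]+$')


def split_sentences(text):
    """Split text into sentences using end punctuation only (naive)."""
    pieces = (m.group().strip() for m in _SENTENCE.finditer(text))
    return [p for p in pieces if p]


if __name__ == "__main__":
    sample = "Please sit down. Do you know what the weather will be tomorrow?"
    for sentence in split_sentences(sample):
        print(sentence)
```

Real text needs more than punctuation rules (abbreviations, decimal numbers, quoted speech), which is one reason sentence segmentation is treated as a problem in its own right.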
Discourse usually refers to a form of written text or spoken
language used to communicate ideas or beliefs to be recognized by the
hearer or reader.
Discourse is not just a random sequence of sentences and
clauses; rather, it is a coherent, understandable text for the reader or the
hearer.
Each utterance of a discourse contributes to the communicative
import of preceding utterances, or constitutes the onset of a new unit of
meaning or action that subsequent utterances may add to.
Discourse relations are semantic relations, such as causality, contrast and
temporality, that connect two textual units, typically clauses or sentences. The
textual units connected should express abstract objects such as events, actions,
facts or beliefs. They are also called arguments. There are two types of
discourse relations: (i) relations that are signaled explicitly via so-called
discourse connectives (explicit relations), and (ii) relations that can be
inferred from the context without any explicit signaling (implicit
relations).
The structure of expository texts can be
characterized as a sequence of subtopical discussions that occur in the context
of a few main topic discussions.
Term repetition is a strong cohesion indicator. Repetition alone is a
very useful indicator of subtopic structure when analyzed in terms of multiple
simultaneous information threads.
The more similar two blocks of text are, the more likely it is that the
current subtopic continues, and, conversely, if two adjacent blocks of text are
dissimilar, this implies a change in subtopic flow.
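The idea that dissimilar adjacent blocks suggest a subtopic change can be sketched with a simple term-overlap measure. The example below is a minimal illustration under stated assumptions: it uses cosine similarity over raw word counts, and the function names and the `threshold` value of 0.1 are hypothetical choices, not parameters taken from the text.

```python
import math
import re
from collections import Counter


def term_counts(block):
    """Lowercased word counts for one block of text."""
    return Counter(re.findall(r"[a-z']+", block.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjacent_similarities(blocks):
    """Similarity between each block and the block that follows it."""
    counts = [term_counts(b) for b in blocks]
    return [cosine_similarity(counts[i], counts[i + 1])
            for i in range(len(counts) - 1)]


def likely_boundaries(blocks, threshold=0.1):
    """Indices i where a subtopic change is suspected after blocks[i].

    The threshold is an illustrative assumption, not a tuned value.
    """
    sims = adjacent_similarities(blocks)
    return [i for i, s in enumerate(sims) if s < threshold]


if __name__ == "__main__":
    blocks = [
        "The cat sat on the mat. The cat slept.",
        "The cat chased the mouse across the mat.",
        "Stock prices fell sharply on Monday.",
    ]
    print(adjacent_similarities(blocks))
    print(likely_boundaries(blocks))
```

Raw counts let frequent function words such as "the" inflate similarity, so a practical version would likely remove stop words or weight terms before comparing blocks.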
Readers of all ages must be aware of text structures if they are to be most successful (Meyer, 2003). The structure or organization of the text is the arrangement of ideas and the relationships among the ideas (Armbruster, 2004).
Readers who are unaware of the text structures are at a disadvantage because they do not approach reading with any type of reading plan
(Meyer, Brandt, & Bluth, 1980). However, readers who are familiar with text structures expect the information to unfold in certain ways
(RAND Reading Study Group, 2002).
Most expository texts are structured to facilitate the study process for
prospective readers. These texts contain structural elements that help guide
students through their reading. Authors of expository texts use these structures
to arrange and connect ideas.
Carrell (1985) argued that instruction on text structure indeed has a positive effect on the students' recall protocols. Meyer (1985) stated that knowledge of the rhetorical relationship of the ideas (main idea, major ideas, and supporting details) helps readers comprehend expository texts. Reading researchers have argued that knowledge of text organization or structure is an important factor for text comprehension (see Aebersold & Field, 1997; Fletcher, 2006; Grabe, 1991, 2004, 2008; Hall, Sabey, & McClellan, 2005; Horiba, 2000; Kendeou & van den Broek, 2007; Meyer, 2003; Meyer & Poon, 2001; Snyder, 2010).
Text features can help readers locate and organize information in the text. For example, headings introduce students to specific bits of information. Presenting information in this manner helps students hold each bit of information in their short-term memory. Students can then process it or connect it to background knowledge and store it in their long-term memory. Without headings, the information would be overwhelming and difficult to process effectively.
Structural elements in expository texts vary; therefore, it is important to introduce students to the components of various texts throughout the school year. It is also important to teach and model the use of these components properly at the beginning of the school year. The recognition and use of text organization are essential processes underlying comprehension and retention. As early as the third grade, students are expected to recognize expository text structures. Meyer (1985) classified these text structures as follows:
- Description: The author describes a topic.
- Sequence: The author uses numerical or chronological order to list items or events.
- Compare/contrast: The author compares and contrasts two or more similar events, topics, or objects.
- Cause/effect: The author delineates one or more causes and then describes the ensuing effects.
- Problem/solution: The author poses a problem or question and then gives the answer.
The ability to identify and analyze these text structures in expository texts helps readers to comprehend the text more easily and retain it longer. To achieve better results, it is highly recommended to introduce and work on text structures in the order prescribed in what follows.
Text segmentation is concerned with breaking down documents into smaller
semantically coherent chunks.
Topic segmentation can be divided into two sub-fields: (i) linear topic
segmentation and (ii) hierarchical topic segmentation. Whereas linear topic
segmentation deals with the sequential analysis of topical changes, hierarchical
segmentation is concerned with finding more fine-grained subtopic structures in
texts.
The Arabic text of the Grand Qur'aan is fully diacritized, which leaves no room
for ambiguity. It should be noted that non-diacritized texts are highly
ambiguous: the proportion of ambiguous words exceeds 90%. For example, the word
Kataba can be diacritized in 21 different ways.
Properties of discourse
Discourse Cohesion
The concept of discourse structure is the answer to the question: What makes a
discourse cohesive or coherent? In the late 20th century, linguists began to describe
cohesion in terms of the lexicogrammatical system of the language (grammar and
vocabulary). There are five types of cohesion associated with grammatical and
lexical elements: (i) reference cohesion, where elements express referential
identity via anaphora (referring back to a word or phrase used earlier,
especially to avoid repeating it by replacing it with something else such as a
pronoun), as with the pronoun in Ex. 2-1(a); (ii) substitution
cohesion, the replacement of one element by another, such as one standing in for
axe in Ex. 2-1(b); (iii) ellipsis cohesion, the replacement of elements by
nothing, where the text is still understandable from prior elements, as in the
nominal ellipsis in Ex. 2-1(c); (iv) lexical cohesion, the reiteration
of the same element, either by repetition or via a synonym or hyponym; and (v)
conjunction cohesion, where
propositions in discourse are systematically related to prior propositions using
lexical items (e.g. coordinating and subordinating conjunctions such as or and
but, adverbials such as besides, and prepositional phrases such as in contrast;
see Ex. 2-1(d)). The fifth type of cohesion is the source of discourse relations,
the concern of the present study.
Explicit vs. implicit relations
Discourse relations are often signaled explicitly, for better readability, using
lexical elements called cue phrases, discourse markers, or discourse
connectives. Discourse connectives are conjunctions (and, or, but, because, since),
adverbs (instead) and prepositional phrases (in contrast).
Discourse connectives have two distinct functions as distinguished by Cohen
(1984): (i) enabling faster recognition of discourse relations by the reader
(the hearer), and (ii) allowing the recognition of discourse relations which
could not be inferred in the absence of a connective.
There are three main syntactic categories of discourse
connectives in English: (i) coordinating or subordinating conjunctions, (ii)
adverbials, and (iii) prepositional phrases. However, not every conjunction,
adverbial or prepositional phrase functions as a discourse connective: to
count as one, it must also relate abstract entities in discourse.
Coordinating conjunctions. Two clauses can be joined by a
coordinating conjunction such as and, or and but. Frequent functions of these
connectives are the discourse relations Conjunction, Alternative and Contrast,
respectively.
Subordinating conjunctions. These conjunctions introduce
clauses that are syntactically dependent on the main clause. Examples are
because, although, and if, which express the discourse relations Causal, Contrast
and Condition respectively.
Adverbial connectives. Sentence-modifying adverbs can express
a discourse relation between two abstract entities. Examples are therefore and
then, which express the Causal and Conditional relations
respectively.
Prepositional phrases. Phrases such as in contrast and as a result can
also express discourse relations, here Contrast and Consequence respectively.
Discourse connectives can consist of two parts. These are
called paired connectives: each part introduces one argument of
the connective, as in if ... then.
A discourse connective is defined as a lexical expression that relates two text
segments expressing abstract objects such as events, beliefs, facts, or
propositions. The text segments are called the arguments of that connective.
The connective should indicate one or more discourse relations such as
Elaboration, Exemplification, Contrast, Temporal, Exception, Causal or simply
Conjunction.
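To make the link between connectives and relations concrete, the sketch below looks up candidate connectives in a small lexicon. The connective-to-relation pairs are the ones given as examples in this section; the names `CONNECTIVE_RELATIONS` and `find_candidate_connectives` are hypothetical, and the lookup is only a first step. As noted above, a match is merely a candidate: an expression counts as a discourse connective only if it relates two abstract objects, which this sketch does not check.

```python
import re

# Connective -> relation, taken from the examples given in this section.
CONNECTIVE_RELATIONS = {
    "and": "Conjunction",
    "or": "Alternative",
    "but": "Contrast",
    "because": "Causal",
    "although": "Contrast",
    "if": "Condition",
    "therefore": "Causal",
    "in contrast": "Contrast",
    "as a result": "Consequence",
}

# Longest connectives first, so multi-word entries are tried before short ones.
_ALTERNATIVES = sorted(CONNECTIVE_RELATIONS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in _ALTERNATIVES) + r")\b",
    re.IGNORECASE,
)


def find_candidate_connectives(text):
    """Return (connective, relation) pairs in order of appearance."""
    found = []
    for match in _PATTERN.finditer(text):
        connective = match.group(1).lower()
        found.append((connective, CONNECTIVE_RELATIONS[connective]))
    return found


if __name__ == "__main__":
    text = "He stayed home because it rained. In contrast, she went out."
    print(find_candidate_connectives(text))
```

A phrase such as "salt and pepper" also yields a match for "and", even though it joins two nouns rather than two abstract objects, so a real system would need a further disambiguation step.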
Types of sentences:
Four Types of Sentences and the Effect of Punctuation
When students learn to write, they begin by learning
about the four types of sentences and the role
punctuation plays in determining and creating those
different sentence types.
The four types of sentences in the English
language include:
- Declarative sentence
- Imperative sentence
- Interrogative sentence
- Exclamatory sentence
And there are only three punctuation marks
with which to end a sentence:
- Period
- Question mark
- Exclamation point
Using different types of sentences and
punctuation, students can vary the tone of their writing
assignments and express a variety of thoughts and
emotions.
A declarative sentence simply makes
a statement or expresses an opinion. In other words, it
makes a declaration. This kind of sentence ends with a
period.
Examples of this sentence type:
“I want to be a good writer.” (makes a statement)
“My friend is a really good writer.” (expresses an opinion)
An imperative sentence gives a
command or makes a request. It usually ends with a
period but can, under certain circumstances, end with
an exclamation point.
Examples of this sentence type:
“Please sit down.”
“I need you to sit down now!”
An interrogative sentence asks a
question. This type of sentence often begins with who,
what, where, when, why, how, or do, and it ends with a
question mark.
Examples of this sentence type:
“When are you going to turn in your writing assignment?”
“Do you know what the weather will be tomorrow?”
An exclamatory sentence expresses great emotion, such as
excitement, surprise, happiness or anger, and ends with
an exclamation point.
Examples of this sentence type:
“It is too dangerous to climb that mountain!”
“I got an A on my book report!”
Learning about the different types of sentences and
punctuation will help students become better writers by
enabling them to convey various types of information and
emotion in their writing.
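The relationship between end punctuation and sentence type described above can be sketched as a small lookup. This is an illustrative example only: the names `END_MARK_TYPES` and `candidate_sentence_types` are hypothetical, and because punctuation alone cannot separate an imperative from a declarative or an exclamatory sentence, the function returns every type compatible with the end mark rather than a single answer.

```python
# End mark -> sentence types that may end with it, following the
# descriptions above (an imperative may end with a period or,
# under certain circumstances, an exclamation point).
END_MARK_TYPES = {
    ".": ("declarative", "imperative"),
    "?": ("interrogative",),
    "!": ("exclamatory", "imperative"),
}


def candidate_sentence_types(sentence):
    """Return the sentence types compatible with the sentence's end mark."""
    # Drop surrounding whitespace and any closing quotation marks.
    stripped = sentence.strip().rstrip("\"'”’")
    if not stripped:
        return ()
    return END_MARK_TYPES.get(stripped[-1], ())


if __name__ == "__main__":
    for example in ("Please sit down.",
                    "“Do you know what the weather will be tomorrow?”",
                    "I got an A on my book report!"):
        print(example, "->", candidate_sentence_types(example))
```

Choosing between the remaining candidates requires looking at the wording itself, for example whether the sentence opens with a verb in command form.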