MAKING THE “UNREADABLE” READABLE: THE LONGEST FRENCH NOVEL ONLINE
Madeleine de Scudéry’s novel Artamène ou le Grand Cyrus, written
between 1649 and 1653, is, by its length alone (over 8,000 pages),
exceptional and perhaps unequaled in the history of the printed book.
The novel's complex structure and plot make it difficult to obtain an
overall grasp of the work: it has more than four hundred characters
and over one hundred settings. Artamène was conceived for public,
non-linear readings. For all of these reasons, as well as
de Scudéry’s encyclopedic ambition, Le Grand Cyrus poses a
remarkable challenge for the creation of a new kind of website: a
site that offers the unabridged original text of a novel of central
importance in the history of seventeenth-century French literature.
The goal of such a website is not only to ensure the survival of a
work whose sheer size would make modern paper publication unwieldy,
but also to create a reading tool suited to a textual mass of such
imposing scale.
If our customary model of literary reading, linear and intimate, is
inadequate for a work like de Scudéry's, the Internet seems able to
provide an alternative mode of reading, somewhat analogous to the one
for which the novel was originally intended. In other words, the
transposition from print to electronic form should provide a reading
experience closer to the seventeenth-century model (a public, oral
reading) and thus give us access to a text that, through the
evolution of reading practices over the centuries, has become
unreadable.
This is the goal of the “Artamène” project, directed by Claude
Bourqui and Alexandre Gefen (Université de Neuchâtel) and financed by
the Fonds National Suisse de la recherche scientifique (Swiss National
Science Foundation). Today we would like to discuss how, in the
project’s initial phase, the TEI guidelines, beyond their well-earned
reputation for reliability and stability, allowed the optimal
management of such an immense text and, through the semantics they
make possible, allowed the text to be processed in ways that would
have been unimaginable with a traditional printed book.
1) The TEI and online publication
Our project, as we just mentioned, is based on a digital text encoded
in the TEI format, a standard whose use is not restricted to online
publication.
One of our key initial requirements was to ensure enduring access to
our work, regardless of how computer technology and standards evolve.
For this reason, we chose to use only open-source software and free,
open standards recommended by the scientific and technical
communities.
Enduring access is assured first of all because the rules governing
the encoding of the digital text are explicitly laid out in a
Document Type Definition (DTD), a formal grammar written according to
the XML standard. The text is thus inseparable from a simple,
standardized set of instructions describing how it is encoded.
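A DTD states, among other things, which elements may appear inside which others. The sketch below mimics only that one aspect in Python, using a hypothetical and heavily simplified content model loosely based on TEI element names. It is not a DTD parser and does not reproduce the project's actual DTD; real validation would be done with a validating XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified content model in the spirit of a DTD:
# for each element, the set of elements allowed as its children.
# A real DTD also constrains order, repetition, attributes and text content.
CONTENT_MODEL = {
    "TEI": {"teiHeader", "text"},
    "teiHeader": set(),
    "text": {"body"},
    "body": {"div"},
    "div": {"head", "p", "div"},
    "head": set(),
    "p": {"hi"},
    "hi": set(),
}


def check(element, path=""):
    """Return a list of violations of CONTENT_MODEL in the subtree of `element`."""
    here = f"{path}/{element.tag}"
    allowed = CONTENT_MODEL.get(element.tag)
    if allowed is None:
        return [f"{here}: undeclared element"]
    errors = []
    for child in element:
        if child.tag not in allowed:
            errors.append(f"{here}: <{child.tag}> not allowed here")
        errors.extend(check(child, here))
    return errors


# Invented document: the last <p> sits directly in <body>, outside any <div>.
doc = ET.fromstring(
    "<TEI><teiHeader/><text><body>"
    "<div><head>Livre I</head><p>Texte.</p></div>"
    "<p>Paragraphe isolé.</p>"
    "</body></text></TEI>"
)
print(check(doc))  # ['/TEI/text/body: <p> not allowed here']
```

Because the rules live in a separate, declarative table, any future tool can check the text against them without knowing anything about the software that produced it; that is the property the paragraph above relies on.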
The second set of requirements was reliability and citability. The
use of an image mode (which I’ll show you shortly) is the ultimate
guarantee of the first of these. By “citability”, we mean the
possibility for scholars to refer to the digital text just as they
would cite a paper edition. Our concern, then, was not simply to
reproduce de Scudéry’s novel on the Internet, but to help the
Internet become a new kind of reference tool that fits directly into
existing academic practices.
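One way to support both the image mode and citation is to record the original page breaks in the encoding. The sketch below assumes the TEI `<pb/>` (page break) element, with an `n` attribute for the printed page number and a `facs` attribute pointing to the page image, and paragraphs that carry an `xml:id`. The file names, ids and text are invented, and this is not necessarily how the project itself encodes pages.

```python
import xml.etree.ElementTree as ET

# xml:id is parsed by ElementTree under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented toy fragment; page numbers and image paths are placeholders.
SAMPLE = """<body>
  <pb n="1" facs="img/p0001.jpg"/>
  <p xml:id="p1">Premier paragraphe.</p>
  <p xml:id="p2">Un paragraphe coupé <pb n="2" facs="img/p0002.jpg"/> en deux.</p>
  <p xml:id="p3">Troisième paragraphe.</p>
</body>"""


def page_index(root):
    """Map each paragraph id to the (page number, page image) on which it begins."""
    index, page, image = {}, None, None
    for el in root.iter():  # document order: depth-first, pre-order
        if el.tag == "pb":
            page, image = el.get("n"), el.get("facs")
        elif el.tag == "p" and el.get(XML_ID):
            index[el.get(XML_ID)] = (page, image)
    return index


index = page_index(ET.fromstring(SAMPLE))
print(index["p3"])  # ('2', 'img/p0002.jpg')
```

With such an index, a passage can be cited by the page of the original edition, and the same lookup gives the image to display when the reader switches to image mode.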
Last but not least, it was crucial to ensure that the text would be
read, that is, to produce a universally accessible and useful version
of the novel. XML allows texts to be easily converted to many
different formats. For example, you can download Le Grand Cyrus from
our website in several formats, including an "ebook" version, a
Microsoft Word version and a print-ready version.
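The following minimal sketch illustrates the single-source principle: one XML fragment, two output formats. HTML and plain text stand in here for the ebook, Word and print versions mentioned above; the fragment and the function names are hypothetical, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented source fragment; in practice this would be the TEI file.
SOURCE = ET.fromstring(
    "<div><head>Livre premier</head>"
    "<p>Premier paragraphe.</p><p>Second paragraphe.</p></div>"
)


def paragraphs(div):
    """Plain text of each <p>, including text nested in inline elements."""
    return ["".join(p.itertext()) for p in div.findall("p")]


def to_plain_text(div):
    """Print-oriented output: underlined heading, blank line between paragraphs."""
    head = div.findtext("head", default="")
    return head + "\n" + "=" * len(head) + "\n\n" + "\n\n".join(paragraphs(div))


def to_html(div):
    """Screen-oriented output: the same content as an HTML section."""
    head = escape(div.findtext("head", default=""))
    body = "".join(f"<p>{escape(text)}</p>" for text in paragraphs(div))
    return f"<section><h2>{head}</h2>{body}</section>"


print(to_plain_text(SOURCE))
print(to_html(SOURCE))
```

Each new format only requires a new output function; the encoded text itself never changes.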
These requirements led us to adopt the following technical solutions:
a) We were able to avoid part of the complexity of the TEI guidelines
by using TEI Lite, a simplified version of the standard that can
nevertheless be extended.
b) The TEI guidelines are a standard for encoding text, but they
cannot be used directly without specific conversion and presentation
tools. Numerous frameworks exist for presenting XML, and most of them
use the XSLT transformation language. However, we chose XPathScript,
a stylesheet language for the AxKit XML application server, which
runs on Apache with mod_perl. We adopted this solution primarily to
simplify the deployment of our web server. XSLT would have been a
technically “cleaner” solution, but also much more difficult to
implement.
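Roughly speaking, an XPathScript stylesheet associates each element name with the markup to emit before and after that element. The Python analogue below reproduces that idea with a hypothetical `TEMPLATES` table. It illustrates the approach only: it is neither XPathScript nor AxKit code, and elements without a template simply pass their content through.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical template table: element name -> (markup before, markup after).
TEMPLATES = {
    "div": ("<section>", "</section>"),
    "head": ("<h2>", "</h2>"),
    "p": ("<p>", "</p>"),
    "hi": ("<em>", "</em>"),
}


def apply_templates(el):
    """Render `el` recursively; elements with no template emit only their content."""
    pre, post = TEMPLATES.get(el.tag, ("", ""))
    out = [pre, escape(el.text or "")]
    for child in el:
        out.append(apply_templates(child))
        out.append(escape(child.tail or ""))  # text following the child
    out.append(post)
    return "".join(out)


doc = ET.fromstring(
    "<div><head>Histoire</head><p>Le <hi>héros</hi> est <name>Cyrus</name>.</p></div>"
)
print(apply_templates(doc))
# <section><h2>Histoire</h2><p>Le <em>héros</em> est Cyrus.</p></section>
```

The attraction of this style is that ordinary program code sits next to the templates, which makes it easier to go beyond simple display than in pure XSLT.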
Although the TEI Consortium provides ready-made XSLT stylesheets, it
remains extremely complex to develop XSLT applications that go beyond
the simple display of texts. A more practical alternative is on the
horizon with the native XML processing introduced in the latest
version of PHP (5.0), which will allow programmers to integrate XML
texts into dynamic web pages more easily.
2) The TEI as a publishing tool
The TEI guidelines were initially developed as a tool for publishing
texts, and it is in this spirit that we used them.
a) Establishing the text
The techniques the TEI offers for managing different versions of a
text are extremely rich, in particular the tools for transcribing
manuscripts and the <alt> element, which allows for alternative
versions.
However, these tools are rather cumbersome for such a long text.
Their use can be simplified if the project is limited to the needs of
scholars, at least for texts that remain readable after a simple
transliteration from seventeenth-century spelling into modern French.
The possibility of viewing the text in image mode is enough to
eliminate any ambiguity, provided the switch between modes is fast
enough to permit instant comparison.
You will notice that this solution is powerful enough to eliminate
the need for a non-transliterated version. Maintaining parallel
versions would be possible, but technically difficult.
b) Annotation and commentary
The Artamène website is not intended to be a scholarly edition of
de Scudéry’s novel. In any case, annotation on the Internet raises,
as we know, numerous new problems, which are compounded by the use of
XML: contributors cannot insert their comments directly into the
source file; scholarly contributions, as everybody knows, are often
slow to arrive; and, above all, the task is immense. On the
Internet, where we can neither know nor control the reader’s entry
point, annotating Le Grand Cyrus would require systematic notes, at
least for the characters and settings.
Instead, to allow critical discourse to develop around Le Grand
Cyrus, we opted for an external tool based on the idea of the
“wiki”, which is currently the easiest way to create spontaneous
encyclopedias of related notions, capable of representing overlapping
ideas within a given conceptual universe. Our “encyclopedia of the
world of Cyrus” allows free collaboration among a closed group of
registered contributors. The entries of this encyclopedia are
automatically furnished with lists of words and notions and can later
be inserted into the main text. The XML presentation system then
displays these notes as text bubbles, a technique that provides a
convenient transition between the text and its commentary without
breaking the visual continuity of the page.
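Here is a minimal sketch of how encyclopedia headwords might be turned into text bubbles. It assumes, hypothetically, that each entry is a headword plus a short note and that a bubble is an HTML `<span>` whose `title` attribute the browser shows as a tooltip. The project's actual bubble mechanism is not described in the text, and the notes are placeholders.

```python
import re
from html import escape

# Hypothetical encyclopedia entries (headword -> note); notes are placeholders.
ENTRIES = {
    "Cyrus": "Placeholder note on Cyrus.",
    "Mandane": "Placeholder note on Mandane.",
}


def add_bubbles(text, entries):
    """Wrap whole-word headword occurrences in <span title="..."> tooltips."""
    # Longest headwords first, so overlapping names prefer the longer match.
    names = sorted(entries, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    out = []
    # With one capturing group, re.split alternates: text, headword, text, ...
    for i, piece in enumerate(pattern.split(text)):
        if i % 2:
            note = escape(entries[piece])
            out.append(f'<span class="note" title="{note}">{escape(piece)}</span>')
        else:
            out.append(escape(piece))
    return "".join(out)


print(add_bubbles("Cyrus cherche Mandane.", ENTRIES))
```

Because the notes live outside the TEI file, contributors can edit them freely while the source text remains untouched; the link is made only at display time.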
3) The TEI as a reading tool
The TEI guidelines enforce a separation between the text and its
visual presentation. This separation enables a potentially infinite
variety of presentation formats, depending on the intended audience
or the desired use: discovery, reading, printing, or comparison
between the digital text and the image of the original printed book.
a) Navigation and reading
By giving the text a logical tree structure and enabling easy
segmentation according to the different modes of enunciation and
choices of genre, XML is particularly useful for manipulating
gigantic texts like Le Grand Cyrus. It allows the reader to visualize
and manipulate complex vertical narrative structures (stories
embedded within stories) that are quite different from the chapter
divisions of the modern novel. One can conceive of a complex system
of tags that mixes real textual entities and arbitrary divisions in a
hierarchy of <div> elements. The only limit to this approach is
inherent to the markup rules of XML, which require a strictly
hierarchical structure in which the branches of the tree cannot cross
one another.
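This constraint is easy to demonstrate. An XML parser accepts properly nested elements but rejects elements whose boundaries cross. The two invented fragments below differ only in where the `<q>` (quotation) element closes.

```python
import xml.etree.ElementTree as ET

# Properly nested: the quotation lies entirely inside one paragraph.
nested = "<div><p>Récit <q>discours</q> suite.</p></div>"
# Overlapping: the quotation starts in one <p> and ends in the next.
overlapping = "<div><p>Récit <q>discours</p><p>suite</q>.</p></div>"


def is_well_formed(xml_text):
    """True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(nested), is_well_formed(overlapping))  # True False
```

When overlap cannot be avoided, the TEI Guidelines describe workarounds such as empty milestone elements or splitting one element into linked fragments.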
A navigation system that “unfolds” the branches of the XML tree shows
the power and possibilities of XML. XML allows us to replace huge
textual elements with brief summaries (there are three levels of
summaries), giving a comprehensive view and easy orientation in an
otherwise inextricable textual jungle.
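The unfolding navigation can be sketched as an outline generator that shows each division's heading and summary down to a chosen depth. This sketch assumes that summaries are stored in TEI `<argument>` elements and that the three levels are part, book and episode. Both are assumptions made for illustration, and the fragment is invented.

```python
import xml.etree.ElementTree as ET

# Invented skeleton: nested <div>s, each with a <head> and an <argument> summary.
SAMPLE = """<body>
 <div type="part"><head>Partie I</head><argument><p>Résumé de la partie.</p></argument>
  <div type="book"><head>Livre 1</head><argument><p>Résumé du livre.</p></argument>
   <div type="episode"><head>Épisode</head><argument><p>Résumé de l'épisode.</p></argument>
    <p>Texte intégral.</p>
   </div>
  </div>
 </div>
</body>"""


def outline(el, max_depth, depth=0):
    """Yield indented 'heading: summary' lines, unfolding divs down to max_depth."""
    for div in el.findall("div"):  # direct child divisions only
        head = div.findtext("head", default="").strip()
        summary = div.findtext("argument/p", default="").strip()
        yield "  " * depth + f"{head}: {summary}"
        if depth + 1 < max_depth:
            yield from outline(div, max_depth, depth + 1)


root = ET.fromstring(SAMPLE)
for line in outline(root, max_depth=2):
    print(line)
```

Raising `max_depth` unfolds one more level. The full text is only shown when the reader reaches the division they are looking for.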
As we can see, combining the TEI with a navigation interface designed
by and for scholars can make such an immense text far easier to read
and overcome the obstacles its size presents.
4) The TEI as a tool for textual analysis
Since each textual element is “encapsulated” within a pair of XML
tags, the text can also be used as a database.
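To make the database analogy concrete, the sketch below turns each `<div>` into a row of fields and then counts rows per type, much as a SQL `GROUP BY` query would. The `type` values and the fragment are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment: each division is typed by genre.
SAMPLE = """<body>
 <div type="letter"><p>Lettre.</p></div>
 <div type="conversation"><p>Un.</p><p>Deux.</p></div>
 <div type="letter"><p>Autre lettre.</p></div>
</body>"""


def records(root):
    """Turn each <div> into a row: its type, paragraph count and word count."""
    rows = []
    for div in root.iter("div"):
        paras = list(div.iter("p"))
        text = " ".join("".join(p.itertext()) for p in paras)
        rows.append({"type": div.get("type"), "paragraphs": len(paras),
                     "words": len(text.split())})
    return rows


rows = records(ET.fromstring(SAMPLE))
counts = Counter(row["type"] for row in rows)  # like GROUP BY type, COUNT(*)
print(rows)
print(counts)
```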
a) From traditional research tools to XML
Faced with such a massive text, a search engine seems fully
justified. The relatively limited vocabulary and morphological
variety of the text make any attempt at lemmatization unnecessary
(truncated forms are used instead, and are more than sufficient). For
the same reason, since the same words recur very frequently, a search
engine that displays each result in its context is indispensable. We
therefore chose the PhiloLogic engine, developed at the University of
Chicago for the ARTFL project, a powerful and reliable tool designed
for exactly this kind of task.
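Truncated-form searching and contextual display can be illustrated with a small keyword-in-context (KWIC) function. Here a trailing `*` matches any word ending, so `amour*` also finds `amours` and `amoureux`. This is a hypothetical sketch of the principle, not PhiloLogic's query syntax or implementation, and the sample sentence is invented.

```python
import re


def kwic(text, query, width=20):
    """Keyword-in-context search; a trailing '*' in `query` matches any word ending."""
    stem = re.escape(query.rstrip("*"))
    ending = r"\w*" if query.endswith("*") else r"\b"
    pattern = re.compile(r"\b" + stem + ending, re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


SAMPLE_TEXT = "L'amour et les amours de l'amoureux Artamène."
for left, hit, right in kwic(SAMPLE_TEXT, "amour*"):
    print(f"{left:>20}[{hit}]{right}")
```

Without the `*`, the query matches only the exact form `amour`.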
With a future version of the search engine, we will be able to search
for a word within a particular generic or thematic context. This will
become possible once database software supports native XML queries.
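Once the text is divided by genre, restricting a search to one context amounts to filtering divisions before matching. The sketch below assumes, hypothetically, that the generic context is recorded in the `type` attribute of `<div>`. It works on an in-memory tree rather than on the XML-aware database the text anticipates, and its tokenization is deliberately naive.

```python
import xml.etree.ElementTree as ET

# Invented fragment: the genre of each division is recorded in @type.
SAMPLE = """<body>
 <div type="letter" n="L1"><p>Je vous aime.</p></div>
 <div type="narration" n="N1"><p>Il l'aime en secret.</p></div>
 <div type="letter" n="L2"><p>Adieu.</p></div>
</body>"""

PUNCTUATION = ".,;:!?"


def search_in_context(root, word, div_type):
    """Return the @n of each <div type=div_type> whose text contains `word`."""
    word = word.lower()
    hits = []
    for div in root.iter("div"):
        if div.get("type") != div_type:
            continue  # outside the requested generic context
        tokens = "".join(div.itertext()).lower().split()
        if any(token.strip(PUNCTUATION) == word for token in tokens):
            hits.append(div.get("n"))
    return hits


root = ET.fromstring(SAMPLE)
print(search_in_context(root, "aime", "letter"))  # ['L1']
```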
We would now like to briefly consider some of the possibilities
offered by XML that we are currently working on, and from which other
projects using the TEI could perhaps benefit.
b) Semantization and textual cartography
In the future, we will implement a series of cartographic tools on
our website that will, for example, produce a graphic rendering of
the location of all the dialogues, or of all the occurrences of a
given topic or narrative leitmotiv. A graphical representation using
color codes will also be used in search engine results to provide
rapid contextualization.
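The cartographic idea can be sketched as a strip in which each cell stands for an equal slice of the text and is marked when it overlaps a given element, here the TEI `<said>` element for direct speech. Characters replace the color codes mentioned above. The function name, the cell count and the test fragment are arbitrary choices made for this illustration.

```python
import xml.etree.ElementTree as ET


def text_map(root, tag, cells=40):
    """Draw the text as a strip of cells; '#' marks cells overlapping a <tag> element."""
    spans, pos = [], 0

    def walk(el):
        # Track character offsets of each element's content in document order.
        nonlocal pos
        start = pos
        pos += len(el.text or "")
        for child in el:
            walk(child)
            pos += len(child.tail or "")
        if el.tag == tag:
            spans.append((start, pos))

    walk(root)
    total = max(pos, 1)
    strip = []
    for i in range(cells):
        lo, hi = i * total / cells, (i + 1) * total / cells
        strip.append("#" if any(s < hi and e > lo for s, e in spans) else ".")
    return "".join(strip)


# Invented fragment: 30 characters of narration, 10 of speech, 60 of narration.
demo = ET.fromstring("<div>" + "a" * 30 + "<said>" + "b" * 10 + "</said>" + "c" * 60 + "</div>")
print(text_map(demo, "said", cells=10))  # ...#......
```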
c) Semantization and reorganizing the text
Ultimately, the semantic markup added during the original encoding of
the text will allow us to define compositional units made up of
passages that lie far apart in the text. With these units, users will
be able to reorganize and restructure the text. We will then be able
to create versions of the novel that bring together all of the
dialogues, or all of the passages related to the “duel” or
“jealousy” leitmotivs, and so on. Based on these reorganizations, one
could imagine further manipulations of the original material.
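Suppose, for illustration, that leitmotivs are recorded by pointing the TEI `ana` attribute of each `<div>` at an interpretation such as `#jealousy`. Gathering a thematic version of the novel then reduces to copying the matching divisions, in reading order, into a new tree. The fragment and identifiers below are invented.

```python
import copy
import xml.etree.ElementTree as ET

# Invented fragment: @ana lists the leitmotivs each division belongs to.
SAMPLE = """<body>
 <div n="1" ana="#jealousy"><p>Premier passage.</p></div>
 <div n="2" ana="#duel"><p>Deuxième passage.</p></div>
 <div n="3" ana="#duel #jealousy"><p>Troisième passage.</p></div>
</body>"""


def gather(root, theme):
    """Build a new <body> holding copies of every <div> whose @ana includes `theme`."""
    new_body = ET.Element("body")
    for div in root.iter("div"):  # preserves the original reading order
        if theme in div.get("ana", "").split():
            new_body.append(copy.deepcopy(div))  # the source tree stays untouched
    return new_body


root = ET.fromstring(SAMPLE)
print(ET.tostring(gather(root, "#jealousy"), encoding="unicode"))
```

Because the new version is built from copies, the original encoding is never modified, and any number of thematic versions can coexist.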
Conclusion
a) From an archival point of view, the TEI guidelines are an
effective tool but rather difficult to implement. This difficulty is
compounded by the complexity of XSLT display software and native XML
databases. More traditional solutions (image files to refer to the
original text, Perl or PHP parsers, or existing search and indexing
tools) save considerable time in development and implementation. The
development of a "pure" XML scholarly publishing system would only be
feasible if the different TEI projects were to join forces to build
it.
b) From a scientific point of view, the primary innovation to look
forward to is the possibility of treating a text as a database. This
type of procedure, based on the definition of textual units (through
the powerful annotation system provided by the TEI and through the
structural properties of text markup), will allow us to create much
more refined search queries. The TEI could then be considered not
just a standard for preserving, presenting and searching static
digital texts, but a means of dynamically manipulating and analyzing
texts.
Claude Bourqui and Alexandre Gefen, Université de Neuchâtel (English translation: Joseph Fahey)