austraLasia 839
CSi: The first million-word
corpus is complete
ROME: 2nd May '04 -- CSi
stands for Corpus Salesianum (Italian), a collection of texts in digital form
from the Salesian 'magisterium' or teaching authority. The collection has
now reached its first million words, representing approximately 180 individual
texts from the past 25 years. The starting point is the beginning of
Fr. Vigano's period as Rector Major; the collection includes all of his letters, those of
Fr. Vecchi and Fr. Chavez, and other texts of a magisterial nature, such as
General Chapter documents (23, 24, 25), the renewed Ratio, various guidelines
indicated in the 'Acts' and so forth.
The texts are first converted to plain text, that is, unadorned,
unformatted '.txt'. They are then indexed electronically. At this
point it is possible to retrieve any word or phrase in the
collection instantly. With further analysis (likewise digital) it is also possible to
identify statistical relationships, for example the mutual information
(MI) scores between chosen terms, or the consistency of a term across the
entire corpus.
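
For readers curious about what indexing and scoring involve, here is a minimal sketch in Python. It is not the software used for CSi, which the article does not name. The function names, the window of four words and the simplified MI formula (log2 of observed co-occurrences over those expected by chance) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a plain-text document and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each word to the (document name, token position) pairs where it occurs."""
    index = defaultdict(list)
    for name, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok].append((name, pos))
    return index


def doc_range(index, word, n_docs):
    """Fraction of documents containing the word: a crude consistency measure."""
    return len({name for name, _ in index.get(word, [])}) / n_docs


def mi_score(tokens, node, collocate, window=4):
    """Simplified mutual information between two words.

    Counts how often `collocate` appears within `window` tokens of `node`
    and compares that with the count expected if the two were independent.
    Assumption: no correction for window size is applied.
    """
    n = len(tokens)
    freq = Counter(tokens)
    joint = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(n, i + window + 1)
            joint += sum(
                1 for j in range(lo, hi) if j != i and tokens[j] == collocate
            )
    if joint == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2(joint * n / (freq[node] * freq[collocate]))
```

With such an index, looking up a word is a single dictionary access, which is why retrieval feels instant even over a million words. A high MI score suggests that two terms occur together more often than chance would predict.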
It may be of interest to know that the ten most frequent terms, numerically
and consistently, of Salesian discourse in the post-Vatican II period as
represented by these texts are, in order: life, Don Bosco, young people,
God, community, Church, Spirit, Christ, formation,
confreres/Salesians. These are then followed by mission, faith,
family, congregation, experience. Of course, mere numbers are
insufficient. We need to know what collocates with what. It helps to
know that life associates strongly with two concepts in
particular, one that is expressed in terms of
consecration-religious-Salesian-apostolic; the other with life's
daily-sociocultural-fraternal-communitarian quality. Don
Bosco also collocates with two main conceptual groupings:
charism-Salesian-spirit-preventive system-mission-Salesian
Family-heart-pedagogy on the one hand (words to the left of DB) and
young people-history-father and teacher-place (Valdocco, Rome...)-mother
on the other (words to the right). The corpus also demonstrates, and this
is most helpful to a deeper understanding of certain Salesian magisterial
rhythms, the key meaningful word clusters around each word or concept. If
it is God that Salesians speak about, then we know that they most
frequently, and again in order, speak of God's love for the young,
the people of God, the Word of God and our
relationships with God.
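
The left and right groupings described above can be approximated by counting the words on either side of a chosen term. The following Python sketch shows one possible way to do this. It assumes the text is already split into tokens and that a multi-word term such as Don Bosco has been joined into a single token (written here as the hypothetical 'don_bosco'). The function name and the window of three words are illustrative only.

```python
from collections import Counter


def side_collocates(tokens, node, window=3):
    """Count the words found within `window` tokens to the left and right of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - window):i])    # words before the node
        right.update(tokens[i + 1:i + 1 + window])   # words after the node
    return left, right


# Hypothetical usage on a tiny token list
tokens = "the charism of don_bosco father and teacher".split()
left, right = side_collocates(tokens, "don_bosco", window=2)
```

Calling `left.most_common(10)` and `right.most_common(10)` on a full corpus would then list the most frequent neighbours on each side. In practice common function words such as 'of' and 'and' would need to be filtered out before the groupings become meaningful.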
CSi will continue to expand. Of all the
corpora possible within Salesian discourse, it is the most important because of
its association with the original and official language of the
Congregation. But plans are afoot to develop CSe, and then, hopefully, CSs, CSf and CSp. At that point the corpora of the Congregation's
major language groups will become of immense value to translators and offer the
possibility of much greater consistency. Perhaps it is no accident that CS is
also a designation for 'comunicazione sociale', the Social Communications
Department out of which this work has grown.
____________________________________________
'austraLasia' is an email service for the Salesian Family of
Asia-Pacific. If you wish to be added to or removed from this list please contact
jbfox@sdb.org. Back issues of austraLasia
are available on www.bosconet.aust.com. Consider also
the possibility of contributing to Lexisdb