1148 SIP: Not a new province, but 'Come over to Macedonia' and try it out!
AustraLasia 1148

ROME: 30th May 2005 -- Given the penchant for three-letter codes in the Salesian Year Book, SIP could well be a candidate for a new circumscription or province along with EST, SLK, SMA, and so forth.  But it isn't.  If you have looked for a book on Amazon.com recently you will have come across their new feature, Statistically Improbable Phrases, aka SIPs.  Linguists have been using this technique for years, without the fancy term.  We simply call them 'key words/phrases'.
    But full marks to Amazon.  They have highlighted something that we can make extensive local use of, if we read Salesian documents like the Acts of the General Council, aka AGC, or any other document for that matter.  An example might help.
    The most recent letter of the Rector Major in AGC 389 is entitled 'Come over to Macedonia and help us'.  It is a longish letter, mostly about the situation of the Europe North Salesian region.  In precise terms it contains 18,263 tokens (running words) and 3,653 types (distinct words).  This means that many words, like 'of', 'the' and 'a' to take obvious examples, are repeated many times.  But the 3,653 'types' give us a clue to the actual vocabulary employed by the RM, since they are mostly nouns and verbs.  What Amazon would do, and what can be done by anyone else with the right software, is to look at how the words cluster together, see how many of those clusters are repeated in that text, then compare them with the same clusters in a range of other similar texts.
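    For readers who want to try this themselves, here is a minimal sketch in Python of the counting described above: tokens, types, and repeated word clusters.  It is not the software used for the figures in this article; the function names, the simple tokenising rule and the sample sentence are assumptions made purely for illustration, and a different tokeniser would give slightly different counts.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: a token is a run of letters, optionally joined by
    # internal apostrophes (e.g. "don't"); everything is lowercased.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

def token_type_counts(tokens):
    # Tokens are running words; types are distinct words.
    return len(tokens), len(set(tokens))

def ngram_counts(tokens, n=2):
    # Count every cluster of n consecutive words.
    return Counter(zip(*(tokens[i:] for i in range(n))))

def repeated_ngrams(tokens, n=2, min_count=2):
    # Keep only the clusters that occur at least min_count times.
    counts = ngram_counts(tokens, n)
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Illustrative sample only, not the text of the letter.
sample = "Come over to Macedonia and help us. Come over to Macedonia!"
tokens = tokenize(sample)
n_tokens, n_types = token_type_counts(tokens)
repeated = repeated_ngrams(tokens, n=2, min_count=2)
```

    Run over the full text of a letter, the same functions would give its token and type totals and a list of repeated phrases to compare against other texts.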
    It is a useful exercise.  If we do it for 'Come over to Macedonia' we discover that the key phrases are, in order of keyness: the Salesians, the young, Czech Republic, Belgium North, the confreres, Salesian work, Great Britain, in Poland, East Circumscription, the Europe Region, Europe North, initial formation, German language.
    How is this 'keyness' arrived at?  It is not guesswork, but statistics.  In this case a phrase is 'key' when (1) it appears at least a specified minimum number of times (10, in this instance), and (2) its frequency in this text differs significantly from its frequency in a reference corpus, in this case all the letters of the Rectors Major since Fr Vigano, amounting to 83 or thereabouts.  The two frequencies are cross-tabulated and compared using a classic chi-square test of significance.
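    The cross-tabulation can be sketched as follows.  This is only an illustration under stated assumptions: it uses the plain Pearson chi-square on a 2x2 table with no continuity correction, a critical value of 3.84 (p < 0.05 at one degree of freedom), and invented frequencies.  The actual software and settings behind the results reported here may well differ.

```python
def chi_square_keyness(freq_study, size_study, freq_ref, size_ref):
    # 2x2 table: the phrase versus everything else, in the study text
    # versus the reference corpus.
    a, b = freq_study, freq_ref
    c, d = size_study - a, size_ref - b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table, nothing to compare
    return n * (a * d - b * c) ** 2 / denom

def is_key(freq_study, size_study, freq_ref, size_ref,
           min_freq=10, critical=3.84):
    # Condition (1): the phrase must occur often enough in the study text.
    if freq_study < min_freq:
        return False
    # Only count phrases that are relatively MORE frequent in the study text.
    if freq_study / size_study <= freq_ref / size_ref:
        return False
    # Condition (2): the difference must be statistically significant.
    score = chi_square_keyness(freq_study, size_study, freq_ref, size_ref)
    return score > critical

# Invented numbers, for illustration only: a phrase found 30 times in a
# 1,000-phrase text and 10 times in a 10,000-phrase reference corpus.
score = chi_square_keyness(30, 1000, 10, 10000)
key = is_key(30, 1000, 10, 10000)
```

    Sorting phrases by their score, highest first, gives an ordering 'by keyness' of the kind listed above.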
    What have we discovered in this simple case of the latest letter of the Rector Major?  That 'the Salesians'/'the confreres', 'the young' and 'Salesian work' are phrases that appear not only in the most recent letter but also in many of the other 82 letters.  No real surprise there.  But note 'initial formation'.  This too is statistically significant for AGC 389, and is something of a guide to its importance in the other 82 letters.  Further examination would also tell us what kind of statistical importance can be attached to the mention of 'Czech Republic', 'Belgium North' etc.  All these phrases are along the lines of what Amazon would call SIPs, or statistically improbable phrases, in the context of this literature.
    Of course the next step would be to wish for a Google-like search engine on sdb.org that would quickly and easily find these terms in a range of documents.  The site's search can find them, but following them up in a useful way is not so easy.  Note, however, that SELECT, available on sdb.org, was based on this procedure.  Its terms include the most important SIPs in the Salesian magisterium.  If you haven't had a look at it yet, do so by going to resources, then animation notes.
VOCABULARY
penchant: a liking for something
circumscription: canonically any defined district, province; in Salesian terms an area under the jurisdiction of the RM or his Vicar.
aka: acronym or abbreviation for 'also known as'.
corpus: a body of texts
____________________
AustraLasia is an email service for the Salesian Family of Asia Pacific.  It also functions as an agency for ANS based in Rome.  For RSS feeds, subscribe to www.bosconet.aust.com/rssala.xml.  If you subscribe, email this information and your name will come off the regular email list.  RSS eliminates problems such as multiple mailings, viruses, email bloat.  Think about it!