Standards and Symbols for South African Genealogy
Release 1.14 20 Oct 1997 Compiled by JC van Deventer
A. Definitions
A genealogical programme is understood to be a programme that meets the minimum requirements for recording genealogy effectively and for producing it in a generally exchangeable and understandable form.
To achieve this goal we have to use genealogy dedicated programmes that maintain the prescribed standards.
Genealogy dedicated programmes are programmes such as PAF, Brother's Keeper, Family Tree Maker, Family Origins etc., but not a word processing programme like WordPerfect or MS-Word, nor a database programme such as dBase and others.
The genealogical programme can be a DOS or a WINDOWS or a DOS/WINDOWS based programme provided that it conforms with the guidelines in this report.
Symbols are understood to be characters that represent statements or meanings in genealogy, e.g. events, or that serve as parameters to describe the uncertainties that occur with dates.
Symbols provide a way of overcoming language problems and give symbol meanings to those events and parameters that occur repeatedly. Symbols provide simplicity, convenience and compactness to genealogical printouts or documents.
B. Purpose
No written guidelines recognised by the Society exist in South Africa for the standardisation of genealogical computer programmes and symbols. It has therefore become necessary to lay down standards which can serve as a guide to prospective as well as established genealogists.
The available computer programmes could be measured and judged according to these guidelines. With the technology and dedicated genealogical programmes now available, the preservation and presentation of genealogical information by manual systems has become obsolete, and switching over to electronic techniques, which ensure the exchange and distribution of data/information, has become urgent.
For this, known standards are needed in order for us to speak a common genealogical language.
A wide variety of genealogical programmes already exists; most of these originate from abroad and many are only available there. The purpose of this investigation is thus to establish guidelines for specifications for standards which are suitable for South African conditions.
Before this can be done the South African requirements must be examined from a computer point of view.
C. Background
The South African genealogical format was proposed by Christoffel Coetzee de Villiers before his death on 4 September 1887 and is widely used by South Africans. Although some criticise it, most local genealogists regard it favourably, and it is so well established that it has become the norm in South African genealogy.
It must be understood that this format at present applies only to progenitors in South Africa: genealogical references start with a and proceed logically and meaningfully to b, c etc. for the generations, with 1, 2, 3 etc. for the children within each generation, so that b1 and b2 denote child 1 and child 2 in the second generation.
The rhetorical arguments about the generations before the progenitor must be placed in perspective.
It must be clearly understood that this a-, b-, c-, etc. format is only a relative reference for the start of a family in South Africa. Likewise the progenitor could be referred to before his arrival in South Africa as "c4" in Europe if he was, for example, the fourth child of the third generation which was recorded and then as a2 if he was the second person of a specific surname who settled in South Africa.
Recorded information tells us that in 1538 Henry VIII of England ordered all priests to record births, marriages and deaths at churches.
This was initially done on paper which in time was lost.
Only in 1558 did churches begin to keep registers, some of which have been preserved. These were, however, not very complete and Jews and Quakers were excluded from the registration.
In August 1539 an ordinance in France decreed that children had to be registered at birth. In 1597 an ordinance decreed that marriages and deaths had to be registered.
In 1600 only 14% of Dutch males in Amsterdam had surnames.
In 1811 Napoleon forced the Dutch to adopt surnames and for many this was a huge joke.
If we return to the earliest recorded data now known/available we arrive at approximately 1538 - for ease of calculation say 1525.
If one starts here as relative "a" in Europe and assumes an average of 25 years per generation, then by 1700 we have only arrived at "h" and by the year 2150 finally reach "z".
After "z" follow "aa, ab, ac" etc. to zz and thereafter again aaa, aab, aac and so forth.
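The lettering described above can be sketched in a few lines of Python. This is only an illustration: the function names `generation_label` and `label_for_year` are hypothetical, and the 1525 starting year and 25 years per generation are the round figures assumed in the text, not a rule of the format.

```python
def generation_label(n: int) -> str:
    """Return the letter label of the n-th generation.

    1 -> 'a', 26 -> 'z', 27 -> 'aa', 702 -> 'zz', 703 -> 'aaa'.
    """
    if n < 1:
        raise ValueError("generation numbers start at 1")
    letters = []
    while n > 0:
        # Work in "bijective base 26": there is no zero digit,
        # so shift by one before dividing.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def label_for_year(year: int, start_year: int = 1525,
                   years_per_generation: int = 25) -> str:
    """Estimate the relative generation letter reached in a given year.

    Assumes generation 'a' starts in start_year and that every
    generation lasts exactly years_per_generation years.
    """
    generation = (year - start_year) // years_per_generation + 1
    return generation_label(generation)


if __name__ == "__main__":
    for year in (1525, 1700, 2150):
        print(year, label_for_year(year))
```

With these assumptions the sketch reproduces the figures in the text: "h" for 1700 and "z" for 2150.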
D. Introduction
With the above-mentioned system now already established and accepted in South Africa it would be ideal to have a genealogical programme which is internationally compatible and where the imported information could be reproduced in the desired South African format.
Various problems stand in the way of achieving our ideal:
South Africans can no longer afford to be technologically independent, and it is therefore a non-negotiable requirement that we also comply genealogically with international standards. Our international negotiation for the establishment of a global standards committee binds us to this statement.
In the light of this, the setting of minimum specifications for South Africa is possible.
E. SA requirements of a genealogical programme
a) Gender
The gender of the individual must be given as a point of departure. Three possibilities can arise, namely:
As the custom of giving daughters male names is fairly common and many first names are common to both genders (e.g. Gerrie, Jean etc.), it would be necessary to add the gender symbol for every individual name in the index.
Proposal: a separate field with at least one character (maximum of 8 characters)
b) Surname
1) Length
Predominantly with people of Dutch descent we find long surnames such as VAN RHEEDE VAN OUDTSHOORN which require a field of 25 characters. JANSEN VAN NIEUWENHUISEN is another example with 24 characters.
It must be emphasised that:
Proposal: minimum of 25 characters per SURNAME
2) Identification
The SURNAME must always and without any doubt be distinct from the given names. Furthermore, a surname used as a given name must be clearly distinguishable from the surname, e.g. Olivier Henning or Henning Olivier could be badly confused should the surname not be distinguishable from the given name, unless by symbol or field.
Proposal: separate SURNAME field with SURNAME in capitals in the index and pedigree list.
c) Given Names
1) Length
The length of most long given names is 8 to 12 characters. An average of 10 characters per name is presumed.
2) Number
Seldom, if ever, are more than 6 given names encountered.
Proposal: separate field with a minimum of 60 characters
d) Titles
1) Prefix
Is mostly used to indicate the professional status of the person and the abbreviation appears in front of the name e.g. Dr., Prof. etc.
2) Suffix
Is mostly used to indicate the occupation, profession or status and is used as a single word or phrase after the name e.g. Progenitor, Justice of the Peace, Physician etc.
Proposal: a separate title field with a minimum of 8 characters for the prefix and 16 characters for the suffix
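The proposals in a) to d) amount to a set of minimum field widths against which a programme could be checked. The following is a minimal sketch of such a check; the dictionary keys and the function name `shortfalls` are hypothetical and chosen only for this illustration.

```python
# Minimum field widths (in characters) taken from the proposals in a) to d).
# The key names are illustrative only.
MIN_FIELD_WIDTHS = {
    "gender": 1,
    "surname": 25,
    "given_names": 60,
    "title_prefix": 8,
    "title_suffix": 16,
}


def shortfalls(programme_widths: dict) -> dict:
    """Return the fields whose width is below the proposed minimum.

    The result maps each failing field to (actual width, required width).
    A field that the programme does not offer at all counts as width 0.
    """
    failing = {}
    for field, required in MIN_FIELD_WIDTHS.items():
        actual = programme_widths.get(field, 0)
        if actual < required:
            failing[field] = (actual, required)
    return failing


if __name__ == "__main__":
    # The long surname quoted in the text fills the 25-character minimum exactly.
    print(len("VAN RHEEDE VAN OUDTSHOORN"))

    # A hypothetical programme with a 20-character surname field and no suffix field.
    candidate = {"gender": 1, "surname": 20, "given_names": 60, "title_prefix": 8}
    print(shortfalls(candidate))
```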
e) Facts (events)
The following facts are customary and consequently necessary as the minimum.
Each of these facts contains a date and place as mentioned later at f) and g) and also a note field and source reference later mentioned at h) and i) to record important or historical information.
A facility must also be provided to record sensitive information that can be displayed on special request but may not be printed.
Proposal: minimum of 6 facts
f) Dates
1) Language:
Because the months are written differently in the various languages, the month must be printed so that it is valid in all languages, i.e. it must only be numeric.
2) Format:
It is preferable and convenient to give this in the normal spoken idiom, in other words 23rd of the 10th of 1938 or actually 23.10.1938 (the accepted European usage where . = rd/th e.g. 8. child means 8th child). Other separation characters for the date such as / and - could also be used. For the exception to the rule where the abbreviated month must be given e.g. 16 Jan 1867, only the following months would be different (for Afrikaans and English) i.e. May/Mei, Oct/Okt and Dec/Des.
3) Uncertain:
Because many dates are either incomplete or not recorded, it is important to make provision for their recording. Dates of birth are the most important date for any individual and can be regarded as the "footprint" of each person.
It is also necessary to distinguish people with similar family names from each other by using a date in the index.
It is therefore important that a date of birth be indicated as an approximation when this date is unknown or uncertain. The following variations exist for about:
abt 1756, abt 10.1756, abt 13.10.1756* or abt 13 October 1756 (*abt 10.13.1756 in the case of US programmes) where the date is uncertain, or "calculated 13 September 1756" as an example of the maximum length which could occur, since almost all international genealogical programmes provide this full-length option.
Proposal: format as dd mmmm... yyyy with a minimum length of 17 characters per date, plus 10 characters (9 characters for "calculate" plus one space) before the date ("calculate 13 September 1756" as an example); in other words, a minimum of 27 characters per date field per fact.
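The arithmetic behind the 27-character proposal, and the numeric date forms shown above, can be illustrated as follows. This is a sketch under assumptions: the function name `numeric_date` is hypothetical, day and month numbers are written without zero padding because the text does not specify any, and only the "abt" qualifier is handled.

```python
# Field width arithmetic from the proposal.
QUALIFIER_WIDTH = len("calculate") + 1        # 9 characters plus one space
DATE_WIDTH = len("13 September 1756")         # longest written-out date
MIN_DATE_FIELD = QUALIFIER_WIDTH + DATE_WIDTH


def numeric_date(year, month=None, day=None, sep=".", approximate=False):
    """Build a language-neutral numeric date in day.month.year order.

    Month and day may be omitted for incomplete dates.
    No zero padding is applied (an assumption of this sketch).
    """
    if day is not None and month is None:
        raise ValueError("a day cannot be given without a month")
    parts = [p for p in (day, month, year) if p is not None]
    text = sep.join(str(p) for p in parts)
    return f"abt {text}" if approximate else text


if __name__ == "__main__":
    print(MIN_DATE_FIELD)
    print(numeric_date(1938, 10, 23))
    print(numeric_date(1756, 10, approximate=True))
    print(numeric_date(1756, approximate=True))
```

The `sep` parameter covers the other separators the text allows, such as / and -.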
g) Place Fields (additional to dates)
This field must be sufficient to contain the full description of the place. If a church, parish, town, province and country were each calculated as 12 characters, 60 characters would be required. These place recordings must be alphabetically stored in the place list where they can be edited.
As the user types each character, the matching existing records must be displayed one after another for selection.
Proposal: minimum of 60 characters per field.
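One possible way to keep an alphabetically sorted place list and to offer matching entries while the user types is sketched below. The function names `add_place` and `matching_places` are hypothetical, the comparison is a plain case-sensitive string comparison, and the place names are only sample data.

```python
import bisect


def add_place(places: list, place: str) -> None:
    """Insert a place into the sorted list, skipping exact duplicates."""
    i = bisect.bisect_left(places, place)
    if i == len(places) or places[i] != place:
        places.insert(i, place)


def matching_places(places: list, prefix: str) -> list:
    """Return all stored places that start with the characters typed so far.

    Because the list is sorted, every match sits in one contiguous run
    that begins at the insertion point of the prefix.
    """
    start = bisect.bisect_left(places, prefix)
    result = []
    for place in places[start:]:
        if not place.startswith(prefix):
            break
        result.append(place)
    return result


if __name__ == "__main__":
    place_list = []
    for name in ("Stellenbosch, Cape", "Paarl, Cape",
                 "Swellendam, Cape", "Paarl, Cape"):
        add_place(place_list, name)
    print(place_list)
    print(matching_places(place_list, "S"))
    print(matching_places(place_list, "St"))
```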
h) Sources
References to sources of the input data, dates or notes must as far as possible be complete (with page references) for each individual as general source references, and specific source references for each fact or statement.
Sources already listed must be stored in a sorted list without duplication, for convenient later referral.
Proposal: a separate free form field with an unduplicated, sorted reference list of existing titles and authors.
i) Notes
A field which can accommodate all notes, information and descriptions, is required.
A so-called free field (available as required) is specified as a requirement.
Symbols must be avoided as far as possible in the note fields and words and abbreviations must preferably be used.
Statements already recorded in the facts field should not be repeated; additional information should, however, be recorded to explain these facts.
Diacritic characters must be freely available for use and could supplement the SURNAME and given names where these had to be omitted in the fields assigned to them to allow for correct sorting.
Apart from notes for each fact, there must be a general note and source that references the individual (person).
Proposal: a separate note field with unlimited length i.e. free form field
j) Address
It is convenient to have an address field where the physical address, postal address, telephone number and e-mail address and other related address information of every individual can be entered as well as the date of every such entry.
Proposal: a separate free field per individual.
k) Editing
It must be possible for all the above-mentioned items to be edited.
It must be possible to link individuals to, and unlink them from, a family, children and marriages in order to correct errors.
It must be possible to rearrange the sequence of children and marriages if necessary.
It must be possible to create and delete records when required.
It must be possible to cut, copy and paste data when necessary.
In the event of an accidental error, recalling of just erased data should be possible (undo).
l) Computer reference
Each programme uses a computer index number or a computer record referred to as Rin (PAF) or Rec.No.
As this number is of utmost importance to the researcher, mainly to distinguish between similar family names, this number should be displayed on almost every working screen.
There should also be the choice of printing the number with each name.
A facility should be available to call up any person directly with his record number.
m) Exchange Communication
For ease of data exchange between different programmes, the programme must satisfy the genealogical data exchange standard GEDCOM (Genealogical Data Communication) release 5.5, at least for the facts set out in E e).
GEDCOM releases before 5.5 are also acceptable for the specific events mentioned above.
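To give an impression of what such an exchange looks like, the sketch below builds the GEDCOM lines for one individual with a birth fact. It is a fragment only: a complete GEDCOM file also needs a header and trailer record, which are omitted here. The function name `gedcom_individual` and the sample person are hypothetical; the tags used (INDI, NAME, SEX, BIRT, DATE, PLAC) and the slashes around the surname are standard GEDCOM.

```python
def gedcom_individual(xref, given, surname, sex,
                      birth_date=None, birth_place=None):
    """Return the GEDCOM lines for one individual record.

    xref is the record identifier without the surrounding @ signs.
    Each line starts with its level number; the surname is enclosed
    in slashes so that it stays distinct from the given names.
    """
    lines = [
        f"0 @{xref}@ INDI",
        f"1 NAME {given} /{surname}/",
        f"1 SEX {sex}",
    ]
    if birth_date or birth_place:
        lines.append("1 BIRT")
        if birth_date:
            lines.append(f"2 DATE {birth_date}")
        if birth_place:
            lines.append(f"2 PLAC {birth_place}")
    return lines


if __name__ == "__main__":
    # Hypothetical person; "ABT" is the GEDCOM qualifier for an approximate date.
    record = gedcom_individual("I1", "Pieter", "HENNING", "M",
                               birth_date="ABT 13 OCT 1756",
                               birth_place="Stellenbosch")
    print("\n".join(record))
```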
n) Utilities
A genealogical programme should at least provide the facility to
o) Index references
Every programme must have a pedigree register and a proper alphabetically sorted index, sorted as follows
This sorting is very important to distinguish those sharing a SURNAME from each other by means of their different dates of birth. All diacritic characters (ASCII 0128 to 0255) that appear with surnames and given names must be used freely and must be sorted according to the known ISO standard.
Valuable information can be added to the list of the above-mentioned data, e.g. the gender and the birth date to ease identification of persons in the index list.
A simple reference to the parents and to the children of each entry in the pedigree list is regarded as important and convenient.
A simple but fast index reference that can easily be remembered and created, is important and convenient for cross referencing the various lists.
An index number of not more than 6 numeric characters, linking the names in the index list to the pedigree list, is suggested. This number could for example consist of a page number and a page line number, e.g. 235/28 in the pedigree list.
It could also be a running number, e.g. 4618, that refers to the number of the entry in the pedigree list which is independent of page numbers.
Proposal: maximum of 6 digits.
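The relationship between the pedigree list, the running number and the sorted index can be sketched as follows. The sort order used here (SURNAME, then given names, then year of birth, with unknown years last) is an assumption made for this illustration, and diacritic characters are not given any special treatment. The names `Entry`, `sort_key` and `build_index` and the sample persons are hypothetical.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    surname: str
    given_names: str
    birth_year: Optional[int]


def sort_key(entry: Entry):
    """Sort by surname, then given names, then birth year (unknown years last)."""
    return (entry.surname.upper(), entry.given_names.upper(),
            entry.birth_year is None, entry.birth_year or 0)


def build_index(pedigree: list) -> list:
    """Return (running number, entry) pairs in index order.

    The running number is the position of the entry in the pedigree list,
    starting at 1, and must fit in the proposed maximum of 6 digits.
    """
    if len(pedigree) > 999_999:
        raise ValueError("running number would exceed 6 digits")
    numbered = list(enumerate(pedigree, start=1))
    return sorted(numbered, key=lambda pair: sort_key(pair[1]))


if __name__ == "__main__":
    pedigree_list = [
        Entry("HENNING", "Olivier", 1756),
        Entry("OLIVIER", "Henning", 1760),
        Entry("HENNING", "Olivier", 1720),
    ]
    for number, entry in build_index(pedigree_list):
        print(f"{entry.surname}, {entry.given_names} * {entry.birth_year}  [{number}]")
```

In this sample the two persons named HENNING, Olivier are told apart in the index only by their years of birth, and each keeps the running number that points back to the pedigree list.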
F. Symbols
South Africa is one of the few countries to use only symbols to indicate the most common facts and date uncertainties in genealogy. This usage is already so popular and well established in South African genealogy that the use thereof is indispensable.
An important prerequisite for the use of symbols is that there must be standardisation. No alternatives or an either/or condition can be tolerated. Symbols should, however, not be confused with abbreviations as mentioned under G.
Single symbols are as far as possible preferred because they
These symbols must also be readily available for the users of DOS.
Symbols must be thoroughly defined in each publication.
Proposal:
1) facts (in front of the fact; followed by a space then place, date,)
2) date uncertainty
3) explanatory (after note symbol)
4) concepts (included)
G. Abbreviations
The standard alpha character abbreviations as defined in Heese and Lombard are accepted without question. These abbreviations are for use in the note and place fields, but not as replacement for symbols and must always be properly defined in every publication.
The following are some examples:
H. Genealogical Codes
As mentioned previously, the genealogical code is regarded as a handy, logical and meaningful code with which to show family relationships. The already established way of placing the code to the left of each name e.g. h5, must be retained at least in South Africa.
It is suggested that the complete genealogical code should appear after every individual's name in the index, thereby providing additional and valuable information. The code must be calculated and provided by the programme.
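How a programme might calculate these codes from its family links is sketched below. The sketch assumes that the short code is the generation letter plus the child's position (e.g. c2) and that the complete code is the chain of short codes from the progenitor down (e.g. a1b2c2); that reading of "complete genealogical code", the function names and the sample family are assumptions of this illustration.

```python
def generation_letter(n: int) -> str:
    """1 -> 'a', 26 -> 'z', 27 -> 'aa' and so on."""
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def assign_codes(children_of: dict, progenitor: str,
                 progenitor_number: int = 1) -> dict:
    """Map every descendant to a (short code, complete code) pair.

    children_of maps a person to the list of that person's children
    in birth order; persons without children may be left out.
    """
    codes = {}

    def visit(person, generation, position, prefix):
        short = f"{generation_letter(generation)}{position}"
        complete = prefix + short
        codes[person] = (short, complete)
        for i, child in enumerate(children_of.get(person, []), start=1):
            visit(child, generation + 1, i, complete)

    visit(progenitor, 1, progenitor_number, "")
    return codes


if __name__ == "__main__":
    # Hypothetical family used only to show the numbering.
    family = {
        "Jan": ["Piet", "Anna"],
        "Piet": ["Koos"],
        "Anna": ["Sara", "Gert"],
    }
    for person, (short, complete) in assign_codes(family, "Jan").items():
        print(person, short, complete)
```

In the sample, Koos and Sara both receive the short code c1; only the complete codes a1b1c1 and a1b2c1 tell them apart.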
For in-laws the appropriate code can be indicated in the note field. If available genealogical programmes, especially those from overseas, do not satisfy the requirements set out in C and D, use should be made of secondary programmes to deliver the desired format results.
Although the SA genealogical format is a very logical and meaningful genealogical number for determining relationships between members of the family, it is a poor index reference, even if very short, because:
The index reference as mentioned in o) should rather be used for ease, speed and simplicity of tracing a cross reference. The genealogical number is excellent for determining family relationships but must only be used for that purpose.
I. Conclusion
The first concept report was compiled at the request of the GSSA's (Genealogical Society of South Africa) Strategic Meeting of 8/1996 and tabled at the meeting of 3/1997. Comments were invited and the report was recompiled so as to also serve as a working document for the international committee.
This document was discussed and approved by the top management of the GSSA on September 20th, 1997 in Bloemfontein.