What does "normalised" mean?
For the Leipzig co-occurrence analysis, the machine-readable CAMENA texts were loaded into a database. For the time being, the character strings stored in that database have been standardised according to the following letter-replacement rules:
- majuscule → minuscule
- á à â ä → a
- é è ê ë → e
- í ì î ï → i
- j → i
- ji ij → ii
- ó ò ô ö → o
- ß → ss
- ú ù û ü → u
- v → u
- vu uv vv w → uu
- In Greek words written in Beta Code, the accent and breathing marks [ / \ = ( ) ] have been removed, as have the signs for diaeresis and iota subscript [ + | ]. In general, the distinction between medial sigma and final sigma [ s1 s2 ] has been removed as well.
Example: bu/ssos2 or bu/s1s1os2 → bussos
Unfortunately, the coding of sigma is still inconsistent in places. We therefore recommend always searching with an additional, alternative term that retains the distinction between medial sigma and final sigma.
Example: bu/s1s1os2 → bus1s1os2
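Both Greek search forms can be derived mechanically from a Beta Code string. The following Python sketch is an illustration only and is not the code used for the CAMENA database; the names `normalise_betacode` and `search_variants` are hypothetical, and the sketch assumes that only the signs listed above need to be handled.

```python
import re

# Signs named in the rules above: accents and breathings / \ = ( )
# plus diaeresis and iota subscript + |
_STRIP_SIGNS = str.maketrans("", "", "/\\=()+|")

# s1 (medial sigma) and s2 (final sigma)
_SIGMA_VARIANT = re.compile(r"s[12]")


def normalise_betacode(term: str, keep_sigma_variants: bool = False) -> str:
    """Strip diacritic signs; optionally keep the s1/s2 distinction."""
    stripped = term.translate(_STRIP_SIGNS)
    if keep_sigma_variants:
        return stripped
    return _SIGMA_VARIANT.sub("s", stripped)


def search_variants(term: str) -> list[str]:
    """Return the plain form and, if different, the sigma-marked form."""
    plain = normalise_betacode(term)
    marked = normalise_betacode(term, keep_sigma_variants=True)
    return [plain] if plain == marked else [plain, marked]


print(search_variants("bu/s1s1os2"))  # ['bussos', 'bus1s1os2']
```

Searching with every form returned by `search_variants` covers both the entries in which sigma was fully normalised and those in which the s1/s2 coding survived.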
Please note: When choosing search terms for the Leipzig co-occurrence analysis, users must apply these rules themselves.
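As an aid to applying the letter rules to a Latin search term, here is a minimal Python sketch. It is not the project's own implementation, and the name `normalise_latin` is hypothetical. The source does not state the order in which the rules are applied; the sketch assumes that the digraph rules (ji ij → ii, vu uv vv → uu) follow automatically from the single-letter rules j → i and v → u. As a further assumption, the input is first brought into Unicode NFC form so that accented letters typed with combining marks are recognised.

```python
import unicodedata

# Single-character replacements from the rule list.
_LETTER_MAP = str.maketrans({
    "á": "a", "à": "a", "â": "a", "ä": "a",
    "é": "e", "è": "e", "ê": "e", "ë": "e",
    "í": "i", "ì": "i", "î": "i", "ï": "i",
    "ó": "o", "ò": "o", "ô": "o", "ö": "o",
    "ú": "u", "ù": "u", "û": "u", "ü": "u",
    "j": "i",    # also yields ji, ij -> ii
    "v": "u",    # also yields vu, uv, vv -> uu
    "w": "uu",
    "ß": "ss",
})


def normalise_latin(term: str) -> str:
    """Apply the letter rules to a search term (illustrative sketch)."""
    # Assumption: compose combining accents before mapping.
    composed = unicodedata.normalize("NFC", term)
    # Majuscule -> minuscule first, then one pass over the letter map.
    return composed.lower().translate(_LETTER_MAP)


print(normalise_latin("Ejus"))     # eius
print(normalise_latin("Iuvenis"))  # iuuenis
print(normalise_latin("poëta"))    # poeta
```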
Caveat Lector: This standardisation is experimental in nature. It does not take account of the orthographic peculiarities of text elements in other languages (German, French, Italian, English, Spanish, Hebrew, Arabic, etc.) – with the exception of Greek (in Beta Code, see above).
On the subject of standardisation of the machine-readable CAMENA texts, see also Regeln für die Abschrift und Auszeichnung der CAMENA-Texte (rules for transcription and markup of the CAMENA texts).