Unicode and Ancient Greek
1. Alphabetic Order
The letter α is #x03b1 (hex), 945 (decimal), so we get the alphabet by counting up one at a time to 969, ω. The range also contains the final sigma ς (962), hence 25 characters:
(let ((lalala '()))
  (dotimes (x 25)
    (push (char-to-string (+ 945 x)) lalala))
  lalala)
=> ("ω" "ψ" "χ" "φ" "υ" "τ" "σ" "ς" "ρ" "π" "ο" "ξ" "ν" "μ" "λ" "κ" "ι" "θ" "η" "ζ" "ε" "δ" "γ" "β" "α")
As push adds each item to the front of the list, reverse the result (in IELM, * holds the last value):
(setq αλφαβητα (reverse *))
=> ("α" "β" "γ" "δ" "ε" "ζ" "η" "θ" "ι" "κ" "λ" "μ" "ν" "ξ" "ο" "π" "ρ" "ς" "σ" "τ" "υ" "φ" "χ" "ψ" "ω")
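The same arithmetic works for the capitals, starting at #x0391 (Α), with one catch: #x03a2 is unassigned, since there is no capital final sigma, and it sits exactly where the lowercase range has ς at #x03c2. A sketch of a hypothetical helper, greek-alphabet, that skips that slot in both cases so each list holds the 24 letters:

```elisp
(require 'cl-lib)

(defun greek-alphabet (&optional capitals)
  "Return the 24 Greek letters as one-character strings.
With CAPITALS non-nil, return the uppercase letters instead.
Hypothetical helper, not part of Emacs."
  (let ((start (if capitals #x0391 #x03b1)))
    (cl-loop for code from start to (+ start 24)
             ;; Offset 17 is ς (#x03c2) in lowercase and the unassigned
             ;; #x03a2 in uppercase; skip it in both cases.
             unless (= code (+ start 17))
             collect (char-to-string code))))
```

(greek-alphabet) then starts with "α" and ends with "ω", and (greek-alphabet t) runs from "Α" to "Ω".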
2. Problems with Accents
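Vowels with precomposed accents have two code points each: a lower one with τόνος (tónos), for Modern Greek, and a higher one with ὀξεῖα (oxeîa), for Ancient Greek. Both denote the same acute accent, so the duplication is a mistake; Unicode itself treats each higher code point as canonically equivalent to the lower one, so NFC normalization turns U+1F71 (alpha with oxia) into U+03AC (alpha with tonos). The default is to use only the lower code points but, of course, not everyone does that. Here are the offending characters in their lower code point and in their higher equivalent.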
(setq greek-accents '(("ά" . "ά") ("έ" . "έ") ("ή" . "ή") ("ί" . "ί") ("ό" . "ό") ("ύ" . "ύ") ("ώ" . "ώ") ("Ά" . "Ά") ("Έ" . "Έ") ("Ή" . "Ή") ("Ί" . "Ί") ("Ό" . "Ό") ("Ύ" . "Ύ") ("Ώ" . "Ώ") ("ΐ" . "ΐ") ("ΰ" . "ΰ")))
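On screen the two members of each pair are indistinguishable, and copying them around may silently normalize one into the other. A sketch of the same table written with explicit code points (the values are those of the numeric version below), which cannot be confused that way; the names greek-tonos-oxia-codes and greek-tonos-oxia-strings are hypothetical:

```elisp
;; Tonos (Modern, lower) -> oxia (Ancient, higher) code points.
;; Hypothetical name; values match the numeric alist built further on.
(defconst greek-tonos-oxia-codes
  '((#x03ac . #x1f71) (#x03ad . #x1f73) (#x03ae . #x1f75) (#x03af . #x1f77)
    (#x03cc . #x1f79) (#x03cd . #x1f7b) (#x03ce . #x1f7d)
    (#x0386 . #x1fbb) (#x0388 . #x1fc9) (#x0389 . #x1fcb) (#x038a . #x1fdb)
    (#x038c . #x1ff9) (#x038e . #x1feb) (#x038f . #x1ffb)
    (#x0390 . #x1fd3) (#x03b0 . #x1fe3)))

;; Rebuild an alist of one-character strings, the same shape as greek-accents.
(defun greek-tonos-oxia-strings ()
  (mapcar (lambda (pair)
            (cons (char-to-string (car pair)) (char-to-string (cdr pair))))
          greek-tonos-oxia-codes))
```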
To query this association list, just do (assoc the-char greek-accents), where the-char is a one-character string; it returns the cons whose first member is the-char. To get the counterpart of the-char, use (cdr (assoc the-char greek-accents)).
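The assoc/cdr pair can also be written with alist-get, whose optional DEFAULT argument supplies a fallback when the key is missing; string keys need #'equal as its TESTFN argument (available since Emacs 26). A sketch with a hypothetical two-entry excerpt of the table, built from code points so the pairs are unambiguous:

```elisp
;; Two pairs from greek-accents: epsilon and alpha, tonos -> oxia.
;; greek-accents-excerpt is a hypothetical name.
(defvar greek-accents-excerpt
  (list (cons (string #x03ad) (string #x1f73))
        (cons (string #x03ac) (string #x1f71))))

(defun greek-accent-partner (the-char)
  "Return the higher-code-point partner of THE-CHAR, or THE-CHAR itself."
  ;; KEY ALIST DEFAULT REMOVE TESTFN: compare string keys with `equal'.
  (alist-get the-char greek-accents-excerpt the-char nil #'equal))
```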
To convert a string from the low code points to the higher ones, ’tis best to convert the characters to numbers. That’s done with string-to-char, which returns the first character of a string as an integer: (string-to-char "α") gives 945. Converting the whole list:
(setq greek-accents
      (let ((greek-numbers '()))
        (dolist (this-cons greek-accents)
          (push (cons (string-to-char (car this-cons))
                      (string-to-char (cdr this-cons)))
                greek-numbers))
        greek-numbers))
=> ((944 . 8163) (912 . 8147) (911 . 8187) (910 . 8171) (908 . 8185) (906 . 8155) (905 . 8139) (904 . 8137) (902 . 8123) (974 . 8061) (973 . 8059) (972 . 8057) (943 . 8055) (942 . 8053) (941 . 8051) (940 . 8049))
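Since each pair becomes exactly one new pair, mapcar does the same job without push and keeps the original order. A sketch under a hypothetical name, greek-pairs-to-numbers, so it does not overwrite greek-accents; note that the setq above is not idempotent, because a second run would call string-to-char on numbers and signal an error:

```elisp
(defun greek-pairs-to-numbers (string-pairs)
  "Turn an alist of one-character strings into an alist of characters.
Keeps the order of STRING-PAIRS.  Hypothetical helper."
  (mapcar (lambda (this-cons)
            (cons (string-to-char (car this-cons))
                  (string-to-char (cdr this-cons))))
          string-pairs))
```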
Now, to replace each character found in greek-accents with its higher-code-point counterpart:
(defun greek-make-higher (input)
  (let ((output '()))
    (dolist (this-char (string-to-list input))
      (if (assoc this-char greek-accents)
          (push (cdr (assoc this-char greek-accents)) output)
        (push this-char output)))
    (reverse output)))
This gives us a very intuitive list of characters, like (955 8051 947 969). To transform that back into a string:
;; but cf. https://stackoverflow.com/questions/18979300
(defun list-to-string (input)
  (let ((output ""))
    (dolist (this-char input)
      ;; concat instead of the old cl package's concatenate
      (setq output (concat output (char-to-string this-char))))
    output))
Mending greek-make-higher to use this function:
(defun greek-make-higher (input)
  (let ((output '()))
    (dolist (this-char (string-to-list input))
      (if (assoc this-char greek-accents)
          (push (cdr (assoc this-char greek-accents)) output)
        (push this-char output)))
    (list-to-string (reverse output))))
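For what it is worth, concat already accepts a list of characters and mapcar accepts a string, so the whole conversion fits in one expression. A sketch assuming the numeric alist produced above, here replaced by a hypothetical two-entry excerpt (greek-accents-numbers-excerpt) so the block runs on its own:

```elisp
;; Excerpt of the numeric greek-accents: epsilon 941 -> 8051, alpha 940 -> 8049.
(defvar greek-accents-numbers-excerpt '((941 . 8051) (940 . 8049)))

(defun greek-make-higher-compact (input)
  "Replace each tonos vowel in INPUT with its oxia twin.  Hypothetical helper."
  ;; mapcar walks the string character by character; alist-get (which
  ;; compares with eq, fine for characters) falls back to the character
  ;; itself; concat turns the resulting list back into a string.
  (concat (mapcar (lambda (this-char)
                    (alist-get this-char greek-accents-numbers-excerpt this-char))
                  input)))
```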
Thus, if we throw λέγω (955 941 947 969) at greek-make-higher, it throws λέγω (955 8051 947 969) back at us.
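Going back, from the Ancient to the Modern code points, only needs the lookup reversed: rassq searches the cdrs instead of the cars. A sketch with the same hypothetical two-entry excerpt of the numeric table. For whole texts, Unicode normalization should give the same result, since NFC maps each oxia character to its tonos twin; Emacs offers it as ucs-normalize-NFC-string in the ucs-normalize library.

```elisp
;; Excerpt of the numeric greek-accents: epsilon 941 -> 8051, alpha 940 -> 8049.
(defvar greek-accents-numbers-excerpt '((941 . 8051) (940 . 8049)))

(defun greek-make-lower (input)
  "Replace each oxia vowel in INPUT with its tonos twin.  Hypothetical helper."
  (concat (mapcar (lambda (this-char)
                    ;; rassq finds the pair whose cdr is THIS-CHAR.
                    (let ((pair (rassq this-char greek-accents-numbers-excerpt)))
                      (if pair (car pair) this-char)))
                  input)))
```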
3. Beta Code
Beta Code is a way of encoding Ancient Greek using ASCII characters. The original encoding used uppercase letters only, and marked the actual majuscules with an asterisk. That suggests an encoding like the one used in punch cards, quite early on the computer timescale. And somehow that has something to do with my printer. How small this world is!
4. unicode->betacode-char
(defun unicode->betacode-char (chr)
  (let ((maybe-replacement (assoc chr greek-tonos->oxeia)))
    (if maybe-replacement
        (setq chr (cdr maybe-replacement)))
    (let ((maybe-substitute (car (rassoc chr betacode->unicode))))
      (if maybe-substitute
          maybe-substitute
        (char-to-string chr)))))
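It first swaps a tonos vowel for its oxeîa twin through greek-tonos->oxeia, then searches the values of betacode->unicode with rassoc and returns the Beta Code key of the matching entry (neither table is shown here). If it cannot find a replacement for a character, it spits the character out as it came, as a one-character string.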
(setq teste "Μῆνιν ἄειδε, θεά, Πηληϊάδεω Ἀχιλῆος")
(mapcar #'unicode->betacode-char (string-to-list teste))
=> ("*m" "h=" "n" "i" "n" " " "a)/" "e" "i" "d" "e" "," " " "q" "e" "a/" "," " " "*p" "h" "l" "h" "i+" "a/" "d" "e" "w" " " "*)a" "x" "i" "l" "h=" "o" "s2")
;; this is news to me
(mapconcat #'unicode->betacode-char (string-to-list teste) "")
=> "*mh=nin a)/eide, qea/, *phlhi+a/dew *)axilh=os2"
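Converting the string to a list first is not strictly necessary: mapcar and mapconcat accept a string directly and hand the function one character at a time. A minimal sketch, with a hypothetical stand-in (toy-betacode-char) for unicode->betacode-char, since the real tables are not shown here:

```elisp
;; Hypothetical stand-in: three Greek letters to Beta Code, the rest as-is.
(defun toy-betacode-char (chr)
  (pcase chr
    (#x3b1 "a")                         ; alpha
    (#x3b2 "b")                         ; beta
    (#x3b3 "g")                         ; gamma
    (_ (char-to-string chr))))

;; No conversion to a list: mapconcat walks the string itself.
(defun toy-betacode-string (text)
  (mapconcat #'toy-betacode-char text ""))
```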
I hope Beta Code will be a handy intermediary representation of Ancient Greek, since its characters are already decomposed: ἄ, for instance, is represented as a)/, an alpha (a) with smooth breathing ()) and acute accent (/). Thus, if we need to convert to either precomposed or decomposed Unicode, the decomposition is already done, and in an easily readable way, quite unlike Unicode itself.
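Because the diacritics already come as separate characters, decoding Beta Code into decomposed Unicode is one table lookup per character. A sketch under hypothetical names, assuming the conventions seen above (lowercase letters, * for capitals with the diacritics between * and the letter); final sigma and the s1/s2 digits are not handled. Passing the result through ucs-normalize-NFC-string should then give the precomposed forms.

```elisp
(defconst betacode-sketch-letters
  '((?a . #x3b1) (?b . #x3b2) (?g . #x3b3) (?d . #x3b4) (?e . #x3b5)
    (?z . #x3b6) (?h . #x3b7) (?q . #x3b8) (?i . #x3b9) (?k . #x3ba)
    (?l . #x3bb) (?m . #x3bc) (?n . #x3bd) (?c . #x3be) (?o . #x3bf)
    (?p . #x3c0) (?r . #x3c1) (?s . #x3c3) (?t . #x3c4) (?u . #x3c5)
    (?f . #x3c6) (?x . #x3c7) (?y . #x3c8) (?w . #x3c9))
  "Beta Code letters and the lowercase Greek letters they stand for.")

(defconst betacode-sketch-marks
  '((?\) . #x313)    ; smooth breathing (psili)
    (?\( . #x314)    ; rough breathing (dasia)
    (?/ . #x301)     ; acute
    (?\\ . #x300)    ; grave
    (?= . #x342)     ; circumflex (perispomeni)
    (?+ . #x308)     ; diaeresis
    (?| . #x345))    ; iota subscript (ypogegrammeni)
  "Beta Code diacritics and the combining characters they stand for.")

(defun betacode-sketch->nfd (beta)
  "Decode BETA into decomposed Unicode: base letter, then combining marks.
Input is downcased first, so old uppercase-only Beta Code works too.
Anything not in the tables passes through unchanged."
  (let ((out '()) (capital nil) (held '()))
    (dolist (c (string-to-list (downcase beta)))
      (let ((letter (cdr (assq c betacode-sketch-letters)))
            (mark (cdr (assq c betacode-sketch-marks))))
        (cond
         ((eq c ?*) (setq capital t))
         ;; After *, the diacritics come before the letter: hold them.
         ((and mark capital) (push mark held))
         (mark (push mark out))
         (letter
          (push (if capital (upcase letter) letter) out)
          (dolist (m (nreverse held)) (push m out))
          (setq capital nil held '()))
         (t (push c out)))))
    (concat (nreverse out))))
```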
(mapconcat #'unicode->betacode-char (string-to-list "ὕβρις") "")
=> "u(/bris2"
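Now, to transliterate into anything human-readable, it should be just a matter of pruning out the excesses and substituting a few letters, in the right order, since a later step must not undo an earlier one: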
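(setq testes "αἰτία βασιλεύς γίγνομαι δῶρον εἶδος Ζεύς ἡδύς θεός ἰδεῖν κέρδος λαός μοῖρα νοῦς ξένος ὁμιλία πίνω ἐρημία ῥόδον ποίησις τίκτω ὕβρις φίλος χάρις ψυχή ὠμός. ἄγγελος ἀνάγκη ἄγχω σφίγξ. ἀγορᾷ κεφαλῇ λύκῳ ᾠδῇ")
;; first, to Beta Code:
(mapconcat #'unicode->betacode-char (string-to-list testes) "")
;; sigmas:
(replace-regexp-in-string "[123]" "" *)
;; soft breathing:
(replace-regexp-in-string ")" "" *)
;; nasal gamma (gg, gk, gx, gc):
(replace-regexp-in-string "g\\([gkxc]\\)" "n\\1" *)
;; psi, then upsilon, except in the diphthongs au, eu, ou:
(replace-regexp-in-string "y" "ps" *)
(replace-regexp-in-string "u" "y" *)
(replace-regexp-in-string "\\([aeo]\\)y" "\\1u" *)
;; the rest of the letters (chi before xi):
(replace-regexp-in-string "h" "ē" *)
(replace-regexp-in-string "w" "ō" *)
(replace-regexp-in-string "q" "tʰ" *)
(replace-regexp-in-string "f" "pʰ" *)
(replace-regexp-in-string "x" "kʰ" *)
(replace-regexp-in-string "c" "x" *)
;; rough breathing, now that h no longer stands for eta:
(replace-regexp-in-string "r(" "rʰ" *)
(replace-regexp-in-string "\\([aeēioōy]\\)(" "h\\1" *)
;; iota subscript (assuming it is written |):
(replace-regexp-in-string "|" "i" *)
;; acute accent:
(replace-regexp-in-string "/" (string 769) *)
;; grave accent (a literal backslash):
(replace-regexp-in-string "\\\\" (string 768) *)
;; diaeresis:
(replace-regexp-in-string "[+]" (string 776) *)
;; circumflex:
(replace-regexp-in-string "=" (string 770) *)
;; capitals: drop the asterisk, upcase the letter:
(replace-regexp-in-string "\\*\\(.\\)" (lambda (m) (upcase (substring m 1))) *)
=> "aitía basileús gígnomai dō̂ron eîdos Zeús hēdýs tʰeós ideîn kérdos laós moîra noûs xénos homilía pínō erēmía rʰódon poíēsis tíktō hýbris pʰílos kʰáris psykʰḗ ōmós. ángelos anánkē ánkʰō spʰínx. agorâi kepʰalē̂i lýkōi ōidē̂i"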
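The same pipeline can also be kept as data: an ordered list of (REGEXP . REPLACEMENT) rules folded over a Beta Code string by one function, so it no longer depends on IELM's *. A sketch under hypothetical names (betacode-translit-rules, betacode-transliterate), assuming the Beta Code conventions seen above plus | for the iota subscript; the diphthong rule only covers au, eu and ou, and a capital with rough breathing (*(a) is not handled.

```elisp
;; Ordered (REGEXP . REPLACEMENT) rules; the comments explain the order.
(defconst betacode-translit-rules
  `(("[123]" . "")                   ; s1/s2/s3 -> s
    (")" . "")                       ; smooth breathing is not written
    ("g\\([gkxc]\\)" . "n\\1")       ; nasal gamma: gg, gk, gx, gc
    ("y" . "ps")                     ; psi, before upsilon becomes y
    ("u" . "y")                      ; upsilon ...
    ("\\([aeo]\\)y" . "\\1u")        ; ... except in au, eu, ou
    ("h" . "ē") ("w" . "ō") ("q" . "tʰ") ("f" . "pʰ")
    ("x" . "kʰ") ("c" . "x")         ; chi before xi, so xi's x survives
    ("r(" . "rʰ")                    ; rough breathing on rho
    ("\\([aeēioōy]\\)(" . "h\\1")    ; rough breathing on a vowel
    ("|" . "i")                      ; iota subscript
    ("/" . ,(string 769))            ; acute
    ("\\\\" . ,(string 768))         ; grave
    ("=" . ,(string 770))            ; circumflex
    ("[+]" . ,(string 776))          ; diaeresis
    ;; Capital: drop the asterisk and upcase the next character.
    ("\\*\\(.\\)" . ,(lambda (m) (upcase (substring m 1))))))

(defun betacode-transliterate (beta)
  "Transliterate the Beta Code string BETA into Latin letters."
  (let ((case-fold-search nil)   ; "h" must not match "H", and so on
        (result beta))
    (dolist (rule betacode-translit-rules result)
      ;; FIXEDCASE t: never re-case the replacement text.
      (setq result (replace-regexp-in-string (car rule) (cdr rule) result t)))))
```

For instance, (betacode-transliterate "u(/bris2") gives hýbris, with the accent as a combining character.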