ISO Character Entity Sets

Last Revised: Thursday, December 2, 1999

I'm investigating how the HTML i18n protocol (IETF RFC 2070), Unicode, and the current ISO entity sets may work together in SGML, HTML and XML. These ISO entity sets are referenced but not instantiated in the SHML 1.0 DTD draft as delivered.

Rick Jelliffe has an explanation of charent usage on the oasis-open.org site.

Following is a list of available ISO character entity sets. If you know where to find a missing set, let me know (or send it to me) and I'll post it here. The .ent files use CDATA numeric character references; the .gml files use SDATA 'square bracketed' strings; the .pen files use XML-compatible Unicode numeric character references (thanks to Rick Jelliffe of Allette Systems).
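
As a rough illustration of how the three file styles differ, here is a minimal Python sketch. The three sample declarations are hypothetical, written in the general style of each file type rather than copied from the files listed here, and the names SAMPLES and parse_declaration are made up for this example. It also assumes that a decimal reference such as &#233; can be read as a Unicode code point, which holds for Latin 1 but is not guaranteed for every SGML document character set.

```python
import re

# Hypothetical sample declarations, one per file style (not copied from the real files).
SAMPLES = {
    "ent": '<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->',
    "gml": '<!ENTITY eacute SDATA "[eacute]" --=small e, acute accent-->',
    "pen": '<!ENTITY eacute "&#xE9;"><!-- small e, acute accent -->',
}

# Entity name, optional CDATA/SDATA keyword, then the quoted replacement text.
DECL = re.compile(
    r'<!ENTITY\s+(?P<name>[\w.-]+)\s+(?:(?P<kind>CDATA|SDATA)\s+)?"(?P<text>[^"]*)"'
)
# A single decimal (&#233;) or hexadecimal (&#xE9;) numeric character reference.
CHAR_REF = re.compile(r'&#(?:[xX](?P<hex>[0-9A-Fa-f]+)|(?P<dec>[0-9]+));')


def parse_declaration(decl):
    """Return (name, kind, character) for one entity declaration.

    kind is "CDATA", "SDATA", or None (plain XML-style declaration).
    character is None when the replacement text is not a numeric
    character reference, as with SDATA strings like "[eacute]",
    whose meaning is left to the processing system.
    """
    m = DECL.search(decl)
    if m is None:
        raise ValueError("not an entity declaration: %r" % decl)
    ref = CHAR_REF.fullmatch(m.group("text"))
    if ref is None:
        return m.group("name"), m.group("kind"), None
    if ref.group("hex") is not None:
        code = int(ref.group("hex"), 16)
    else:
        code = int(ref.group("dec"))
    return m.group("name"), m.group("kind"), chr(code)


for style, decl in SAMPLES.items():
    print(style, parse_declaration(decl))
```

The SDATA form deliberately resolves to no character here: a bracketed string only names the character, so a system-specific mapping is still needed to display it.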

Files    FPI/Description
iso-lat1.ent
iso-lat1.gml
ISOlat1.pen
"ISO 8879-1986//ENTITIES Added Latin 1//EN"
"ISO 8879-1986//ENTITIES Added Latin 1//EN//XML"
Latin 1 covers most West European languages, such as Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Galician, Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish, and Swedish. The lack of the Dutch ij and French oe ligatures and of ,,German`` quotation marks is somewhat tolerable. This entity set is included in the HTML 2.0 specification.
iso-lat2.ent
iso-lat2.gml
ISOlat2.pen
"ISO 8879-1986//ENTITIES Added Latin 2//EN"
"ISO 8879-1986//ENTITIES Added Latin 2//EN//XML"
Latin 2 works for most Latin-written Slavic and Central European languages: Czech, German, Hungarian, Polish, Rumanian, Croatian, Slovak, Slovene.
ISO-8859-3 ISO 8859 Latin 3: Latin 3 is popular with authors of Esperanto, Galician, Maltese, and Turkish.
ISO-8859-4 ISO 8859 Latin 4: Latin 4 introduces letters for Estonian, Latvian, and Lithuanian. It is an incomplete predecessor of Latin 6.
ISO-8859-10 ISO Latin 6: Latin 6 adds the last Inuit (Greenlandic) and Sami (Lappish) letters that were missing in Latin 4 to cover the entire Nordic area. RFC 1345 listed a preliminary and different `latin6'. Skolt Sami still needs a few more accents than these.
iso8859-6.map
teiarb.gml
ISO 8859-6 Arabic character mapping table
"-//TEI TR1 W4:1992//ENTITIES Basic Arabic Letters//EN"
Each Arabic letter occurs in four easily predictable forms: initial, medial, final, or separate. To make Arabic text legible you'll need a display engine that combines the appropriate glyphs; the fixed font is not an acceptable rendering. For information on Arabic on the Web, see the Arabic ISO 8859-6 Web Page links.
teicop.gml "-//TEI TR1 W4:1992//ENTITIES Coptic Letters//EN"
This is the TEI Coptic Letter set. Coptic is the Egyptian language written in a modified Greek script.
iso-cyr1.ent
iso-cyr1.gml
"ISO 8879-1986//ENTITIES Russian Cyrillic//EN"
This entity set contains the Cyrillic characters used in the Russian language.
iso-cyr2.ent
iso-cyr2.gml
"ISO 8879-1986//ENTITIES Non-Russian Cyrillic//EN"
With these non-Russian Cyrillic letters you can type Bulgarian, Byelorussian, Macedonian, Russian, Serbian and Ukrainian. But Ukrainians read the letter ghe with downstroke as heh and would need a ghe with upstroke to write a correct ghe. Stalin's officials tried to abolish this distinction.
iso-grk1.ent
iso-grk1.gml
ISOgrk1.pen
"ISO 8879-1986//ENTITIES Greek Letters//EN"
This is a set of modern Greek letters for use as language characters. Technical use of Greek letters (as in formulae) is covered by the Math/Technical sets below.
iso-grk2.ent
iso-grk2.gml
ISOgrk2.pen
"ISO 8879-1986//ENTITIES Monotoniko Greek//EN"
This contains additional characters needed for Monotoniko Greek.
ISOgrk3.pen
isogrk3.gml
"ISO 8879:1986//ENTITIES Greek Symbols//EN//XML"
"ISO 9573-13:1991//ENTITIES Greek Symbols//EN"
ISOgrk4.pen
isogrk4.gml
"ISO 8879:1986//ENTITIES Alternative Greek Symbols//EN//XML"
"ISO 9573-13:1991//ENTITIES Alternative Greek Symbols//EN"
ISO-8859-8 ISO Hebrew: This set covers Hebrew, which, like Arabic, is written from right to left.
ISO-8859-9 ISO Turkish: Latin 5 replaces the rarely needed Icelandic letters in Latin 1 with the Turkish ones.
Math/Technical
iso-dia.ent
iso-dia.gml
ISOdia.pen
"ISO 8879-1986//ENTITIES Diacritical Marks//EN"
"ISO 8879:1986//ENTITIES Diacritical Marks//EN//XML"
iso-num.ent
iso-num.gml
ISOnum.pen
"ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"
"ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML"
iso-pub.ent
iso-pub.gml
ISOpub.pen
"ISO 8879-1986//ENTITIES Publishing//EN"
"ISO 8879:1986//ENTITIES Publishing//EN//XML"
iso-tech.ent
iso-tech.gml
ISOtech.pen
"ISO 8879-1986//ENTITIES General Technical//EN"
"ISO 8879:1986//ENTITIES General Technical//EN//XML"
isomscr.gml "ISO 9573-13:1991//ENTITIES Math Alphabets: Script//EN"
iso-amsa.ent
iso-amsa.gml
"ISO 8879-1986//ENTITIES Added Math Symbols: Arrow Relations//EN"
iso-amsb.ent
iso-amsb.gml
"ISO 8879-1986//ENTITIES Added Math Symbols: Binary Operators//EN"
iso-amsc.ent
iso-amsc.gml
"ISO 8879-1986//ENTITIES Added Math Symbols: Delimiters//EN"
iso-amsn.ent
iso-amsn.gml
"ISO 8879-1986//ENTITIES Added Math Symbols: Negated Relations//EN"
iso-amso.ent
iso-amso.gml
"ISO 8879-1986//ENTITIES Added Math Symbols: Ordinary//EN"
iso-amsr.ent
iso-amsr.gml
"ISO 8879-1986//ENTITIES Added Math Symbols: Relations//EN"

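The formal public identifiers (FPIs) quoted in the list above share a common shape: an owner identifier, a text class followed by a description, a language code, and in some cases a trailing display version such as XML, all separated by "//". The following Python sketch splits an FPI into those parts. It is a simplified reading of the FPI syntax, not a validating parser, and the function name parse_fpi and the field names are made up for this example.

```python
def parse_fpi(fpi):
    """Split a formal public identifier into its main fields.

    Simplified sketch: assumes fields are separated by "//" and that the
    text class is the first word of the second field. It does not handle
    the "unavailable text" indicator or validate any field.
    """
    fields = fpi.split("//")

    # A leading "-" marks an unregistered owner and "+" a registered one;
    # ISO owner identifiers such as "ISO 8879-1986" carry no prefix.
    owner_prefix = None
    if fields[0] in ("-", "+"):
        owner_prefix = fields.pop(0)

    if len(fields) < 3:
        raise ValueError("too few fields in FPI: %r" % fpi)

    text_class, _, description = fields[1].partition(" ")
    return {
        "owner_prefix": owner_prefix,
        "owner": fields[0],
        "text_class": text_class,
        "description": description,
        "language": fields[2],
        # Present only on some identifiers, e.g. the "//XML" variants.
        "display_version": fields[3] if len(fields) > 3 else None,
    }


for fpi in (
    "ISO 8879-1986//ENTITIES Added Latin 1//EN//XML",
    "-//TEI TR1 W4:1992//ENTITIES Coptic Letters//EN",
):
    print(parse_fpi(fpi))
```

Read this way, the "//XML" identifiers differ from their SGML counterparts only in the final field, which makes it easy to pair each .pen file with the matching .ent or .gml set.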


Copyright © 1997 Murray Altheim
Curator: Murray Altheim <altheim@eng.sun.com>
Last Revised: Mon, Sept. 22, 1997