20 SGML Declaration of HTML 4

Note. The document character set of this SGML declaration is limited to the first 17 planes of [ISO10646] (17 × 65536 = 1,114,112 code points). This limitation was made because the current version of the SGML standard limits this number to a length of 8 digits. It does not imply any statement about whether restricting UCS characters to the first 17 planes is feasible in the long term. It is very likely that the 8-digit limit in SGML will be removed, and that this specification will be updated, before any character is assigned beyond the first 17 planes.
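The arithmetic behind this note can be checked with a short Python sketch. The 17-plane total fits within the 8-digit limit, whereas the full 31-bit code space of the original UCS-4 form of ISO 10646 would need 10 digits. The constant and function names are illustrative only, and the UCS-4 figure is included purely for comparison.

```python
# Size of the document character set described by DESCSET:
# 17 planes of 65,536 code points each (U+0000 through U+10FFFF).
PLANE_SIZE = 65536
declared_total = 17 * PLANE_SIZE

# Full 31-bit UCS-4 code space of ISO 10646, shown only for comparison.
ucs4_total = 2 ** 31

MAX_DIGITS = 8  # digit limit for this number, as stated in the note


def fits(n: int) -> bool:
    """True if n can be written with at most MAX_DIGITS decimal digits."""
    return len(str(n)) <= MAX_DIGITS


print(declared_total, fits(declared_total))  # 1114112 True
print(ucs4_total, fits(ucs4_total))          # 2147483648 False
print(hex(declared_total - 1))               # 0x10ffff
```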

Note. Strictly speaking, ISO Registration Number 177 refers to the original state of [ISO10646] in 1993. Since 1993, the changes have consisted of added characters and a one-time reallocation of a large number of codepoints for Korean Hangul (Amendment 5). Revisions of the HTML 4 specification may update the reference to ISO 10646 to cover additional changes.

20.1 SGML Declaration

<!SGML  "ISO 8879:1986 (WWW)"
    --
 SGML Declaration for HyperText Markup Language version HTML 4
 
 With support for the first 17 planes of ISO 10646 and
 increased limits for tag and literal lengths etc.
    --
 
CHARSET
  BASESET  "ISO Registration Number 177//CHARSET
            ISO/IEC 10646-1:1993 UCS-4 with
            implementation level 3//ESC 2/5 2/15 4/6"
  DESCSET  0       9       UNUSED
           9       2       9
           11      2       UNUSED
           13      1       13
           14      18      UNUSED
           32      95      32
           127     1       UNUSED
           128     32      UNUSED
           160     55136   160
           55296   2048    UNUSED  -- SURROGATES --
           57344   1056768 57344

CAPACITY        SGMLREF
        TOTALCAP        150000
        GRPCAP          150000
        ENTCAP          150000

SCOPE    DOCUMENT
SYNTAX
  SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
           17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
  BASESET  "ISO 646IRV:1991//CHARSET
            International Reference Version
            (IRV)//ESC 2/8 4/2"
  DESCSET  0 128 0

  FUNCTION
           RE            13
           RS            10
           SPACE         32
           TAB SEPCHAR    9

  NAMING   LCNMSTRT ""
           UCNMSTRT ""
           LCNMCHAR ".-_:"
           UCNMCHAR ".-_:"
           NAMECASE GENERAL YES
                    ENTITY  NO
  DELIM    GENERAL  SGMLREF
           HCRO     "&#38;#x" -- 38 is the number for ampersand --
           SHORTREF SGMLREF
  NAMES    SGMLREF
  QUANTITY SGMLREF
           ATTCNT   60      -- increased --
           ATTSPLEN 65536   -- These are the largest values --
           LITLEN   65536   -- permitted in the declaration --
           NAMELEN  65536   -- Avoid fixed limits in actual --
           PILEN    65536   -- implementations of HTML UA's --
           TAGLVL   100
           TAGLEN   65536
           GRPGTCNT 150
           GRPCNT   64

FEATURES
  MINIMIZE
    DATATAG  NO
    OMITTAG  YES
    RANK     NO
    SHORTTAG YES
  LINK
    SIMPLE   NO
    IMPLICIT NO
    EXPLICIT NO
  OTHER
    CONCUR   NO
    SUBDOC   NO
    FORMAL   YES
  APPINFO NONE
>
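The DESCSET table in the CHARSET section maps document code points onto the base character set, with UNUSED marking the gaps: C0 controls other than TAB, LF and CR, DEL, the C1 range 128 to 159, and the UTF-16 surrogates. The Python sketch below is a hypothetical illustration, not part of any validator. It checks that the rows tile exactly 17 planes, then resolves decimal (&#NNN;) and hexadecimal (&#xHHHH;, via the HCRO delimiter) character references, rejecting references to UNUSED code points. Validators built on this declaration commonly report such references (for example &#150;) as references to non-SGML characters. The helper names `is_usable` and `resolve_char_refs` are assumptions made for this example.

```python
import re
from bisect import bisect_right

# DESCSET rows copied from the CHARSET section of the declaration:
# (first code point, number of code points, base code point or None if UNUSED)
DESCSET = [
    (0, 9, None),
    (9, 2, 9),              # TAB, LF
    (11, 2, None),
    (13, 1, 13),            # CR
    (14, 18, None),
    (32, 95, 32),           # printable ASCII
    (127, 1, None),         # DEL
    (128, 32, None),        # C1 controls
    (160, 55136, 160),
    (55296, 2048, None),    # surrogates
    (57344, 1056768, 57344),
]

# The rows must be contiguous and cover exactly 17 planes.
TOTAL = 0
for start, count, _base in DESCSET:
    assert start == TOTAL, f"gap or overlap at {start}"
    TOTAL += count
assert TOTAL == 17 * 65536

_STARTS = [start for start, _count, _base in DESCSET]


def is_usable(cp: int) -> bool:
    """True if code point cp is described (not UNUSED) by DESCSET."""
    if not 0 <= cp < TOTAL:
        return False
    base = DESCSET[bisect_right(_STARTS, cp) - 1][2]
    return base is not None


# CRO "&#" (reference syntax) and HCRO "&#x" (declared above). This sketch
# accepts "x" or "X" and, as a simplification, requires the closing ";".
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def resolve_char_refs(text: str) -> str:
    """Replace numeric character references, rejecting UNUSED code points."""
    def replace(m: re.Match) -> str:
        hex_digits, dec_digits = m.group(1), m.group(2)
        cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        if not is_usable(cp):
            raise ValueError(f"reference to non-SGML character number {cp}")
        return chr(cp)
    return _CHAR_REF.sub(replace, text)


print(resolve_char_refs("&#65;&#x42;&#X43;"))  # ABC
try:
    resolve_char_refs("&#150;")
except ValueError as err:
    print(err)  # reference to non-SGML character number 150
```

A lookup with `bisect_right` over the row starts keeps the check proportional to the small number of DESCSET rows rather than to the 1,114,112 code points they describe.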
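The NAMING section determines which strings are names and how their case is treated. The following is a minimal Python sketch, assuming ASCII letters are the only name start characters (LCNMSTRT and UCNMSTRT are empty) and that letters, digits and ".-_:" are the remaining name characters. With NAMECASE GENERAL YES, general names such as element and attribute names are folded to upper case, so <p> and <P> denote the same element. ENTITY NO keeps entity names case-sensitive, so &Aacute; and &aacute; remain distinct. The function names are hypothetical.

```python
import re

# From the NAMING and QUANTITY sections of the declaration:
# - name start characters: ASCII letters only (LCNMSTRT and UCNMSTRT are "")
# - other name characters: letters, digits and ".-_:" (LCNMCHAR, UCNMCHAR)
# - NAMELEN 65536 is the maximum length of a name
NAMELEN = 65536
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.\-_:]*")


def is_name(s: str) -> bool:
    """True if s is a syntactically valid name under this declaration."""
    return len(s) <= NAMELEN and _NAME.fullmatch(s) is not None


def normalize_general_name(name: str) -> str:
    """NAMECASE GENERAL YES: general names (e.g. element, attribute) fold to upper case."""
    if not is_name(name):
        raise ValueError(f"not a name: {name!r}")
    return name.upper()


def normalize_entity_name(name: str) -> str:
    """NAMECASE ENTITY NO: entity names are kept exactly as written."""
    if not is_name(name):
        raise ValueError(f"not a name: {name!r}")
    return name


print(normalize_general_name("http-equiv"))                                # HTTP-EQUIV
print(normalize_general_name("p") == normalize_general_name("P"))          # True
print(normalize_entity_name("Aacute") == normalize_entity_name("aacute"))  # False
print(is_name("1st"))                                                      # False
```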