Augmented Metadata in XHTML

Neocortext.Net Working Draft 10 May 2002

Neocortext.Net
Latest version:
http://www.altheim.com/specs/meta/NOTE-xhtml-augmeta.html
Previous version:
http://www.neocortext.net/specs/meta/NOTE-xhtml-augmeta-20010605.html
Editors:
Murray Altheim, Knowledge Media Institute <m.altheim@open.ac.uk>
Sean B. Palmer, <sean@mysterylights.com>
Revision:
$Id: NOTE-xhtml-augmeta.html,v 1.15 2002/05/10 14:56:58 altheim Exp $

Abstract

This specification describes several minor syntax modifications to XHTML (the XML reformulation of HTML) which provide much of the essential functionality required to augment Web pages with metadata as found in published descriptions of the Semantic Web. This augmentation allows Dublin Core metadata, a highly popular standard developed by the library community, to be incorporated in Web pages in a way that is compatible with today's Web browsers. The specification also describes a generalized mechanism by which other popular schemas can be used in similar fashion. The metadata can be associated with any XHTML or XML document or document fragment (actually, any addressable resource), internal or external to the document.

Status of this Document

This document is intended for review and comment by interested parties. It is a “work in progress,” currently has no formal status, and its publication should not be construed as endorsement by any corporate or academic body. This document may be updated, replaced, rendered obsolete by other documents, or removed from circulation at any time. It is inappropriate to use this document as reference material, or cite it as anything other than a “work in progress.” Distribution of this document is unlimited.



1. Introduction

From HTML 2.0 in 1995 [HTML2], HTML 4.01 in 1999 [HTML4], through XHTML 1.1 in 2001 [XHTML11], there has been one method of including metadata within a Web document: the <meta> element. Notably, the <meta> element contains metadata regarding the entire document, and neither allows for metadata annotation of document components nor provides a robust mechanism for referencing existing classification schemes, taxonomies or ontologies.

This specification describes three minor modifications to XHTML 1.1 which provide much of the essential functionality required to augment Web pages with schema-characterized metadata, according to the need expressed in published descriptions of the Semantic Web [W3CSW] [SCIAM]. Using the extensibility provided by the W3C Recommendation Modularization of XHTML [XHTMLMOD], this specification includes an “XHTML Augmented Metadata 1.0 DTD” that implements these features.

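The first two modifications are relatively trivial in terms of implementation:

  1. allow the existing <meta> element as inline content anywhere in a Web document (see Section 5.2)
  2. add an href attribute to <meta>, extending its ability to reference content (see Section 5.3)

The third modification is to:

  1. allow elements from the Dublin Core Metadata Element Set as content of <meta> (see Section 5.4)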

The Dublin Core Metadata Initiative (DCMI) has provided a specification describing a method of including Dublin Core metadata as attribute values in HTML <meta> elements [DCQ-HTML]. This method becomes even more valuable with the modifications this specification provides. Because of the isomorphism between Dublin Core embedded in <meta> and its expression in RDF, harvesting of and transformation between these formats (e.g., via an XSLT stylesheet) is possible. While not normative to this specification, such a stylesheet is planned/included.

This specification provides a brief introduction to the Dublin Core Metadata Element Set, as well as examples and suggestions for use. It also describes the minor changes required to be compatible with XLink (e.g., for use in XHTML 2.0). By revealing Dublin Core's lineage in ISO 11179 Specification and Standardization of Data Elements [ISO11179], it demonstrates how this specification may be extended for use with any ISO 11179-based metadata set, i.e., generalized for use with any XML markup language.

Finally, Dublin Core content may be qualified using a set of standardized values [DCMES-Qualifiers], such as stating from which classification system (e.g., Library of Congress Subject Heading or Dewey Decimal Classification) its "subject" is derived. This specification shows how these qualifiers may be extended to allow for other classification systems, such as references into RDF or XML Topic Map [XTM] based ontologies, and provides examples of such use.

The rationale behind this document is to serve as both specification and tutorial. Rather than separate the pedagogical material and examples, these are interspersed among the sections being explained. Notes appearing in this specification (which are offset left and displayed in a smaller font, as following this paragraph) are informative, and are not considered essential to the understanding or use of the features described herein.

NOTE: In this specification, Dublin Core may sometimes be abbreviated to “DC”.

ED. NOTE: "ISSUE:" or "ED.NOTE:" statements such as this will not appear in the final document. NOTEs will likely be eliminated, brought into the main text, or at least edited for length. Also, there are more examples provided currently than planned for the final draft, to assist in discussion of various issues.

1.1 What is Metadata?

In the library community, a trip to the library catalog is an exercise in browsing metadata. And to a great degree, isn't this what's missing on the Web? Keyword-based searches either turn up nothing or a thousand “hits” (with its meaning often more in line with a punch in the nose than a hit parade).

In the card catalog, an individual card is a record that references a book, serial, or some other item that can be retrieved (by you, or by the librarian for you). Each record includes metadata about an item in the library. Either an item without metadata (e.g., a book with no record), or metadata without an item (e.g., a record but no book) is less than interesting, perhaps frustrating. On the Web, this is the page you didn't find.

The term metadata is a compound that differs from “data” by the addition of the Greek word meta, meaning “alongside, with, after, next.” So metadata differs from data in that it never stands alone; it is always data associated with whatever it describes. In the computer world, this means that whatever is being described must in some way itself be addressable (i.e., retrievable by some means, such as by identifier or location). But in another sense, metadata is just data. “Going meta” another level, what is metadata at one level may be simply data at another. After all, the cards and the card catalog itself each have a location, whether physical or electronic.

1.2 Dublin Core Metadata

The Dublin Core metadata standard, developed by a cross-disciplinary group drawn from librarianship, computer science, and other professions and organized by the Online Computer Library Center (OCLC) in Dublin, Ohio, is a simple set of fifteen elements used to describe a wide range of networked resources. The Dublin Core Metadata Element Set (DCMES) was designed to allow non-specialists to create simple descriptive records for information resources.

Why is Dublin Core interesting as a metadata standard? While certainly popular within the library community, it has wider application, and for good reason. The community developing the Dublin Core did not invent it from scratch; they based it on an existing ISO standard for defining metadata specifications. Each Dublin Core element is defined using a set of ten attributes from ISO/IEC 11179 Specification and Standardization of Data Elements [ISO11179], a standard for the description of data elements. This helps to improve consistency with other metadata standards based on ISO 11179, such as the OASIS Registry and Repository project.

Note that the ISO 11179 attributes do not appear in documents, but are part of the formal definition of Dublin Core elements. Of the ten attributes, six are common to all Dublin Core elements, and provide such information as the name and version of the Dublin Core standard. The remaining four ISO 11179 attributes for each Dublin Core element are provided in the next section.

It would be pointless to try to surpass the documentation included on the DCMI site. The DCMI Recommendation Using Dublin Core [Using DC] provides an excellent introduction to the Dublin Core, and this document is highly recommended reading. This specification does provide an appendix designed as a short introduction and reference for those creating Dublin Core metadata for use in XHTML documents. See Appendix C: The Dublin Core Metadata Element Set.

2. Terminology

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]

[Definition: error]

A violation of the rules of this specification; results are undefined.

[Definition: metadata]

This specification uses the same definition found in [RDF]:

Metadata is "data about data" (for example, a library catalog is metadata, since it describes publications) or specifically in the context of this specification "data describing Web resources". The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application, and many times the same resource will be interpreted in both ways simultaneously.

ED. NOTE: If there are any specific terms anyone thinks need calling out into this Terminology section, let me know.

3. Requirements

The requirements that this specification fulfills are broken into two parts, hard and soft: the must-have and the should-have. Links to the part of this document that addresses each requirement will be appended to this section.

Hard Requirements

The design must:

Additionally,

Soft Requirements/Issues

The design should:

4. Conformance

Link processing normatively depends on [RFC 2396] (as updated by [RFC 2732]) processing, including character escaping as defined in these RFCs.

It is an error if a document does not adhere to the conformance requirements described in this specification.

It is also an error when metadata content violates the semantics of its schema. Because this is often very difficult to validate, errors of this type may not be discernable by machine processes. Such “semantic” validation is outside the scope of this specification.

In all cases,

  1. metadata inclusions must occur as either attribute or element content in the XHTML <meta> element
  2. an XHTML <link> element for each schema used must be included in the document <head> (see Section 5.5.2 for details)
  3. the document must validate according to the XHTML Augmented Metadata 1.0 DTD provided with this specification
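
The second rule above lends itself to a mechanical check. The following Python sketch is illustrative only and is not part of this specification. It assumes that the schema prefix of a property is the text before the first "." in the name attribute, that prefixes are compared case-sensitively, and that the document is well-formed XML in the XHTML namespace. The function name missing_schema_links and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Ants</title>
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  </head>
  <body>
    <p id="ants">
      <meta name="DC.Title" content="Ants" />
      <meta name="XLINK.title" content="Ants" />
      Text.
    </p>
  </body>
</html>"""

def missing_schema_links(document):
    """Return the name prefixes used on <meta> that lack a schema <link>."""
    root = ET.fromstring(document)
    declared = set()
    head = root.find(XHTML + "head")
    if head is not None:
        for link in head.iter(XHTML + "link"):
            rel = link.get("rel", "")
            if rel.startswith("schema."):
                declared.add(rel[len("schema."):])
    used = set()
    for meta in root.iter(XHTML + "meta"):
        name = meta.get("name", "")
        if "." in name:
            # Assumption: the prefix is everything before the first "."
            used.add(name.split(".", 1)[0])
    return used - declared

print(missing_schema_links(SAMPLE))  # the XLINK prefix has no <link>
```

A non-empty result points at prefixes that need a <link> element in the document <head>. Validation against the DTD (the third rule) is a separate step that this sketch does not attempt.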

When the Dublin Core Metadata Element Set syntax is used:

  1. the XML Namespace declaration for Dublin Core must be included. Likewise, when the rdf:resource attribute occurs on a Dublin Core element, the RDF Namespace must also be declared (see Section 5.5.2 for details).
  2. such syntax fragments must validate according to the Dublin Core 1.1 for XHTML Module (see Appendix B) or the DCMES XML DTD [DCMES-XML]. Both declare essentially the same syntax, except that the latter enforces a hardwired XML Namespace prefix. For interoperability, this same XML Namespace prefix (i.e., “dc:”) should be used.

While not conformant with the above declarations, this specification recommends that when RDF fragments are included in well-formed XHTML documents (as defined in Section 3.1.2 of [XHTML1]), they should be wrapped within XHTML <meta> elements.

5. Augmented Metadata in XHTML

This section describes the changes to XHTML 1.1 necessary to augment its metadata features in support of a semantically-richer Web, amounting to three changes:  allowing the existing <meta> element as inline content anywhere in a Web document; extending its ability to reference content by adding an href attribute; and finally, allowing elements from a popular metadata standard (Dublin Core) as its content.

5.1 Metadata in HTML and XHTML

Previous specifications for HTML and XHTML have designed one means of including metadata within a document, the <meta> element. <meta> is an empty element, meaning that its content model is declared EMPTY and contains no child elements or character data. The attributes on the <meta> element have evolved slightly over the years. To its three original attributes in HTML 2.0, HTML 4 added one and XHTML 1.0 added three more, so it now has a total of seven attributes. The last three listed below are the ones that most interest us:

  Attribute Name   Description
  xmlns            default XML Namespace declaration
  lang             language code
  dir              language direction
  http-equiv       HTTP response header name (see note)
  name             property name (i.e., metainformation name)
  content          property value (i.e., associated metainformation content)
  scheme           refinement of property name or schema of property value

This specification does not alter these definitions. The essence of the <meta> element is its ability to identify document properties as name/value pairs, using its name and content attributes. The scheme attribute is an optional value that associates a specific scheme with a property value, such as identifying the value “0-8047-3723-1” as being an ISBN:

  <meta scheme="ISBN" name="identifier" content="0-8047-3723-1" />

The ability to associate metadata with existing Web markup may have wide-ranging effects, as such metadata may be used to enrich content for harvesting and search engines, or to supply meta-information about existing markup practices (such as annotating client-side imagemaps, typing links, linking specific content into an ontology, or adding descriptions to images).

NOTE on 'profile':
The HTML 4 Specification indicates a use for the 'profile' attribute on the <head> element to identify “metadata profiles,” though after over four years little if any software pays any attention to this design. It was perhaps a feature a bit before its time, as it should be noted that a recent (DCMI) Recommendation indicates use of the 'profile' attribute following a method similar to that first outlined in HTML 4. But for the purposes of this specification, the functionality provided by 'profile' is provided by an explicit <link> element. More on this later.

NOTE on 'http-equiv':
<meta> elements may contain an 'http-equiv' attribute in lieu of a 'name' attribute to be used to generate information for HTTP response headers. Given the nature of this functionality, use of <meta> elements containing the 'http-equiv' attribute is undefined by this specification when they are located outside of the document <head> element, and such usage is not recommended as it is likely to have unpredictable results.

5.2 Allowing <meta> Anywhere

This specification alters the document model of XHTML to include the <meta> element as content wherever inline content has previously been allowed. Any <meta> element thus appearing in the document body is considered metadata about its parent element, with two exceptions. First, the existing use of <meta> within <head> is unchanged:  this is still metadata associated with the entire document. The second, linked metadata, is described in the next section.

Because one of the primary reasons for including metadata in XHTML is to allow for its harvesting and use, when metadata is associated via this parent-child relationship, authors should assist the process of addressing the parent by specifying a unique value for its ID attribute. Absent some means of easily addressing this original resource, processors would have to resort to XPath or other querying methods, which may not be supported in all applications. Due to the many possible applications of this technology, this is a recommendation but not a requirement.

Example 5.2A

This example shows how <meta> elements may be associated with a paragraph by their inclusion as child elements. It also shows how to incorporate Dublin Core Qualifiers to reference well-established classification schemes such as the Library of Congress Subject Headings (LCSH) or Dewey Decimal Classification (DDC). This is described fully in [DCMES-Qualifiers] and [DCQ-HTML].

  <p id="ants">
    <meta name="DC.Title" content="Ants (Hymenoptera:Formicidae)" />
    <meta name="DC.Subject" content="1. Ants.  2. Arizona Desert." />
    <meta name="DC.Subject" scheme="LCSH" content="QL568" />
    <meta name="DC.Subject" scheme="DDC" content="595" />
    There are more than 250 species of ants in Arizona alone. Ants, like bees,
    are social creatures who live in large colonies, all working together for
    the benefit of the group. Colonies may last for 10 to 20 years, though the
    individual worker ant may only live for two months to one year.
  </p>

This both expresses a suitable title for the paragraph and unambiguously indicates its subject. Note the presence of the id attribute on <p>, to assist in externally addressing the paragraph once the metadata has been extracted.
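
The parent-child association can be made concrete with a short harvesting sketch. The Python below is illustrative only and is not part of this specification: it assumes a well-formed fragment with no XML Namespace declaration (so element names are compared as plain strings), and the function name harvest and the shortened fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened version of the paragraph in Example 5.2A.
FRAGMENT = """<p id="ants">
  <meta name="DC.Title" content="Ants (Hymenoptera:Formicidae)" />
  <meta name="DC.Subject" scheme="LCSH" content="QL568" />
  <meta name="DC.Subject" scheme="DDC" content="595" />
  There are more than 250 species of ants in Arizona alone.
</p>"""

def harvest(fragment):
    """Return (parent id, name, scheme, content) for each <meta> child."""
    root = ET.fromstring(fragment)
    records = []
    for parent in root.iter():
        for child in parent:
            if child.tag == "meta" and child.get("name"):
                records.append((
                    parent.get("id"),      # None when the parent has no id
                    child.get("name"),
                    child.get("scheme"),   # None when no scheme is given
                    child.get("content"),
                ))
    return records

for record in harvest(FRAGMENT):
    print(record)
```

Each record carries the id of the parent element, which is why a unique id on the parent makes the harvested metadata easy to address afterwards.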

Example 5.2B

This example shows a method of typing XHTML links using the Dublin Core type element and the link type scheme defined in Section 6.12 Link types of [HTML4]. Note the presence of the <link> element, required to connect the use of the “DC” attribute namespace with the Dublin Core XML Namespace. This is a practice established by both [HTML4] and [DCMES-HTML].

  <html>
    <head>
      <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    </head>
  <body>
  ...
  <p>
    <a href="inverts/grasshoppers.html">
      <meta name="DC.type" scheme="HTML4" content="Prev" />
      <meta name="DC.title" content="Previous Chapter" />
      <meta name="DC.language" content="EN" />
      <meta name="DC.description"
            content="Grasshoppers, Walkingsticks, Termites, Bugs and Beetles" />
      <img src="images/prev-arrow.gif" alt="Previous Chapter" />
    </a>
    <a href="inverts/scorpions.html">
      <meta name="DC.type" scheme="HTML4" content="Next" />
      <meta name="DC.title" content="Next Chapter" />
      <meta name="DC.language" content="EN" />
      <meta name="DC.description"
            content="Scorpions, Spiders, Centipedes and Millipedes" />
      <img src="images/next-arrow.gif" alt="Next Chapter" />
    </a>
  </p>

The above method is more expressive than HTML's rel and rev attributes, as it provides access to the entire semantics of the Dublin Core (including such things as media type, multiple languages, access rights, etc.) in providing link metadata.

Example 5.2C

This is similar to the previous example, but types XHTML links using XLink's role and title attributes. Note again the presence of the <link> element, required to connect the use of the “XLINK” attribute namespace with XLink's XML Namespace.

  <html>
    <head>
      <link rel="schema.XLINK" href="http://www.w3.org/1999/xlink" />
    </head>
  <body>
  ...
  <p>
    <a href="inverts/grasshoppers.html">
      <meta name="XLINK.role" scheme="HTML4" content="Prev" />
      <meta name="XLINK.title" content="Previous Chapter" />
      <img src="images/prev-arrow.gif" alt="Previous Chapter" />
    </a>
    <a href="inverts/scorpions.html">
      <meta name="XLINK.role" scheme="HTML4" content="Next" />
      <meta name="XLINK.title" content="Next Chapter" />
      <img src="images/next-arrow.gif" alt="Next Chapter" />
    </a>
  </p>

Because XLink provides fewer metadata features than Dublin Core, this method is not as expressive as the previous example.

Example 5.2D

ED. NOTE:
Consider dropping this example as perhaps too complex and controversial.

This example shows how a <meta> element may be associated with an anchor element by including it as a child element. By adding a <link> element into the document <head> associating the XLink semantics (this requirement is discussed more fully in Section 5.5.2), XHTML links may be provided the full range of XLink properties. While this specification boasts no ambition to replace XLink, this ability is an interesting byproduct, and perhaps could be part of a transitional strategy. The following example mimics the functionality of the last example in Section 5.1.2 of [XLINK]:

<person
  xlink:href="students/patjones62.xml"
  xlink:label="student62"
  xlink:role="http://www.example.com/linkprops/student"
  xlink:title="Pat Jones" />

<person
  xlink:href="profs/jaysmith7.xml"
  xlink:label="prof7"
  xlink:role="http://www.example.com/linkprops/professor"
  xlink:title="Dr. Jay Smith" />

<course
  xlink:href="courses/cs101.xml"
  xlink:label="CS-101"
  xlink:title="Computer Science 101" />

with its XHTML Augmented Metadata rendition (note the <link> element):

  <html>
  <head>
    <link rel="schema.XLINK" href="http://www.w3.org/1999/xlink" />
  </head>
  <body>
  ...
  <a href="students/patjones62.xml">
    <meta name="XLINK.label" content="student62" />
    <meta name="XLINK.role" content="http://www.example.com/linkprops/student" />
    <meta name="XLINK.title" content="Pat Jones" />
  </a>

  <a href="profs/jaysmith7.xml">
    <meta name="XLINK.label" content="prof7" />
    <meta name="XLINK.role" content="http://www.example.com/linkprops/professor" />
    <meta name="XLINK.title" content="Dr. Jay Smith" />
  </a>

  <a href="courses/cs101.xml">
    <meta name="XLINK.label" content="CS-101" />
    <meta name="XLINK.title" content="Computer Science 101" />
  </a>

For information on how to associate metadata with empty XHTML elements, see Example 5.3A in the following section.

5.3 Metadata Links Using <meta>

There are certainly times when it is impractical or impossible to include a <meta> child on a specific element, such as when the element is declared "EMPTY" (e.g., <img>), when direct nesting is problematic for processing or display, or when the document cannot be modified (e.g., it's on a CD-ROM or somebody else's web site). In such cases it becomes necessary to be able to associate this metadata by reference. This is done by adding an href attribute, which when present supersedes the child element association described in the previous section.

The content of the href attribute is a URI reference specifying the location of a Web resource, thus defining a link between the <meta> element and the identified resource. Using this method, it is also possible to include an entire document's metadata within its <head> element.

Example 5.3A

This example shows how to associate metadata with an empty element.

  <img id="ant" alt="Harvester Ant"
      src="http://www.desertmuseum.org/natural_history/inverts/images/harvest.gif" /> 
  <meta href="#ant" name="DC.Title" content="Harvester Ant (Pogonomyrmex spp.)" />
  <meta href="#ant" name="DC.Format" content="image/gif" />

While the <meta> element may be more interoperable with older browsers, there is a tradeoff:  use of this linking feature requires more syntax redundancy, since the link must be repeated for each DC component. A solution to this is found in the next section.

Similarly, the <img> element's required alt attribute is perhaps redundant with Dublin Core's dc:description, but the latter is more generally identifiable as a metadata description of the resources that a document author wishes to make explicit as resources (as opposed to all of the assorted images that litter Web pages for presentational or layout purposes).

Likewise, as in Example 5.2A, the advantages of using a Dublin Core title rather than the existing title attribute on an XHTML element are that:

  1. it is easier to make use of such data using Dublin-Core aware tools
  2. the solution is generalized for all XML documents
  3. while similar, the semantics of XHTML's title attribute are not identical with Dublin Core's dc:title, and should not be considered equivalent.

The downside of using the Dublin Core title over XHTML's title attribute is that the latter has been provided as an accessibility solution and advocated by the Web Accessibility Initiative (WAI). But given that it's unlikely that all XML document types will adopt an xhtml:title attribute, the Dublin Core solution is more general. Had XHTML provided a title attribute in a WAI namespace (e.g., <p wai:title="Grasshopper Mouse">), this might have been a better accessibility solution for XML.

Example 5.3B

This example shows how to associate metadata with an external resource.

  <meta href="http://www.desertmuseum.org/natural_history/reptiles/terrapene.html#box" 
     name="DC.Title" content="Western Box Turtle (Terrapene ornata)" />
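
The association rules of Sections 5.2 and 5.3 can be summarized as a small resolution function. The Python sketch below is illustrative only and is not part of this specification. It assumes that a relative href is resolved against the URI of the containing document, and that a <meta> element with neither an href nor an identified parent (for example, one in <head>) describes the whole document. The function name described_resource and the document URI used in the usage lines are hypothetical.

```python
from urllib.parse import urljoin

def described_resource(document_uri, parent_id=None, href=None):
    """Return the URI of the resource that a <meta> element describes."""
    if href is not None:
        # Linked metadata: href supersedes the parent-child association.
        return urljoin(document_uri, href)
    if parent_id is not None:
        # Child metadata: address the parent through its id.
        return urljoin(document_uri, "#" + parent_id)
    # Assumption: with neither href nor an identified parent, the metadata
    # describes the whole document (as for <meta> within <head>). A parent
    # without an id would need XPath or similar, which is not handled here.
    return document_uri

BASE = "http://www.example.org/inverts/ants.html"  # hypothetical document URI
print(described_resource(BASE, href="#ant"))
print(described_resource(
    BASE,
    parent_id="ants",
    href="http://www.desertmuseum.org/natural_history/reptiles/terrapene.html#box"))
print(described_resource(BASE, parent_id="ants"))
```

The first call corresponds to Example 5.3A (a fragment within the same document), the second to Example 5.3B (an external resource, where href wins over the parent), and the third to the parent-child association of Section 5.2.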

5.4 Allowing <meta> Content

The fifteen elements of the Dublin Core Metadata Element Set (DCMES) (see Appendix C) have been published as an XML DTD [DCMES-XML]. This same set of elements has been implemented as an XHTML module and included in a document type conforming to the definition of XHTML Host Language Document Type (see Section 3.1 of [XHTMLMOD]).

This specification extends the content model of the XHTML <meta> element (previously declared "EMPTY") to allow for inclusion of the fifteen DCMES elements. To assist in the display of DCMES content (when this is desired), the content model also includes character data (i.e., PCDATA) and the XHTML <br> (forced line break) element.

Just as described previously, the content of a <meta> element is associated with its parent element. When an href attribute is present, the contents of the <meta> element are associated with whatever addressable resource is referenced by the href. Note that because the content model of the <meta> element is altered to allow Dublin Core content, its previously-required content attribute must be made optional.

ISSUE:
Early prototypes of the augmented DTD did not include mixed content, which has its downsides in any markup design. Do we really want mixed content here? How much do we expect it will be abused? Harvesting of either metadata attributes or namespaced content is not impacted by its presence, but some people may expect raw text descriptions to be processed.

NOTE on <meta> content:
The character data and line break element content of a <meta> element should under no circumstances be considered part of the metadata content, and should be limited to only punctuation and other requirements for display integrity. Processors designed to harvest such metadata are instructed to ignore all non-DCMES element content.

NOTE on hidden metadata:
When metadata content is to be hidden from display, CSS style definitions should be applied to wrapper elements, since in testing with current browsers, style definitions applied directly to <meta> elements or to non-XHTML content do not function in most cases. It is hoped that CSS implementations will improve to support style definitions that hide this content. As early as 1996, the CSS 1 Recommendation stated that “all HTML element types are possible selectors” (Section 1 Basic Concepts of [CSS1]), yet until such time as better CSS support is widespread, a transitional strategy is needed. The greatest interoperability with older browsers is gained either by using attributes on <meta>, as described in Section 5.2, or by combining the linking method (Section 5.3) with a wrapper <div> element whose style definition is "display : none".

NOTE on RDF:
Several questions arise:
Q1:  Why is the Dublin Core content not wrapped by an <rdf:RDF> element?
A1:  The RDF Model and Syntax Specification states that this is “optional if the content can be known to be RDF from the application context.” It is expected that the harvesting of Dublin Core content from a <meta> element would result in it being expressed in DCMES (regardless of whether it was originally DCMES or <meta> attribute content), wrapped in an <rdf:RDF> element whose 'rdf:about' attribute referenced the same thing as the <meta> element from which it was harvested.
Q2:  Why isn't an <rdf:Description> element necessary?
A2:  The <rdf:Description> element (and its 'about' attribute) describe from the RDF perspective to what resource the metadata refers. Since according to this specification, the Dublin Core metadata included in the <meta> element is a metadata resource defined to refer to either the parent of the <meta> element or what its 'href' attribute refers to (when present), the <rdf:Description> element is unnecessary. This is explicitly described from the XHTML perspective.

Example 5.4A

This example shows how to include Dublin Core content in XML as element content of <meta>.

  <p>
    <meta>
      <dc:title>Seed Harvester Ant (Pogonomyrmex spp.)</dc:title>
      <dc:subject>1. Ants.  2. Seed Harvester Ant</dc:subject>
      <dc:description>harvester ant nests</dc:description>
    </meta>
    Harvester Ants don't like rocky soil, preferring creosote flats and bottomland,
    but they may also be found in urban areas. These large, aggressive ants usually 
    make a clearing about 3 feet in diameter around the entrance hole to the nest,
    which is flat. The area is kept clear by the ants, who bite off the stems and
    leaves of plants that try to grow there.
  </p>

Without CSS implementation support for hiding <meta> element content, the above metadata will be displayed. When this is actually desired, it may be helpful to intersperse punctuation and/or <br /> elements among the DCMES elements. Note that this character data is not considered part of the metadata.
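
The rule that harvesters ignore all non-DCMES content of <meta> can be sketched as follows. This Python is illustrative only and is not part of this specification: it assumes a well-formed fragment that declares the Dublin Core XML Namespace and no default namespace, and the function name harvest_dcmes and the shortened fragment (with display punctuation and a <br /> added) are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# A shortened version of Example 5.4A with display-only content added.
FRAGMENT = """<p xmlns:dc="http://purl.org/dc/elements/1.1/">
  <meta>
    <dc:title>Seed Harvester Ant (Pogonomyrmex spp.)</dc:title>;
    <br />
    <dc:description>harvester ant nests</dc:description>
  </meta>
  Harvester Ants don't like rocky soil.
</p>"""

def harvest_dcmes(fragment):
    """Return (element name, value) pairs for Dublin Core children of <meta>."""
    root = ET.fromstring(fragment)
    found = []
    for meta in root.iter("meta"):
        for child in meta:
            # Keep only elements in the Dublin Core namespace; character
            # data and <br /> are display aids, not metadata.
            if child.tag.startswith("{" + DC + "}"):
                local_name = child.tag.split("}", 1)[1]
                found.append((local_name, (child.text or "").strip()))
    return found

print(harvest_dcmes(FRAGMENT))
```

The semicolon and the <br /> element are skipped because only child elements in the Dublin Core namespace are collected.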

Example 5.4B

This example shows how to hide DCMES metadata by wrapping it in a <div> element that is styled so as not to be displayed. By necessity, this must use the linking feature described in Section 5.3, as the parent of the <meta> element is now the <div> element.

  <p id="mousegr">
    <div class="hidden">
      <meta href="#mousegr">
        <dc:title>Southern Grasshopper Mouse, Onychomys torridus</dc:title>
        <dc:format>text/html</dc:format>
        <dc:description>A description of the Southern Grasshopper Mouse</dc:description> 
        <dc:subject>1. Southern Grasshopper Mouse.  2. Carnivorous Mouse.</dc:subject>
      </meta>
    </div>
    The grasshopper mouse is an efficient predator, killing other mice with a bite 
    to the back of the neck, and biting the stingers off scorpions before consuming 
    them. Pinacate beetles emit a toxic spray from their rear ends, deterring most 
    predators, but grasshopper mice catch them and shove the defensive ends of the 
    beetles into the sand, then bite off the good parts, leaving beetle bottoms 
    embedded in the sand.
  </p>

To hide the metadata content, the CSS stylesheet for the above <div> element would be:

  .hidden {
    display : none
  }

This same method can be employed to style such content for display.

5.5 Implementation

5.5.1 XHTML DTD Changes

The changes to the XHTML 1.1 DTD described in the sections below involve four files:

  1. the XHTML Augmented Metadata 1.0 DTD ("xhtml-augmeta10.dtd")
  2. the XHTML Qualified Names for DCMES 1.1 Module ("dcmes-qname-1.mod")
  3. the XHTML DCMES Module ("xhtml-dcmes-1.mod")
  4. the addition of these three lines to the XHTML catalog file ("augmeta.cat", when a catalog file is used):
    PUBLIC "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN"             "xhtml-augmeta10.dtd"
    PUBLIC "-//neocortext.net//ENTITIES XHTML DCMES 1.1 Qualified Names 1.0//EN" "dcmes-qname-1.mod"
    PUBLIC "-//neocortext.net//ELEMENTS XHTML Dublin Core Elements 1.0//EN"      "xhtml-dcmes-1.mod"
    

5.5.2 XHTML Document Changes

Several changes are necessary to XHTML documents in order to correctly validate and process DC-augmented content when using Dublin Core elements.

  1. If a document expresses its metadata using only attributes of the <meta> element, no XML Namespace declarations other than XHTML are required. However, if it contains Dublin Core elements as metadata (e.g., <dc:title>), then the <html> element must declare the XML Namespaces for Dublin Core and RDF. See the example below.
  2. A <link> element must be added to the document's <head> to provide a link to the Dublin Core schema (i.e., a description of the syntax and associated semantics, not an XML Schema), along with a <link> element for each other schema used in the document. These changes are shown in the following example.

Example 5.5.2A

This example shows the addition of XML namespace declarations for DC and RDF, as well as the link to the DC schema.

  <?xml version="1.0"?>
  <!DOCTYPE html PUBLIC "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN" 
                        "xhtml-augmeta10.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xml:lang="en">
  <head>
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    <title>Natural History of the Sonoran Desert</title>
  </head>
  <body>
    ...
  </body>
  </html>
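
The two document changes can be checked mechanically before a page is published. The sketch below is one possible check, written in Python with the standard `xml.dom.minidom` module. It assumes the default prefixes `dc` and `rdf` are used, omits the DOCTYPE declaration so that no DTD lookup is involved, and uses a hypothetical function name `check_document`.

```python
from xml.dom.minidom import parseString

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The document from Example 5.5.2A, without its DOCTYPE declaration.
SAMPLE = """<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xml:lang="en">
<head>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <title>Natural History of the Sonoran Desert</title>
</head>
<body><p>...</p></body>
</html>"""


def check_document(xml_text):
    """Return a list of problems found; an empty list means both changes are present."""
    root = parseString(xml_text).documentElement
    problems = []
    # Change 1: namespace declarations for Dublin Core and RDF on <html>.
    # Assumption: the default prefixes "dc" and "rdf" are in use.
    for prefix, uri in (("dc", DC_NS), ("rdf", RDF_NS)):
        if root.getAttribute("xmlns:" + prefix) != uri:
            problems.append("missing or wrong xmlns:%s on <html>" % prefix)
    # Change 2: a <link> tying the "DC" schema name to the Dublin Core schema.
    links = root.getElementsByTagName("link")
    if not any(link.getAttribute("rel") == "schema.DC"
               and link.getAttribute("href") == DC_NS for link in links):
        problems.append('missing <link rel="schema.DC">')
    return problems
```

Calling `check_document(SAMPLE)` on the example document returns an empty list. A document that needs only the <meta> attribute form would not require the namespace check.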

5.5.3 Harvesting Metadata

ED. NOTE:
describe harvesting; isomorphism between meta and DC elements; show example.

The word “harvest” is chosen deliberately over “mining,” as the common metaphor of “data mining” implies digging deep (perhaps in a database) for information, whereas the Web's content can be harvested by tools that skim its surface. This specification describes various methods for associating metadata with XHTML documents and document components. This section provides an example of how such metadata might be harvested for processing using relatively simple software tools.

...

Example 5.5.3A

This example shows one possible method for harvesting metadata. The document fragment below is from a document whose assumed URI is "http://www.desertmuseum.org/natural_history/inverts/scorpions.html", and contains Dublin Core content, both as attributes on <meta> and as DCMES elements.

  <p id="whip">
    <meta>
      <dc:title>Tailless Whipscorpion (Paraphrynus spp.)</dc:title>
      <dc:subject>1. Scorpions.  2. Tailless Whipscorpions</dc:subject>
    </meta>
    <meta name="DC.Subject" scheme="DDC" content="595" />
    Tailless whipscorpions look at first glance like spiders. The first 
    appendages (pedipalps) are modified for grasping prey, with hook-like 
    projections. The first true pair of legs is modified to serve as 
    "feelers," and are long, delicate, and whip-like, with many fine hairs.
  </p>

In the Xerces XML parser, the DOM method getElementsByTagName("meta") will extract all <meta> elements from the supplied element. If applied to the document root, all metadata contained as either attribute or element content will be returned as a NodeList, from which it can be further processed. Since, according to this specification, instances of such metadata as either attributes on or element content within <meta> elements are isomorphic, what remains is simply a conversion to an output expression suitable for the intended application. In the example below, the content of the fragment above has been extracted, processed into DCMES content, and then wrapped in an RDF element whose rdf:about attribute references the original parent element in the source document.

  <rdf:RDF 
     rdf:about="http://www.desertmuseum.org/natural_history/inverts/scorpions.html#whip"> 
     <dc:title>Tailless Whipscorpion (Paraphrynus spp.)</dc:title>
     <dc:subject>1. Scorpions.  2. Tailless Whipscorpions</dc:subject>
     <dc:subject>595</dc:subject>
  </rdf:RDF>
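
The harvesting step described above can be sketched with any DOM implementation. The following is a minimal illustration in Python using `xml.dom.minidom` rather than Xerces; it is not a reference implementation. It assumes that each <meta> describes its parent element, that the fragment is wrapped in a <body> element carrying the namespace declarations, and that qualifiers and the scheme attribute can be ignored. The names `harvest` and `to_rdf` are hypothetical.

```python
from xml.dom.minidom import parseString, getDOMImplementation

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SOURCE_URI = "http://www.desertmuseum.org/natural_history/inverts/scorpions.html"
# Assumption: the fragment is wrapped in <body> so that it parses on its own.
SOURCE = """<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<p id="whip">
  <meta>
    <dc:title>Tailless Whipscorpion (Paraphrynus spp.)</dc:title>
    <dc:subject>1. Scorpions.  2. Tailless Whipscorpions</dc:subject>
  </meta>
  <meta name="DC.Subject" scheme="DDC" content="595" />
  Tailless whipscorpions look at first glance like spiders.
</p>
</body>"""


def text_of(node):
    """Concatenate the text children of an element."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def harvest(xml_text, base_uri):
    """Return {about-URI: [(DC local name, value), ...]} for every <meta>."""
    found = {}
    for meta in parseString(xml_text).getElementsByTagName("meta"):
        parent = meta.parentNode
        parent_id = parent.getAttribute("id") if parent.nodeType == parent.ELEMENT_NODE else ""
        about = base_uri + "#" + parent_id if parent_id else base_uri
        pairs = found.setdefault(about, [])
        # Element form: <meta><dc:title>...</dc:title></meta>
        for child in meta.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.namespaceURI == DC_NS:
                pairs.append((child.localName, text_of(child)))
        # Attribute form: <meta name="DC.Subject" content="595" />
        name = meta.getAttribute("name")
        if name.startswith("DC."):
            pairs.append((name.split(".")[1].lower(), meta.getAttribute("content")))
    return found


def to_rdf(about, pairs):
    """Serialize harvested pairs in the output form shown above."""
    doc = getDOMImplementation().createDocument(RDF_NS, "rdf:RDF", None)
    root = doc.documentElement
    root.setAttribute("xmlns:rdf", RDF_NS)
    root.setAttribute("xmlns:dc", DC_NS)
    root.setAttributeNS(RDF_NS, "rdf:about", about)
    for local_name, value in pairs:
        element = doc.createElementNS(DC_NS, "dc:" + local_name)
        element.appendChild(doc.createTextNode(value))
        root.appendChild(element)
    return root.toxml()


harvested = harvest(SOURCE, SOURCE_URI)
```

Because the attribute and element forms are treated as isomorphic, both end up as the same kind of (name, value) pair, and the serialization step does not need to know which form the author used.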

6. Extensibility and Evolution

ED. NOTE:
This section describes how the design may be extended for other schemas, as well as how references may be made into other subject classification systems (e.g., the Cycorp ontology) and technologies (e.g., XML Topic Maps). It also shows evolution to XHTML 2.0 by providing an XLink rendition, as well as generalization to any XML vocabulary. [The section titles are placeholders.]

...

6.1 Extended Schemes

ED. NOTE:
Show how to extend the DC subject qualifier (using an appropriate <link> element) to hook into a different taxonomy than those listed by DCMI...

...

6.2 Referencing XML Topic Maps

6.2.1 Brief Introduction

An XML Topic Map [XTM] is an XML document that can be used to represent the structure of, and associations (relationships) between, information resources used to define topics. Using an XML Topic Map, one can represent a set of relationships between subjects and point to occurrences of those subjects on the Web. In a sense, this is precisely what “traditional” maps do.

Topic Maps introduce a concept called a Published Subject Indicator (PSI), essentially a URI “published” (i.e., given some measure of public notice and stability) with the purpose of establishing subject identity. This proxy for an abstract subject is called a topic, and in an XTM document is represented by a <topic> element.

PSIs are similar to the concept of the Uniform Resource Name [URN], except that they are intended to identify not only online resources, but those that cannot be referenced directly on a computer system, such as physical objects, events, or locations (e.g., “Part No. 03-876023-51”, “Flight 831”, or “Monument Valley, Utah”), properties, classifications, or concepts (e.g., “Rainy”, “A Biological Species”, or “Business Relationship”). These are considered in Topic Map parlance non-addressable subjects.

While much of the discussion surrounding topic maps has focused on how to reference (i.e., “map”) Web content in an XTM document, the reverse has interesting possibilities. One application using XHTML and XTM together might be to associate metadata with a document or document component that references a topic map in order to indicate its subject. This would be very similar to using the Dublin Core <dc:subject> described in Section FIXME, except that a software system could auto-generate a topic map from a set of Web pages, given their explicit, author-supplied subjects. Every web page incorporating this methodology is potentially a participant in a Web-wide, implicit topic map. FIXME this is only true of a XTM.subjectIndicatorRef!

There are three distinct types of references in XTM that are applicable here. Given these three reference types, the values of the <meta> element's name attribute are provided below.

Subject Indicator Reference
Identifier: "XTM.subjectIndicatorRef"
Element Type: <subjectIndicatorRef>
Definition: Provides a URI reference to a resource indicating the subject of whatever resource the <meta> element is associated with.
Comment: The referenced resource does not have to be in an XTM document (though it certainly could be), but it should unambiguously identify a subject.
Topic Reference
Identifier: "XTM.topicRef"
Element Type: <topicRef>
Definition: Provides a reference to a <topic> element in an XTM document indicating the subject of whatever resource the <meta> element is associated with.
Comment: This is the same as <subjectIndicatorRef> except for the additional referencing constraint.
Resource Reference
Identifier: "XTM.resourceRef"
Element Type: <resourceRef>
Definition: Provides a URI reference to a resource as a topic.
Comment: This should be carefully discerned from the other reference types, in that the resource itself is the subject, not what the resource might contain or indicate by its contents.

Example 6.2A

This example shows how to incorporate XML Topic Map semantics to provide a subject classification. The markup below indicates the subject of the defining instance of a term (wrapped by XHTML's <dfn> element) by including a reference to a <topic> element in a topic map about toads (“toads.xtm”) using a topic reference ("XTM.topicRef"):

  <html>
    <head>
      <title>A Natural History of the Sonoran Desert</title>
      <link rel="schema.XTM" href="http://www.topicmaps.org/xtm/1.0/" />
    </head>
    ...
    <body>
      <p>During summer monsoons, the <dfn id="couch">
        <meta name="XTM.topicRef"
              content="http://www.doctypes.org/sonoran/toads.xtm#toad-spadefoot" /> 
        spadefoot toad</dfn> is well-known for emerging from its subterranean 
      estivation to breed in the temporary ponds created by the heavy runoff. 
      Preying primarily upon beetles, grasshoppers, katydids, ants, spiders, 
      and termites, a spadefoot can consume enough food in one meal to last 
      an entire year!
      </p>

The topic referenced above might appear in an XTM document as something akin to the below fragment. Note that the <topic> element below includes an <occurrence> element that references the location in our example above as well as a <subjectIdentity> reference to the same location. Hence, the topic map and the web page are co-reflexive, the latter serving as both an indicator and occurrence of the subject “Couch's Spadefoot Toad”.

  <topic id="toad-spadefoot">
    <instanceOf><topicRef xlink:href="#species" /></instanceOf>
    <subjectIdentity>
      <subjectIndicatorRef
        xlink:href="http://www.desertmuseum.org/natural_history/reptiles/bufo.html#couch" /> 
      <subjectIndicatorRef xlink:href="#toad-spadefoot-description" />
    </subjectIdentity>
    <baseName><scope><subjectIndicatorRef xlink:href="language.xtm#en" /></scope>
      <baseNameString>Couch's Spadefoot</baseNameString>
    </baseName>
    <baseName><scope><subjectIndicatorRef xlink:href="language.xtm#en" /></scope>
      <baseNameString>Spadefoot Toad</baseNameString>
    </baseName>
    <baseName><scope><subjectIndicatorRef xlink:href="language.xtm#es" /></scope>
      <baseNameString>sapo con espuelas</baseNameString>
    </baseName>
    <occurrence>
      <resourceData id="toad-spadefoot-description">Couch's Spadefoot, a small
        toad that ranges from southeastern California through southern Arizona
        and southern New Mexico.</resourceData>
    </occurrence>
    <occurrence>
      <resourceRef
        xlink:href="http://www.desertmuseum.org/natural_history/reptiles/bufo.html#couch" />
    </occurrence>
  </topic>
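
The first step in generating a topic map from a set of Web pages is collecting the XTM references they contain. The sketch below shows one way this might be done in Python with `xml.dom.minidom`. It assumes the page URI used in the <topic> element above, treats the parent of each <meta> as the annotated resource, and uses a hypothetical function name `harvest_xtm`.

```python
from xml.dom.minidom import parseString

# The three name values defined in this section, mapped to XTM element types.
XTM_NAMES = {
    "XTM.subjectIndicatorRef": "subjectIndicatorRef",
    "XTM.topicRef": "topicRef",
    "XTM.resourceRef": "resourceRef",
}

# Assumption: the page from Example 6.2A lives at the URI that the
# <topic> element above refers to.
PAGE_URI = "http://www.desertmuseum.org/natural_history/reptiles/bufo.html"
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>A Natural History of the Sonoran Desert</title>
  <link rel="schema.XTM" href="http://www.topicmaps.org/xtm/1.0/" />
</head>
<body>
  <p>During summer monsoons, the <dfn id="couch">
    <meta name="XTM.topicRef"
          content="http://www.doctypes.org/sonoran/toads.xtm#toad-spadefoot" />
    spadefoot toad</dfn> is well-known for emerging from its subterranean
  estivation.</p>
</body>
</html>"""


def harvest_xtm(xml_text, page_uri):
    """Return a list of (XTM element type, referenced URI, annotated location)."""
    results = []
    for meta in parseString(xml_text).getElementsByTagName("meta"):
        element_type = XTM_NAMES.get(meta.getAttribute("name"))
        if element_type is None:
            continue  # not an XTM reference; ignore other metadata
        parent = meta.parentNode
        parent_id = parent.getAttribute("id") if parent.nodeType == parent.ELEMENT_NODE else ""
        location = page_uri + "#" + parent_id if parent_id else page_uri
        results.append((element_type, meta.getAttribute("content"), location))
    return results


references = harvest_xtm(PAGE, PAGE_URI)
```

Each resulting tuple pairs a topic reference with the location that carries it, which is the information a generator would need to emit an occurrence or subject indicator for that topic.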

6.3 Referencing Online RDF Applications

ED. NOTE:
perhaps hook into DMOZ for a subject classification...

...

6.4 Use With Any XML Document Type

ED. NOTE:
show how this can be generalized for any XML document type...

...

 


Appendices

A. References

A.1 Normative References

[XML]
"Extensible Markup Language (XML) 1.0 (Second Edition)",
W3C Recommendation, Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, and Eve Maler, eds., World Wide Web Consortium, 2000.
See: http://www.w3.org/TR/REC-xml.
[XMLNAMES]
"Namespaces in XML",
W3C Recommendation, T. Bray, D. Hollander, A. Layman, eds., World Wide Web Consortium, 14 January 1999.
See: http://www.w3.org/TR/1999/REC-xml-names-19990114.
[XLINK]
"XML Linking Language (XLink) Version 1.0",
W3C Proposed Recommendation, S. DeRose, E. Maler, D. Orchard, eds., World Wide Web Consortium, 20 December 2000.
See: http://www.w3.org/TR/2000/PR-xlink-20001220.
[XHTMLMOD]
"Modularization of XHTML™",
W3C Recommendation, M. Altheim et al., World Wide Web Consortium, 10 April 2001.
See: http://www.w3.org/TR/xhtml-modularization.
[XHTML11]
"XHTML™ 1.1 - Module-based XHTML",
W3C Recommendation, M. Altheim, S. McCarron, eds., World Wide Web Consortium, FIXME May 2001.
See: http://www.w3.org/TR/xhtml11.
[DCMES 1.1]
"DCES: Dublin Core Metadata Element Set, Version 1.1: Reference Description",
DCMI Recommendation, Dublin Core Metadata Initiative, 2 July 1999.
See: http://dublincore.org/documents/1999/07/02/dces/.
See also: http://www.ietf.org/rfc/rfc2413.txt.
[DCMES-XML]
"An XML Encoding of Simple Dublin Core Metadata",
Dave Beckett, Eric Miller, Dan Brickley, DCMI Proposed Recommendation, Dublin Core Metadata Initiative, 16 January 2001.
See: http://dublincore.org/documents/2001/04/11/dcmes-xml/.
[DCMES-HTML]
"Encoding Dublin Core metadata in HTML",
John A. Kunze, RFC 2731, Internet Engineering Task Force, December 1999.
See: http://www.ietf.org/rfc/rfc2731.txt.
[DCQ-HTML]
"Recording qualified Dublin Core metadata in HTML meta elements",
Cox, Simon, Eric Miller, Andy Powell, DCMI Working Draft, Dublin Core Metadata Initiative, 15 August 2000.
See: http://dublincore.org/documents/dcq-html/.
Also referred to as: http://www.ietf.org/rfc/rfc2731.txt.
[DCMES-Qualifiers]
"Dublin Core Qualifiers",
Leif Andresen, Tom Baker, et al., DCMI Recommendation, Dublin Core Metadata Initiative, 11 July 2000.
See: http://dublincore.org/documents/2000/07/11/dcmes-qualifiers/.
[DCMI-Types]
"DCMI Type Vocabulary",
DCMI Recommendation, Dublin Core Metadata Initiative, 11 July 2000.
See: http://dublincore.org/documents/2000/07/11/dcmi-type-vocabulary/.
[RDF]
"Resource Description Framework (RDF) Model and Syntax Specification",
W3C Recommendation, O. Lassila, R. Swick, eds., World Wide Web Consortium, 22 February 1999.
See: http://www.w3.org/TR/REC-rdf-syntax.
[URI]
"Uniform Resource Identifiers (URI): Generic Syntax",
RFC 2396, T. Berners-Lee, R. Fielding, L. Masinter, Internet Engineering Task Force, August 1998.
See: http://www.ietf.org/rfc/rfc2396.txt. This RFC updates RFC 1738 [URL] and [RFC 1808].
[URL]
"Uniform Resource Locators (URL)",
RFC 1738, T. Berners-Lee, L. Masinter, M. McCahill, Internet Engineering Task Force, December 1994.
See: http://www.ietf.org/rfc/rfc1738.txt.
[RFC 1808]
"Relative Uniform Resource Locators",
RFC 1808, R. Fielding, Internet Engineering Task Force, June 1995.
See: http://www.ietf.org/rfc/rfc1808.txt.
[RFC 2119]
"Key words for use in RFCs to Indicate Requirement Levels",
RFC 2119, Internet Engineering Task Force, 1997.
See: http://www.ietf.org/rfc/rfc2119.txt.
[RFC 2396]
"Uniform Resource Identifiers (URI); Generic Syntax",
T. Berners-Lee, R. Fielding, L. Masinter, RFC 2396, Internet Engineering Task Force, August 1998.
See: http://www.ietf.org/rfc/rfc2396.txt.
[RFC 2732]
"Format for Literal IPv6 Addresses in URL's",
RFC 2732, Internet Engineering Task Force, 1999.
See: http://www.ietf.org/rfc/rfc2732.txt.

A.2 Informative References

[Using DC]
"Using Dublin Core",
Diane Hillman, DCMI Recommendation, Dublin Core Metadata Initiative, 16 July 2000.
See: http://dublincore.org/documents/usageguide/.
[W3CSW]
"Semantic Web Activity",
E. Miller, R. Swick, D. Brickley, B. McBride, World Wide Web Consortium, 17 April 2001 (ongoing).
See: http://www.w3.org/2001/sw/.
[SCIAM]
"The Semantic Web",
T. Berners-Lee, J. Hendler, O. Lassila, Scientific American, May 2001.
See online version: http://www.sciam.com/2001/0501issue/0501berners-lee.html.
[HTML2]
"Hypertext Markup Language - 2.0",
RFC 1866, T. Berners-Lee, D. Connolly, Internet Engineering Task Force, November 1995.
See: http://www.ietf.org/rfc/rfc1866.txt.
[HTML4]
"HTML 4.01 Specification",
W3C Recommendation, D. Raggett, A. Le Hors, I. Jacobs, eds., World Wide Web Consortium, 24 December 1999.
See: http://www.w3.org/TR/1999/REC-html401-19991224.
[XHTML1]
"XHTML 1.0: The Extensible HyperText Markup Language",
W3C Recommendation, S. Pemberton et al., World Wide Web Consortium, 26 January 2000.
See: http://www.w3.org/TR/2000/REC-xhtml1-20000126.
[XTM]
"XML Topic Maps (XTM) 1.0",
TopicMaps.Org Specification, S. Pepper, G. Moore, 23 February 2001.
See: http://www.topicmaps.org/xtm/1.0/.
[ISO11179]
"Specification and Standardization of Data Elements",
ISO/IEC 11179, International Organization for Standardization, June 1994.
See: http://www.sdct.itl.nist.gov/~ftp/l8/11179/11179-3.txt.
See also Part 1: Framework: http://www.sdct.itl.nist.gov/~ftp/l8/11179/11179-1.htm.
[RFC 3023]
"XML Media Types",
RFC 3023, Internet Engineering Task Force, 2001.
See: http://www.ietf.org/rfc/rfc3023.txt.
[CSS1]
"Cascading Style Sheets, level 1",
H. W. Lie, B. Bos, World Wide Web Consortium, 17 December 1996, revised 11 January 1999.
See: http://www.w3.org/TR/1999/REC-CSS1-19990111.
[SHOE]
"SHOE: Simple HTML Ontology Extensions",
Sean Luke and Jeff Heflin, eds., Parallel Understanding Systems Group, Dept. of Computer Science, University of Maryland at College Park, 2001.
See: http://www.cs.umd.edu/projects/plus/SHOE/.
[Purple]
"Purple",
Eugene Eric Kim, 2001.
See: http://www.eekim.com/software/purple/index.html.
[plink]
"plink: purple links",
Murray Altheim, Sun Microsystems, 7 May 2001.
See: http://www.doctypes.org/plink/ [obsolete link - TBA]

B. XHTML Augmented Metadata 1.0 DTD (Normative)

There are three files included as part of the XHTML Augmented Metadata 1.0 DTD: the DTD driver, the DCMES Module, and the DCMES Qualified Names Module. The SGML Open catalog file changes necessary to support these new components are described in Section 5.5.1 XHTML DTD Changes.

Also available is a distribution that includes all DTD files plus the specification itself, at xhtml-augmeta.tar.gz.

B.1 XHTML Augmented Metadata 1.0 DTD Driver

This is the DTD driver file, available as xhtml-augmeta10.dtd. Note that this DTD is also available in normalized form, i.e., with all entities instantiated (DTD modules included), as xhtml-augmeta10-f.dtd (~165K).


  <!-- ....................................................................... -->
  <!-- XHTML Augmented Metadata 1.0 DTD  ..................................... -->
  <!-- file: xhtml-augmeta10.dtd
  -->

  <!-- XHTML Augmented Metadata 1.0 DTD

       This is an extension of XHTML, a reformulation of HTML as a modular
       XML application. This XHTML 1.1-based DTD augments the metadata features
       of XHTML, including the addition of Dublin Core content in the <meta>
       element, in support of a semantically-rich World Wide Web. It also
       includes declarations for the 'name' attribute on anchors for better
       legacy browser support.

       XHTML Augmented Metadata 1.0 DTD, Copyright 2002, Murray Altheim.
         With the added requirement that this paragraph remain intact, the
         license for distribution and use of this DTD and its accompanying
         documentation is identical to XHTML, as described below.

       The Extensible HyperText Markup Language (XHTML)
       Copyright 1998-2001 World Wide Web Consortium
          (Massachusetts Institute of Technology, Institut National de
           Recherche en Informatique et en Automatique, Keio University).
           All Rights Reserved.

       Permission to use, copy, modify and distribute the XHTML DTD and its
       accompanying documentation for any purpose and without fee is hereby
       granted in perpetuity, provided that the above copyright notice and
       this paragraph appear in all copies.  The copyright holders make no
       representation about the suitability of the DTD for any purpose.

       It is provided "as is" without expressed or implied warranty.

          Author:     Murray M. Altheim <m.altheim@open.ac.uk>
          Revision:   $Id: xhtml-augmeta10.dtd,v 4.1 2001/06/05 09:22:01 altheim Exp $

  -->
  <!-- This is the driver file for version 1.0 of the XHTML Augmented Metadata DTD.

       Please use this formal public identifier to identify it:

           "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN"
  -->
  <!ENTITY % XHTML.version  "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN" >

  <!-- Use this URI to identify the default namespace:

           "http://www.w3.org/1999/xhtml"

       See the XHTML Qualified Names module ("xhtml-qname-1.mod") for more
       information on the use of namespace prefixes in the DTD.
  -->
  <!ENTITY % DC.prefixed "INCLUDE" >
  <!ENTITY % NS.prefixed "IGNORE" >

  <!-- In addition to the XHTML namespace, use of this DTD requires the
       addition of two namespaces, RDF and Dublin Core. For example, if
       you are using XHTML Augmented Metadata 1.0 directly, use the FPI
       in the DOCTYPE declaration, with the xmlns attributes on the document
       element to identify the default and added namespaces. Also required
       is a <link> element which ties the prefix "DC" to the XML Dublin Core
       schema (gaining its element definitions and their semantics).

         <?xml version="1.0"?>
         <!DOCTYPE html PUBLIC "-//neocortext.net//DTD XHTML Augmented Metadata 1.0//EN" 
                               "xhtml-augmeta10.dtd">
         <html xmlns="http://www.w3.org/1999/xhtml"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xml:lang="en">
         <head>
         <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
         <title>Document Title</title>
         </head>
         ...
         </html>


       Revisions:
       (none)
  -->

  <!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->

  <!-- declare for inclusion of Dublin Core Qualified Names Module -->
  <!ENTITY % xhtml-qname-extra.mod
       PUBLIC "-//neocortext.net//ENTITIES XHTML DCMES 1.1 Qualified Names 1.0//EN"
              "dcmes-qname-1.mod" >

  <!-- Dublin Core for XHTML Module  ............................... -->
  <!ENTITY % xhtml-dcmes.module "INCLUDE" >
  <![%xhtml-dcmes.module;[
  <!ENTITY % xhtml-model.redecl
       PUBLIC "-//neocortext.net//ELEMENTS XHTML Dublin Core Elements 1.0//EN"
              "xhtml-dcmes-1.mod" >
  ]]>

  <!-- instantiate XHTML 1.1 DTD  .................................. -->
  <!ENTITY % xhtml11.dtd PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd" >
  %xhtml11.dtd;

  <!-- Name Identifier Module  ..................................... -->
  <!ENTITY % xhtml-nameident.module "INCLUDE" >
  <![%xhtml-nameident.module;[
  <!ENTITY % xhtml-nameident.mod
       PUBLIC "-//W3C//ELEMENTS XHTML Name Identifier 1.0//EN"
              "xhtml-nameident-1.mod" >
  %xhtml-nameident.mod;]]>

  <?doc type="doctype" role="title" { XHTML Augmented Metadata 1.0 } ?>

  <!-- end of XHTML Augmented Metadata 1.0 DTD  .............................. -->
  <!-- ....................................................................... -->

B.2 Dublin Core 1.1 for XHTML Module

This is the Dublin Core 1.1 module for use with XHTML, available as xhtml-dcmes-1.mod.


  <!-- ...................................................................... -->
  <!-- Dublin Core 1.1 Module for XHTML  .................................... -->
  <!-- file: xhtml-dcmes-1.mod

       This is an extension of XHTML, a reformulation of HTML as
       a modular XML application.
       Copyright 2002 Murray Altheim. All Rights Reserved.
       Revision: $Id: xhtml-dcmes-1.mod,v 4.4 2001/06/12 16:26:28 altheim Exp $

       This DTD module is identified by the PUBLIC and SYSTEM identifiers:

         PUBLIC "-//neocortext.net//ELEMENTS XHTML Dublin Core Elements 1.0//EN"
         SYSTEM "http://www.doctypes.org/specs/meta/xhtml-dcmes-1.mod" (temporary)

       Based on:

         XML DTD 2001-04-11 for Dublin Core Metadata Element Set version 1.1
         Authors:
             Dave Beckett <dave.beckett@bristol.ac.uk>
             Eric Miller  <emiller@oclc.org>
             Dan Brickley <daniel.brickley@bristol.ac.uk>
         Date Issued:  2001-04-11
         See:  http://dublincore.org/documents/2001/04/11/dcmes-xml/dcmes-xml-dtd.dtd
  -->

  <!-- Dublin Core 1.1 Elements for XHTML

         dc:title         dc:creator     dc:subject
         dc:description   dc:publisher   dc:contributor
         dc:date          dc:type        dc:format
         dc:identifier    dc:source      dc:language
         dc:relation      dc:coverage    dc:rights

       This module declares the Dublin Core Metadata Element Set (DCMES)
       based on an XML version of DCMES 1.1 published by the Dublin Core
       Metadata Initiative. While the XML Namespace prefix for these
       elements can be redeclared, no provision for its optionality has
       been provided, as these elements are expected to be prefixed when
       used with XHTML (see "dcmes-qname-1.mod").

       Note that the XML Namespace declarations are slightly different
       than some XHTML-based document types in that while the 'xmlns:dc'
       and 'xmlns:rdf' attributes are allowed on each Dublin Core element,
       this DTD relies on the #FIXED values on <html> to supply validation
       of their values. This is to avoid default values being added by XML
       processors, which may substantially increase the size of a document.

       This module is included as part of:

         "Augmented Metadata for XHTML"
         http://www.neocortext.net/specs/meta/NOTE-augmeta.html
  -->

  <!-- a parameter entity class containing the DCMES 1.1 elements -->
  <!ENTITY % DC.class
     "%DC.title.qname;       | %DC.creator.qname;   | %DC.subject.qname;
    | %DC.description.qname; | %DC.publisher.qname; | %DC.contributor.qname;
    | %DC.date.qname;        | %DC.type.qname;      | %DC.format.qname;
    | %DC.identifier.qname;  | %DC.source.qname;    | %DC.language.qname;
    | %DC.relation.qname;    | %DC.coverage.qname;  | %DC.rights.qname;" >

  <!-- modify content model for inclusion of <meta> as an inline element -->
  <!ENTITY % Inline.extra '| %meta.qname;' >

  <!-- changes to <meta> as part of 'Augmented Metadata for XHTML' -->
  <!ENTITY % meta.content  "( #PCDATA | %br.qname; | %DC.class; )*" >
  <!ATTLIST %meta.qname;
        %id.attrib;
        content      CDATA                    #IMPLIED
        href         %URI.datatype;           #IMPLIED
  >

  <!-- add XML Namespace declarations to <html> to allow for defaulting -->
  <!ATTLIST %html.qname;
        %DC.xmlns.attrib;
  >

  <!-- This section declares the elements and attributes for
       the Dublin Core Metadata Element Set 1.1.
  -->

  <!ENTITY % lang.attrib
       "xml:lang     %LanguageCode.datatype;  #IMPLIED"
  >
  <!ENTITY % RDF.resource.attrib
       "xmlns:%RDF.prefix; %URI.datatype;     #IMPLIED
        %RDF.pfx;resource  %URI.datatype;     #IMPLIED"
  >
  <!ENTITY % DC.xmlns-optional.attrib
        "xmlns:%DC.prefix;  %URI.datatype;   #IMPLIED"
  >

  <!-- The name given to the resource. -->
  <!ELEMENT %DC.title.qname; ( #PCDATA ) >
  <!ATTLIST %DC.title.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- An entity primarily responsible for making the content
       of the resource. -->
  <!ELEMENT %DC.creator.qname; ( #PCDATA ) >
  <!ATTLIST %DC.creator.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- The topic of the content of the resource. -->
  <!ELEMENT %DC.subject.qname; ( #PCDATA ) >
  <!ATTLIST %DC.subject.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- An account of the content of the resource. -->
  <!ELEMENT %DC.description.qname; ( #PCDATA ) >
  <!ATTLIST %DC.description.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- The entity responsible for making the resource available. -->
  <!ELEMENT %DC.publisher.qname; ( #PCDATA ) >
  <!ATTLIST %DC.publisher.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- An entity responsible for making contributions to the
       content of the resource. -->
  <!ELEMENT %DC.contributor.qname; ( #PCDATA ) >
  <!ATTLIST %DC.contributor.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- A date associated with an event in the life cycle of the resource. -->
  <!ELEMENT %DC.date.qname; ( #PCDATA ) >
  <!ATTLIST %DC.date.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- The nature or genre of the content of the resource. -->
  <!ELEMENT %DC.type.qname; ( #PCDATA ) >
  <!ATTLIST %DC.type.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- The physical or digital manifestation of the resource. -->
  <!ELEMENT %DC.format.qname; ( #PCDATA ) >
  <!ATTLIST %DC.format.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- An unambiguous reference to the resource within a given context. -->
  <!ELEMENT %DC.identifier.qname; ( #PCDATA ) >
  <!ATTLIST %DC.identifier.qname;
        %DC.xmlns-optional.attrib;
        %RDF.resource.attrib;
        %lang.attrib;
  >

  <!-- A reference to a resource from which the present resource is derived. -->
  <!ELEMENT %DC.source.qname; ( #PCDATA ) >
  <!ATTLIST %DC.source.qname;
        %DC.xmlns-optional.attrib;
        %RDF.resource.attrib;
        %lang.attrib;
  >

  <!-- A language of the intellectual content of the resource. -->
  <!ELEMENT %DC.language.qname; ( #PCDATA ) >
  <!ATTLIST %DC.language.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- A reference to a related resource. -->
  <!ELEMENT %DC.relation.qname; ( #PCDATA ) >
  <!ATTLIST %DC.relation.qname;
        %DC.xmlns-optional.attrib;
        %RDF.resource.attrib;
        %lang.attrib;
  >

  <!-- The extent or scope of the content of the resource. -->
  <!ELEMENT %DC.coverage.qname; ( #PCDATA ) >
  <!ATTLIST %DC.coverage.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- Information about rights held in and over the resource. -->
  <!ELEMENT %DC.rights.qname; ( #PCDATA ) >
  <!ATTLIST %DC.rights.qname;
        %DC.xmlns-optional.attrib;
        %lang.attrib; >

  <!-- end of xhtml-dcmes-1.mod -->

B.3 Dublin Core 1.1 Qualified Names Module

This DTD module provides XML Namespace support for the Dublin Core elements in XHTML, available as dcmes-qname-1.mod.


  <!-- ...................................................................... -->
  <!-- Dublin Core 1.1 Qualified Names Module ............................... -->
  <!-- file: dcmes-qname-1.mod

       This is an extension of XHTML, a reformulation of HTML as
       a modular XML application.
       Copyright 2002 Murray Altheim. All Rights Reserved.
       Revision: $Id: dcmes-qname-1.mod,v 4.1 2001/06/05 09:22:01 altheim Exp $

       This DTD module is identified by the PUBLIC and SYSTEM identifiers:

         PUBLIC "-//neocortext.net//ENTITIES XHTML DCMES 1.1 Qualified Names 1.0//EN" 
         SYSTEM "http://www.neocortext.net/specs/meta/dcmes-qname-1.mod" (temporary)
  -->

  <!-- Dublin Core Metadata Element Set 1.1 (DCMES) Qualified Names Module

       This module is contained in two parts, labeled Section 'A' and 'B':

         Section A declares parameter entities to support namespace-
         qualified names, namespace declarations, and name prefixing
         for DCMES 1.1.

         Section B declares parameter entities used to provide
         namespace-qualified names for all DCMES 1.1 element types:
  -->

  <!-- Section A: Dublin Core XML Namespace Framework :::::::::::::: -->

  <!-- XML Namespaces for DCMES 1.1 and RDF -->
  <!ENTITY % DC.xmlns  "http://purl.org/dc/elements/1.1/" >
  <!ENTITY % RDF.xmlns "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >

  <!-- NOTE: As specified in [XMLNAMES], the namespace prefix serves as
       a proxy for the URI reference, and is not in itself significant.
       The following may be redeclared in a document's internal subset.
  -->
  <!ENTITY % DC.prefix  "dc" >
  <!ENTITY % RDF.prefix "rdf" >

  <!ENTITY % DC.pfx  "%DC.prefix;:" >
  <!ENTITY % RDF.pfx "%RDF.prefix;:" >

  <!ENTITY % DC.xmlns.attrib
        "xmlns:%DC.prefix;  %URI.datatype;   #FIXED '%DC.xmlns;'
         xmlns:%RDF.prefix; %URI.datatype;   #FIXED '%RDF.xmlns;'"
  >
  <!ENTITY % XHTML.xmlns.extra.attrib
        "%DC.xmlns.attrib;"
  >

  <!-- Section B: Dublin Core Qualified Names :::::::::::::::::::::: -->

  <!-- This section declares parameter entities used to provide
       namespace-qualified names for all Dublin Core element types.
  -->

  <!-- module:  xhtml-dcmes-1.mod -->
  <!ENTITY % DC.title.qname       "%DC.pfx;title" >
  <!ENTITY % DC.creator.qname     "%DC.pfx;creator" >
  <!ENTITY % DC.subject.qname     "%DC.pfx;subject" >
  <!ENTITY % DC.description.qname "%DC.pfx;description" >
  <!ENTITY % DC.publisher.qname   "%DC.pfx;publisher" >
  <!ENTITY % DC.contributor.qname "%DC.pfx;contributor" >
  <!ENTITY % DC.date.qname        "%DC.pfx;date" >
  <!ENTITY % DC.type.qname        "%DC.pfx;type" >
  <!ENTITY % DC.format.qname      "%DC.pfx;format" >
  <!ENTITY % DC.identifier.qname  "%DC.pfx;identifier" >
  <!ENTITY % DC.source.qname      "%DC.pfx;source" >
  <!ENTITY % DC.language.qname    "%DC.pfx;language" >
  <!ENTITY % DC.relation.qname    "%DC.pfx;relation" >
  <!ENTITY % DC.coverage.qname    "%DC.pfx;coverage" >
  <!ENTITY % DC.rights.qname      "%DC.pfx;rights" >

  <!-- end of dcmes-qname-1.mod -->
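
Because the prefix is only a proxy for the namespace URI, harvesting tools should match Dublin Core elements by namespace rather than by prefix. The following is a small Python sketch using `xml.dom.minidom`, assuming a document whose author chose the prefix `dcmes` (a hypothetical choice made for illustration).

```python
from xml.dom.minidom import parseString

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumption: the author has bound the Dublin Core namespace to the
# prefix "dcmes" instead of the default "dc".
SAMPLE = """<p xmlns="http://www.w3.org/1999/xhtml"
   xmlns:dcmes="http://purl.org/dc/elements/1.1/" id="whip">
  <meta><dcmes:title>Tailless Whipscorpion</dcmes:title></meta>
</p>"""

doc = parseString(SAMPLE)

# Matching on the qualified name misses the element once the prefix changes.
by_prefix = doc.getElementsByTagName("dc:title")

# Matching on namespace URI plus local name finds it regardless of prefix.
by_namespace = doc.getElementsByTagNameNS(DC_NS, "title")
```

Here `by_prefix` is empty while `by_namespace` contains the title element, so a harvester written against the namespace URI keeps working when the prefix is redeclared.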

C. The Dublin Core Metadata Element Set (Informative)

C.1 The Dublin Core Metadata Element Set

The Dublin Core Metadata Element Set (DCMES) is a relatively simple schema consisting of fifteen elements. These elements cover the kind of content one typically finds on the copyright page of a book: title, author, subject, publication data, etc. The DCMES is designed to contain such data, though this certainly shouldn't be considered its only application. The Dublin Core provides for the association of title, author, date (e.g., creation date, issue date, revision date), publisher (e.g., the webmaster or their employer), language, format (e.g., "text/html", "image/gif", "video/mpeg"), and, perhaps most importantly for those interested in a “Semantic Web”, subject. Even without extension, the Dublin Core may therefore provide a suitable “80/20” point for introducing a well-designed and proven metadata standard onto the Web.

The definitions provided below include both the conceptual and representational form of each Dublin Core Element Type and its Identifier (the latter is used as the value of the name attribute when embedded within an XHTML <meta> element). They are derived from the DCMES 1.1 Recommendation [DCMES], the DCMI Recommendation for encoding Dublin Core in XML [DCMES-XML], and the DCMI Recommendation for encoding qualified Dublin Core metadata in HTML (and XHTML) [DCQ-HTML]. The Definition attribute captures the semantic concept and the Datatype and Comment attributes capture the data representation.

Each Dublin Core definition refers to the resource being described. A resource is defined in [RFC2396] as "anything that has identity". For the purposes of Dublin Core metadata, a resource will typically be an information or service resource, but the term may be applied more broadly. In this specification, the way in which metadata refers to a specific resource is described in Section 5.2.

Title
Identifier: "DC.Title"
Element Type: <dc:title>
Definition: A name given to the resource.
Comment: Typically, a Title will be a name by which the resource is formally known.
Creator
Identifier: "DC.Creator"
Element Type: <dc:creator>
Definition: An entity primarily responsible for making the content of the resource.
Comment: Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.
Subject and Keywords
Identifier: "DC.Subject"
Element Type: <dc:subject>
Definition: The topic of the content of the resource.
Comment: Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
Description
Identifier: "DC.Description"
Element Type: <dc:description>
Definition: An account of the content of the resource.
Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
Publisher
Identifier: "DC.Publisher"
Element Type: <dc:publisher>
Definition: An entity responsible for making the resource available.
Comment: Examples of a Publisher include a person, an organisation, or a service. Typically, the name of a Publisher should be used to indicate the entity.
Contributor
Identifier: "DC.Contributor"
Element Type: <dc:contributor>
Definition: An entity responsible for making contributions to the content of the resource.
Comment: Examples of a Contributor include a person, an organisation, or a service. Typically, the name of a Contributor should be used to indicate the entity.
Date
Identifier: "DC.Date"
Element Type: <dc:date>
Definition: A date associated with an event in the life cycle of the resource.
Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
Resource Type
Identifier: "DC.Type"
Element Type: <dc:type>
Definition: The nature or genre of the content of the resource.
Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the working draft list of Dublin Core Types [DCT1]). To describe the physical or digital manifestation of the resource, use the <dc:format> element.
Format
Identifier: "DC.Format"
Element Type: <dc:format>
Definition: The physical or digital manifestation of the resource.
Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).
Resource Identifier
Identifier: "DC.Identifier"
Element Type: <dc:identifier>
Definition: An unambiguous reference to the resource within a given context.
Comment: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).
Source
Identifier: "DC.Source"
Element Type: <dc:source>
Definition: A reference to a resource from which the present resource is derived.
Comment: The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.
Language
Identifier: "DC.Language"
Element Type: <dc:language>
Definition: A language of the intellectual content of the resource.
Comment: Recommended best practice for the values of the Language element is defined by RFC 1766 [RFC1766], which includes a two-letter Language Code (taken from the ISO 639 standard [ISO639]), optionally followed by a two-letter Country Code (taken from the ISO 3166 standard [ISO3166]). For example, 'en' for English, 'fr' for French, or 'en-GB' for English used in the United Kingdom.
Relation
Identifier: "DC.Relation"
Element Type: <dc:relation>
Definition: A reference to a related resource.
Comment: Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.
Coverage
Identifier: "DC.Coverage"
Element Type: <dc:coverage>
Definition: The extent or scope of the content of the resource.
Comment: Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of coordinates or date ranges.
Rights Management
Identifier: "DC.Rights"
Element Type: <dc:rights>
Definition: Information about rights held in and over the resource.
Comment: Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.
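
As an illustration of how the Identifiers above are used as values of the name attribute, the following sketch embeds a few of them in the <head> of an ordinary XHTML 1.0 document. The title, creator and date values are taken from the test content in Appendix F; treating "Neocortext.Net" as the publisher, and the format and language values, are assumptions made for the sake of the example. The <link rel="schema.DC"> convention follows the DCMI practice for HTML encoding.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Augmented Metadata in XHTML</title>
    <!-- Associates the "DC" prefix used in the name values with DCMES 1.1 -->
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    <meta name="DC.Title"     content="Augmented Metadata in XHTML" />
    <meta name="DC.Creator"   content="Murray Altheim" />
    <!-- Publisher, format and language values are illustrative assumptions -->
    <meta name="DC.Publisher" content="Neocortext.Net" />
    <meta name="DC.Date"      content="2001-06-04" />
    <meta name="DC.Format"    content="text/html" />
    <meta name="DC.Language"  content="en" />
  </head>
  <body>
    <p>Document content.</p>
  </body>
</html>
```

Because each <meta> element carries its value as attribute content, nothing in this block is rendered by a browser.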

C.2 Dublin Core Qualifiers

Dublin Core content may be further qualified by refinement or by encoding scheme. The former makes the meaning of the element more specific; the latter identifies a scheme to assist in interpreting the metadata content. DCMI tasked element-specific working groups with identifying qualifiers for each DC element, as described in Dublin Core Qualifiers [DCMES-Qualifiers]. Refinements typically maintain the meaning but narrow the scope (e.g., "Created" on <dc:date>). Encoding schemes (e.g., "Dewey Decimal Classification" on <dc:subject>) typically indicate that the element content comes from a controlled vocabulary (i.e., a schema). Content without an explicit qualifier is considered "unqualified," but according to the Dumb-Down Principle, a client should be able to ignore a qualifier and still be able to use the description.

DCMI does not consider their qualifier list closed; they expect that both they and implementors will develop additional qualifiers for specific domains, noting that while such locally-used qualifiers may not be as interoperable as widely-understood ones, the description is still likely to be usable in cross-domain resource discovery.
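
The two kinds of qualifier described above might be expressed in <meta> attribute content roughly as follows. This is a sketch only: the dotted "DC.Date.Created" form for the refinement and the scheme name "W3CDTF" are assumptions based on common practice, not syntax defined in this appendix, and the date is used purely as sample data. [DCQ-HTML] should be consulted for the recommended spelling.

```xml
<head xmlns="http://www.w3.org/1999/xhtml">
  <title>Qualified Dublin Core sketch</title>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <!-- Refinement (assumed dotted form): narrows Date to the creation date.
       Encoding scheme (assumed name): states how the value is formatted. -->
  <meta name="DC.Date.Created" scheme="W3CDTF" content="2001-06-04" />
</head>
```

A client applying the Dumb-Down Principle would ignore both the ".Created" suffix and the scheme attribute, and would still be left with a usable, unqualified date of "2001-06-04".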

Following are some examples of how to select a subject classification for your metadata using three of the common DC subject qualifiers: Library of Congress Subject Headings (LCSH), Library of Congress Classification (LCC), and Dewey Decimal Classification (DDC). These examples are meant to demonstrate that subject classification is not always a simple process, though it is hoped that more widespread use by non-librarians might spur development of some improved (and free) online classification services for lay people.

Example: U.S. Library of Congress Subject Headings (LCSH)

You can search the U.S. Library of Congress Catalog at Gateway to Library Catalogs (choose either Simple Search, Advanced Search, Left-Anchored Phrase Search, or try a different Z39.50-based catalog from around the world).

I searched on the phrase "Harvester Ant" (using the Advanced or Simple Search), and located quite a few library records. I honed my search by checking “More on this record” for subject details, and located several records that seemed to match my subject. I was able to determine that the subject of "Harvester Ant" could be indicated either by the LC Call No. QL568 or by Dewey 595.

Example: U.S. Library of Congress Classification (LCC)

You can browse the U.S. Library of Congress Classification Outline where you'll confront a list of 21 main classes of the LCC. Unfortunately, the next level of browsing forces you to download a PDF file. Since "Harvester Ants" seems to be in the realm of Science, I downloaded the "Q -- Science" file. By reading through the classifications, it was easy to locate the range of my subject:

  Class Q      Science
  Subclass QL  Zoology
  QL461-599.82 Insects

Unfortunately, the finest grain available here indicates a range, not a specific value. The previous method (using LCSH) was able to determine that the LC Call Number is actually QL568, so this method at least confirmed the correct value.

Example: Dewey Decimal Classification (DDC)

You can browse the About Dewey Web page from the Online Computer Library Center (OCLC), navigating to the latest DDC Summaries page. By browsing the "First Summary", "Second Summary" and "Third Summary" (each with finer resolution), I was able to locate the subject of "Harvester Ants" as:

  500  Science
  590  Zoology
  595  Arthropoda

There are a number of online DDC web pages that list the complete classification system. Note that DDC is a versioned scheme, and that OCLC updates the finer resolution numbers periodically.
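
Once located, the classification values found in the examples above (LC Call No. QL568 and Dewey 595) could be recorded as subject metadata along the following lines. This is a sketch, not prescribed syntax: the scheme names "LCC" and "DDC" are assumed spellings for the corresponding encoding schemes, and the document title is hypothetical.

```xml
<head xmlns="http://www.w3.org/1999/xhtml">
  <title>Harvester ants</title>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <!-- LC call number found through the catalog search -->
  <meta name="DC.Subject" scheme="LCC" content="QL568" />
  <!-- Third-summary Dewey class -->
  <meta name="DC.Subject" scheme="DDC" content="595" />
</head>
```

Repeating the subject with different schemes lets a client that understands only one of the classification systems still make use of the description.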

C.3 Dublin Core Type Vocabulary

The DCMI Type Vocabulary provides a general, cross-domain list of approved terms that may be used as content of the <dc:type> element, or values of the DC.Type property (when expressed as <meta> attribute content) to identify the genre of a resource.

The following are links to definitions from "DCMI Type Vocabulary", [DCMI-Types]:

D. Acknowledgements (Informative)

The editors would like to thank those who have provided valuable feedback on this document, including [...]

Sincere thanks to the Arizona-Sonora Desert Museum for permission to use content from both their excellent web site and one of their print publications (which is invaluable in identifying holes in the desert):

Merlin, Pinau, A Field Guide to Desert Holes. Tucson, Arizona: Arizona-Sonora Desert Museum Press, 1999.
[ISBN 1-886679-12-6]

E. Production Notes (Informative)

This document includes metadata as described herein, and is a valid instance of the XHTML Augmented Metadata 1.0 DTD.

F. Browser Test Area (Temporary)

Following are some samples of embedded metadata, each using a different method of hiding or displaying its content. In the first tests, success is the absence of a display. Following these are several tests that display the metadata content. For syntax details, check the XHTML source of this specification.

Hiding Test 1:
Following this paragraph is a table element which contains a <div> element containing a DC metadata block. The <div> element's style property (associated by class attribute in this document's internal style sheet) "display" has been set to "none":

:
Augmented Metadata in XHTML Murray Altheim Neocortext.Net 2001-06-04
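
For readers without access to the XHTML source, the markup behind Hiding Test 1 might look roughly like the sketch below. It is a reconstruction under stated assumptions, not a copy of the source: the class name "hidden-metadata" is hypothetical, the placement of Dublin Core element types as content of <meta> is inferred from the test descriptions, and the mapping of the four values to title, creator, publisher and date is a guess. No document type declaration is given because the DTD's identifiers are not stated in this appendix.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <title>Hiding Test 1 sketch</title>
    <style type="text/css">
      /* Hypothetical class name; hides the whole metadata block */
      div.hidden-metadata { display: none; }
    </style>
  </head>
  <body>
    <table>
      <tr>
        <td>:
          <div class="hidden-metadata">
            <!-- Assumed structure: DC element types as content of meta -->
            <meta>
              <dc:title>Augmented Metadata in XHTML</dc:title>
              <dc:creator>Murray Altheim</dc:creator>
              <dc:publisher>Neocortext.Net</dc:publisher>
              <dc:date>2001-06-04</dc:date>
            </meta>
          </div>
        </td>
      </tr>
    </table>
  </body>
</html>
```

In a browser that applies the style sheet, only the colon is expected to appear; with CSS disabled, the element content would be displayed.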

Hiding Test 2:
Following this paragraph is a DC metadata block inside of a single-cell table. Following a colon character, the <meta> element's style property (associated in the document's internal style sheet by element type) "display" has been set to "none":

: Augmented Metadata in XHTML Murray Altheim Neocortext.Net 2001-06-04

Hiding Test 3:
Following this paragraph is a single cell table containing a <div> element containing a DC metadata block. Following a colon character, the <meta> element has a style attribute assigning a style property "display:none":

:
Augmented Metadata in XHTML Murray Altheim Neocortext.Net 2001-06-04

Hiding Test 4:
Following this paragraph is a single cell table. Following an initial colon character there are four <meta> elements. Since these contain only attribute content, only the colon should be displayed:

:

Display Test 5:
Following this paragraph is a table whose single cell contains a DC metadata block. This test does not attempt to hide the metadata content, instead including <br /> elements as line breaks, and styling for the entire table cell via the document stylesheet.

: Augmented Metadata in XHTML
Murray Altheim
Neocortext.Net
2001-06-04

Display Test 6:
Following this paragraph is a single cell table containing a <div> element containing a DC metadata block. This test does not attempt to hide the metadata content, instead styling the entire block as "white-space: pre" via the document stylesheet.

:
Augmented Metadata in XHTML Murray Altheim Neocortext.Net 2001-06-04

Display Test 7:
This test is basically the same as Test 5 except it attempts to attach styling via an id attribute on the <meta> element itself. If this works, the font color should appear as red.

: Augmented Metadata in XHTML
Murray Altheim
Neocortext.Net
2001-06-04

It is noted that current versions of several browsers hide the metadata content on tests 1 and 4, the former only when CSS is on. The safest encoding method is to use attribute content on <meta> elements.