org.ceryle.plink
Class MetaProcessor

java.lang.Object
  extended by org.ceryle.plink.MetaProcessor

public class MetaProcessor
extends Object

Harvests XHTML <meta>-based metadata from a supplied XHTML document. For more extensive documentation, see the plink home page or the specification on which this class is based, Augmented Metadata in XHTML.
See: http://purl.org/ceryle/docs/NOTE-xhtml-augmeta.html

Known bugs:
The processor does not properly handle qualified DCMES attributes.

Since:
JDK1.2
Version:
$Id: MetaProcessor.java,v 3.6 2007-06-15 12:09:28 altheim Exp $
Author:
Murray Altheim
See Also:
Constants

Field Summary
protected  String documentSystemId
          A String containing the system identifier (filename) of the Document.
protected  String documentTitle
          A String containing the contents of the <title> Element of the Document.
protected  MetaHandler metaH
          A MetaHandler to receive events from this MetaProcessor.
 
Constructor Summary
MetaProcessor(MetaHandler metahandler)
          Constructs a MetaProcessor that sends its events to the supplied MetaHandler.
 
Method Summary
protected  String convertAttrToElt(String name)
          Returns a String containing the supplied attribute name converted to an element type name.
protected  String generateXPointerExpression(Element element)
          Returns a String containing an XPointer reference to the specified node within the document tree.
 String getAbout(Document doc, Element meta)
          Processes the Element meta of Document doc to return a URI string suitable as a reference to the object.
protected  int getDepth(Document doc, Element element)
          Returns an int indicating the depth below the document element (or 'root') at which Element element resides.
 Document getHarvestDocument()
          Returns the Document populated with the harvested metadata.
protected  String getPlinkReference(Document doc, Element element)
          Returns a String containing a reference to the specified Element element of Document doc by pointing to the plink SID or SSN (preferring the former) for the node, or null if neither is available.
 void harvest(Document doc, String systemId)
          Processes the Document doc to produce a harvested Document, which can then be retrieved via getHarvestDocument().
 void processMeta(Document doc, Hashtable crop, Element meta)
          Processes the Element meta to populate the supplied Hashtable crop with metadata content.
protected  void registerSchemas(Document doc)
          Extracts all link elements from the provided Document doc and registers a schema for each one whose rel attribute value follows the schema registration formula: “schema.PREFIX”, where PREFIX is the schema prefix String.
protected  String traverse(Node node, String schema)
          Recursively traverses the specified node.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

metaH

protected MetaHandler metaH
A MetaHandler to receive events from this MetaProcessor.


documentTitle

protected String documentTitle
A String containing the contents of the <title> Element of the Document.


documentSystemId

protected String documentSystemId
A String containing the system identifier (filename) of the Document.

Constructor Detail

MetaProcessor

public MetaProcessor(MetaHandler metahandler)
Constructs a MetaProcessor that sends its events to the supplied MetaHandler.

Method Detail

registerSchemas

protected void registerSchemas(Document doc)
Extracts all link elements from the provided Document doc and registers a schema for each one whose rel attribute value follows the schema registration formula: “schema.PREFIX”, where PREFIX is the schema prefix String. This method should be run before any other document processing, because the attribute namespace schemas must be available in order to validate attribute namespace prefixes.
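
The registration formula can be illustrated with a minimal sketch that uses only the JDK's DOM API. SchemaLinkSketch, parse and findSchemas are hypothetical names, not part of plink, and taking the schema URI from the link's href attribute is an assumption based on the usual Dublin Core convention rather than something this page states.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not the plink implementation.
final class SchemaLinkSketch {
    private static final String FORMULA = "schema.";

    /** Parses an XML string into a DOM Document (default, non-namespace-aware parser). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Maps each PREFIX found in rel="schema.PREFIX" to that link's href (assumed schema URI). */
    static Map<String, String> findSchemas(Document doc) {
        Map<String, String> schemas = new LinkedHashMap<>();
        NodeList links = doc.getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            String rel = link.getAttribute("rel");
            // Only rel values of the form "schema.PREFIX" with a non-empty PREFIX qualify.
            if (rel.startsWith(FORMULA) && rel.length() > FORMULA.length()) {
                schemas.put(rel.substring(FORMULA.length()), link.getAttribute("href"));
            }
        }
        return schemas;
    }
}
```

Under these assumptions, a head containing <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"/> would register the prefix DC, while a stylesheet link would be ignored.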


harvest

public void harvest(Document doc,
                    String systemId)
Processes the Document doc to produce a harvested Document, which can then be retrieved via getHarvestDocument(). This is generally used only for XHTML source documents. The String systemId is the locator used to create references to the document as a resource.


processMeta

public void processMeta(Document doc,
                        Hashtable crop,
                        Element meta)
                 throws Exception
Processes the Element meta to populate the supplied Hashtable crop with metadata content.

Throws:
Exception

getAbout

public String getAbout(Document doc,
                       Element meta)
                throws Exception
Processes the Element meta of Document doc to return a URI string suitable as a reference to the object. The priority for creating a locator is as follows:
  1. the content of an href attribute, if available
  2. if the element is within the document <head>, it is treated as document metadata
  3. the ID of the parent element, if available
  4. the plink SID, if available
  5. otherwise, a generated XPath expression (least favourable)

Throws:
Exception
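
The priority order for getAbout can be sketched roughly as follows, using only the JDK's DOM API. AboutSketch and its methods are hypothetical names, not the plink implementation. Returning the system identifier for document metadata, building the parent reference as systemId#id, and reading the ID from an attribute named id are all assumptions. Steps 4 and 5 depend on plink internals and are left out, so the sketch returns null where they would apply.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration only; not the plink implementation.
final class AboutSketch {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Applies steps 1 to 3 of the priority list; returns null where steps 4 and 5 would apply. */
    static String about(Element meta, String systemId) {
        // 1. An explicit href attribute wins.
        if (meta.hasAttribute("href")) {
            return meta.getAttribute("href");
        }
        // 2. Anywhere inside <head>: document metadata, so refer to the document itself (assumed).
        for (Node n = meta.getParentNode(); n != null; n = n.getParentNode()) {
            if ("head".equals(n.getNodeName())) {
                return systemId;
            }
        }
        // 3. Use the parent's ID as a fragment identifier (assumed form: systemId#id).
        Node parent = meta.getParentNode();
        if (parent instanceof Element && ((Element) parent).hasAttribute("id")) {
            return systemId + "#" + ((Element) parent).getAttribute("id");
        }
        // 4. and 5. (plink SID, generated expression) are omitted from this sketch.
        return null;
    }
}
```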

getPlinkReference

protected String getPlinkReference(Document doc,
                                   Element element)
Returns a String containing a reference to the specified Element element of Document doc by pointing to the plink SID or SSN (preferring the former) for the node, or null if neither is available.


generateXPointerExpression

protected String generateXPointerExpression(Element element)
Returns a String containing an XPointer reference to the specified node within the document tree. The reference can be shortened by placing an ID attribute on any ancestor element; when one is available, it is used as the base of a relative reference. Creating unambiguous references is likely to be processor-intensive for large documents.
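
The ID shortcut can be illustrated with a minimal sketch that builds a child sequence in the style of the W3C XPointer element() scheme. PointerSketch and its methods are hypothetical names. Whether plink emits this exact syntax is an assumption, as is treating an attribute named id as the element's ID without consulting a DTD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration only; not the plink implementation.
final class PointerSketch {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Returns the 1-based position of e among its parent's element children. */
    static int childIndex(Element e) {
        int index = 1;
        for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE) {
                index++;
            }
        }
        return index;
    }

    /** Walks up from element, stopping early at the nearest element that carries an id. */
    static String pointer(Element element) {
        StringBuilder steps = new StringBuilder();
        Node n = element;
        while (n instanceof Element) {
            Element e = (Element) n;
            if (e.hasAttribute("id")) {
                // Relative reference: the ID followed by the steps collected so far.
                return "element(" + e.getAttribute("id") + steps + ")";
            }
            steps.insert(0, "/" + childIndex(e));
            n = e.getParentNode();
        }
        // No ID found: absolute child sequence starting at the document element.
        return "element(" + steps + ")";
    }
}
```

Each step walks the element's preceding siblings, so the cost grows with both depth and the number of siblings, which is in line with the note above about large documents.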


traverse

protected String traverse(Node node,
                          String schema)
                   throws Exception
Recursively traverses the specified node. The String schema is an optional qualifier for the metadata content.

Throws:
Exception

getHarvestDocument

public Document getHarvestDocument()
Returns the Document populated with the harvested metadata. If never set, null is returned.


convertAttrToElt

protected String convertAttrToElt(String name)
                           throws Exception
Returns a String containing the supplied attribute name converted to an element type name. If the supplied String does not include a period delimiter, or if no link element is available to indicate its namespace, it is returned intact. If it is a DCMES name it is converted accordingly; if the substring following the prefix is not a valid DCMES value, an exception is thrown. NOTE: this uses a case-insensitive match to allow for some author confusion between element type names and attribute names in the various DCMI specifications (the attribute-based names are capitalized, whereas the DCMES elements are lowercased).

Throws:
Exception
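
The conversion rules for convertAttrToElt can be sketched as below. DcNameSketch and convert are hypothetical names, not the plink implementation. The output form prefix:name (for example dc:title) is an assumption, since this page does not state the result format, and the sketch treats every registered prefix as a DCMES prefix for simplicity. The set of names is the fifteen-element Dublin Core Metadata Element Set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not the plink implementation.
final class DcNameSketch {
    /** The fifteen DCMES element names, lowercased as in the element set. */
    static final Set<String> DCMES = new HashSet<>(Arrays.asList(
            "title", "creator", "subject", "description", "publisher",
            "contributor", "date", "type", "format", "identifier",
            "source", "language", "relation", "coverage", "rights"));

    /**
     * Converts an attribute-style name such as "DC.Title" to an element-style
     * name. registeredPrefixes stands in for the prefixes registered through
     * link elements.
     */
    static String convert(String name, Set<String> registeredPrefixes) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return name; // no period delimiter: returned intact
        }
        String prefix = name.substring(0, dot);
        if (!registeredPrefixes.contains(prefix)) {
            return name; // no schema registered for this prefix: returned intact
        }
        // Case-insensitive match: lowercase the local part before the lookup.
        String local = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!DCMES.contains(local)) {
            throw new IllegalArgumentException("not a DCMES element name: " + name);
        }
        return prefix.toLowerCase(Locale.ROOT) + ":" + local; // assumed output form
    }
}
```

In this sketch a qualified name such as DC.Date.created is rejected, because date.created is not one of the fifteen element names.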

getDepth

protected int getDepth(Document doc,
                       Element element)
                throws Exception
Returns an int indicating the depth below the document element (or 'root') at which Element element resides.

Throws:
Exception
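
A depth calculation of this kind can be sketched by counting element ancestors. DepthSketch and its methods are hypothetical names, and counting the document element itself as depth 0 is an assumption, since this page does not say where counting starts.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration only; not the plink implementation.
final class DepthSketch {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Counts the element ancestors of element; the document element has depth 0 (assumed). */
    static int depth(Element element) {
        int depth = 0;
        // The loop stops at the Document node, which is not an Element.
        for (Node n = element.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            depth++;
        }
        return depth;
    }
}
```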


The Ceryle Project. Copyright ©2001-2007 Murray Altheim, All Rights Reserved. See LICENSE included with distribution.