org.ceryle.xml
Class XMLUtils

java.lang.Object
  extended by org.ceryle.xml.XMLUtils

public class XMLUtils
extends Object

Provides some static XML utility methods.

Since:
JDK1.3
Version:
$Id: XMLUtils.java,v 3.22 2007-06-20 01:28:40 altheim Exp $
Author:
Murray Altheim

Field Summary
static boolean entifyApos
          When true, entify apostrophe ("'") characters.
static boolean forceUTF8
          Force set of XML declaration's encoding to UTF-8 when true.
static int SERIALIZE_HTML
          An enumerated type indicating that the serialization method used is HTML, which treats the incoming node as a text string (i.e., no indenting or other processing).
static int SERIALIZE_TEXT
          An enumerated type indicating that the serialization method used is TEXT, which treats the incoming node as a text string (i.e., no indenting).
static int SERIALIZE_UNKNOWN
          An enumerated type indicating an unknown serialization method (-1).
static int SERIALIZE_XHTML
          An enumerated type indicating that the serialization method used is XHTML, which preserves XHTML's empty element behaviour.
static int SERIALIZE_XML
          An enumerated type indicating that the serialization method used is XML (default).
static int SERIALIZE_XTM
          An enumerated type indicating that the serialization method used is for XTM (same as XML).
static String XSLT_property_cdata_section_elements
          The standard XSLT property keys supported are:
static String XSLT_property_doctype_public
          The standard XSLT property keys supported are:
static String XSLT_property_doctype_system
          The standard XSLT property keys supported are:
static String XSLT_property_encoding
          The standard XSLT property keys supported are:
static String XSLT_property_indent
          The standard XSLT property keys supported are:
static String XSLT_property_media_type
          The standard XSLT property keys supported are:
static String XSLT_property_method
          The standard XSLT property keys supported are:
static String XSLT_property_omit_xml_declaration
          The standard XSLT property keys supported are:
static String XSLT_property_standalone
          The standard XSLT property keys supported are:
static String XSLT_property_version
          The standard XSLT property keys supported are:
 
Constructor Summary
XMLUtils()
           
 
Method Summary
static String deentify(String s)
          Return a string with any of the XML-defined numeric or named character entities replaced by their character equivalents (as normal text).
static String entify(String s, boolean asNumeric)
          Return a string with markup-sensitive characters (LT,GT,AMP,APOS and QUOT) expressed as either numeric or named character entities, depending on the boolean asNumeric.
static Document generateDocument(String uri, String qname)
          Returns a DOM implementation of a Document object with a document element having an XML Namespace URI uri and a qualified name qname.
static Element getChildElementByTagName(Element element, String name, boolean exclusive)
          Returns the child Element of the Element provided whose element type ("tag") name matches the String name, throwing a ProcessException if there is more than one such child element when the boolean exclusive is true, returning the first instance otherwise.
static List getElementsWithAttribute(Document doc, String name, String attrname, String value, boolean ignoreCase)
          Returns the List of all instances found of an element with the provided tag name containing an attribute attrname whose value is value, in the order they are located.
static String getElementText(Element element, boolean normalizeWS)
          Returns the text content of all Text node children of Element element, ignoring any descendant elements (grandchildren, etc.).
static Element getElementWithAttribute(Document doc, String name, String attrname, String value, boolean ignoreCase)
          Returns the first instance found of an element with the provided tag name containing an attribute attrname whose value is value.
static Element getFirstChildElement(Element element, String name)
          Returns the first child element of element whose element type name (AKA "tag name") matches the provided String, null if no match.
static Element getFirstChildElementNS(Element element, String namespaceURI, String localName)
          Returns the first child element of element whose XML Namespace URI and local name (AKA "element type name" or "tag name") matches the parameters, null if no match.
static Element getFirstDescendantElement(Element element, String tagName)
          Returns the first descendant element of element whose element type name (AKA "tag name") matches the provided String, null if no match.
static int getMethodForMIME(MIME mime)
          Returns an XMLUtils.SERIALIZE_* value based upon the provided MIME type.
static String getMethodName(int method)
          Returns a String indicator of the provided serialization method, as provided by the org.apache.xml.serialize.Method class.
static MIME getMIMEtype(int method)
          Returns a MIME object based upon the provided serialization method (XMLUtils.SERIALIZE_XML, etc.).
static String getPCDATAContent(Element element)
          Returns the PCDATA content of the Element provided.
static String getSerializationMethod(String mimetype)
          Returns the serialization method (as a String, but using the org.apache.xml.serialize.Method class as the source) based on a String comparison with mimetype when matched against several common MIME types.
static String getSerializedNode(Node node)
          A static method that returns a serialization of the provided DOM Document or Element using an XML serialization method.
static int getSerializedSize(Node node)
          A static method that returns the size in characters of the serialization of the provided DOM Document or Element using an XML serialization method.
static org.apache.xml.serializer.Serializer getSerializer(Object out, String method)
          Returns a serializer suitable for the provided method, using the provided Writer or OutputStream out.
static org.apache.xml.serializer.Serializer getSerializer(Properties props, Object out, String method)
          Returns a serializer suitable for the provided method, using the provided Writer or OutputStream out.
static String harvestText(Node node, boolean useTreeWalker, boolean goDeep, boolean normalizeWS)
          Provided with a DOM Node node, iterates over its content, returning a concatenation of all Text nodes, with normalized whitespace as necessary to keep words from erroneously merging if normalizeWS is true.
static boolean isXML(int method)
          Returns true if the provided serialization method is a non-XHTML form of XML, including XTM and generic XML.
static void removeNamespaceCruft(Node node)
          Provided with a DOM Node node, iterates over its content, removing all namespace cruft, including all namespace declarations and prefixes.
static String scanForTitle(Element doc, String content)
          Provided with a DOM Element doc (expected to come from an XHTML document), iterates over its content, returning the first matching "DC.Title" content.
static String scanTextForTitle(String s, String name)
          A utility method that attempts to obtain the element content from the first instance in an HTML document provided as a String s of a given element type name.
static boolean serialize(Document doc, String filename)
          A convenience method that writes the Document doc to a file named filename, using an XML serialization method.
static boolean serialize(Node node, Object out, String method)
          Using the serializer from the Xalan project, serializes the provided DOM Node to the provided Writer using the designated method.
static void setPCDATAContent(Element element, String content)
          Sets the PCDATA content of the Element provided.
static String stripMarkup(String s)
          Strips markup from the provided String s.
static Document toDocument(String source)
          A static method that parses the provided String source into an XML Document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SERIALIZE_UNKNOWN

public static final int SERIALIZE_UNKNOWN
An enumerated type indicating an unknown serialization method (-1).

See Also:
Constant Field Values

SERIALIZE_XML

public static final int SERIALIZE_XML
An enumerated type indicating that the serialization method used is XML (default).

See Also:
Constant Field Values

SERIALIZE_XTM

public static final int SERIALIZE_XTM
An enumerated type indicating that the serialization method used is for XTM (same as XML).

See Also:
Constant Field Values

SERIALIZE_XHTML

public static final int SERIALIZE_XHTML
An enumerated type indicating that the serialization method used is XHTML, which preserves XHTML's empty element behaviour.

See Also:
Constant Field Values

SERIALIZE_HTML

public static final int SERIALIZE_HTML
An enumerated type indicating that the serialization method used is HTML, which treats the incoming node as a text string (i.e., no indenting or other processing).

See Also:
Constant Field Values

SERIALIZE_TEXT

public static final int SERIALIZE_TEXT
An enumerated type indicating that the serialization method used is TEXT, which treats the incoming node as a text string (i.e., no indenting).

See Also:
Constant Field Values

forceUTF8

public static boolean forceUTF8
Force set of XML declaration's encoding to UTF-8 when true. Default is true.


entifyApos

public static boolean entifyApos
When true, entify apostrophe ("'") characters. As this is generally more of an annoyance when double quotes are used as attribute value delimiters (as they always are in this application), the default is false.


XSLT_property_method

public static final String XSLT_property_method
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_version

public static final String XSLT_property_version
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_encoding

public static final String XSLT_property_encoding
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_standalone

public static final String XSLT_property_standalone
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_doctype_public

public static final String XSLT_property_doctype_public
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_doctype_system

public static final String XSLT_property_doctype_system
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_indent

public static final String XSLT_property_indent
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_media_type

public static final String XSLT_property_media_type
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_cdata_section_elements

public static final String XSLT_property_cdata_section_elements
The standard XSLT property keys supported are:

See Also:
Constant Field Values

XSLT_property_omit_xml_declaration

public static final String XSLT_property_omit_xml_declaration
The standard XSLT property keys supported are:

See Also:
Constant Field Values
Constructor Detail

XMLUtils

public XMLUtils()
Method Detail

generateDocument

public static Document generateDocument(String uri,
                                        String qname)
Returns a DOM implementation of a Document object with a document element having an XML Namespace URI uri and a qualified name qname.
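
As a rough illustration of what this description implies, the following sketch builds an empty namespaced document with the standard JAXP and DOM APIs. It is not the Ceryle source; the class and method names (GenerateDocumentSketch, generate) are hypothetical, and wrapping the checked exception is an assumption made to keep the example short.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class GenerateDocumentSketch {

    /** Creates a Document whose document element has the given namespace URI and qualified name. */
    static Document generate(String uri, String qname) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DOMImplementation impl = factory.newDocumentBuilder().getDOMImplementation();
            // The third argument is the DocumentType; null means no DOCTYPE.
            return impl.createDocument(uri, qname, null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM implementation", e);
        }
    }
}
```

Calling generate("http://www.w3.org/1999/xhtml", "html") under these assumptions yields a Document whose only content is an empty html element in the XHTML namespace.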


toDocument

public static Document toDocument(String source)
                           throws DocumentException
A static method that parses the provided String source into an XML Document. If there is already an instantiated XMLProcessor, use the process() method to avoid recreating a javax.xml.parsers.DocumentBuilderFactory and javax.xml.parsers.DocumentBuilder for each document (an expensive process).

Throws:
DocumentException
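
The cost concern above can be illustrated with a small sketch that creates the factory once and reuses it for every parse. This is an assumption about one reasonable approach, not the XMLProcessor or XMLUtils code; ToDocumentSketch and parse are hypothetical names, and IllegalArgumentException stands in for the project's DocumentException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class ToDocumentSketch {

    // Created once; single-threaded use is assumed in this sketch.
    private static final DocumentBuilderFactory FACTORY = createFactory();

    private static DocumentBuilderFactory createFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory;
    }

    /** Parses the String source into a DOM Document. */
    static Document parse(String source) {
        try {
            DocumentBuilder builder = FACTORY.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(source)));
        } catch (Exception e) {
            // Stand-in for the DocumentException thrown by the documented method.
            throw new IllegalArgumentException("could not parse source as XML", e);
        }
    }
}
```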

getSerializer

public static org.apache.xml.serializer.Serializer getSerializer(Object out,
                                                                 String method)
Returns a serializer suitable for the provided method, using the provided Writer or OutputStream out.

Note: because this method is mostly intended for use with a Writer rather than an OutputStream, a failure to build the serializer does not throw an UnsupportedEncodingException; instead, an error is registered with the handler and null is returned.


getSerializer

public static org.apache.xml.serializer.Serializer getSerializer(Properties props,
                                                                 Object out,
                                                                 String method)
Returns a serializer suitable for the provided method, using the provided Writer or OutputStream out. Either the Properties props or the method must be provided; if both are, props takes precedence. The returned Serializer will need to be converted to a DOMSerializer or ContentHandler before use.

Note: because this method is mostly intended for use with a Writer rather than an OutputStream, a failure to build the serializer does not throw an UnsupportedEncodingException; instead, an error is registered with the handler and null is returned. If the OutputFormat is non-null, the serialization method is supplied by the OutputFormat; the method parameter is then ignored.


getSerializedSize

public static int getSerializedSize(Node node)
                             throws ProcessException
A static method that returns the size in characters of the serialization of the provided DOM Document or Element using an XML serialization method. This is expensive, as it does require performing the serialization to provide the value.

Throws:
ProcessException - if an error occurs during serialization.

getSerializedNode

public static String getSerializedNode(Node node)
                                throws ProcessException
A static method that returns a serialization of the provided DOM Document or Element using an XML serialization method.

Throws:
ProcessException - if an error occurs during serialization.
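
To show why obtaining the serialized size costs as much as serializing, here is a minimal sketch that uses the JDK's identity Transformer in place of the Xalan Serializer this class relies on. All names (SerializedNodeSketch, serialized, serializedSize, parse) are hypothetical, omitting the XML declaration is an assumption, and IllegalStateException stands in for ProcessException.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class SerializedNodeSketch {

    /** Serializes the node to a String using the "xml" output method. */
    static String serialized(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** The size is only known after a full serialization has been performed. */
    static int serializedSize(Node node) {
        return serialized(node).length();
    }

    /** Demo helper: parses a String with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```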

serialize

public static boolean serialize(Document doc,
                                String filename)
                         throws ProcessException
A convenience method that writes the Document doc to a file named filename, using an XML serialization method. This method never returns false, but holds its sole 'true' return value for legacy methods that rely upon it.

Parameters:
doc - the DOM Document to serialize
filename - the pathname of the target file
Returns:
true if serialized without error
Throws:
ProcessException - if an error occurs during serialization.

serialize

public static boolean serialize(Node node,
                                Object out,
                                String method)
                         throws IOException
Using the serializer from the Xalan project, serializes the provided DOM Node to the provided Writer using the designated method. If the provided Node is null, no serialization occurs, but no exception is thrown; this is the means of returning the Serializer. If the method is XML or XHTML and the DOCTYPE is available, its public and system IDs are used to provide a DOCTYPE declaration for the output serialization.

Parameters:
node - the DOM node to be serialized.
out - the File, Writer or OutputStream to receive the serialized content.
method - the method should be one of Method.TEXT, Method.HTML, Method.XHTML or Method.XML (the default if the value is not recognized).
Returns:
true if successful.
Throws:
IOException - if out is not a Writer or an OutputStream, or a serialization error occurs.

getMIMEtype

public static MIME getMIMEtype(int method)
Returns a MIME object based upon the provided serialization method (XMLUtils.SERIALIZE_XML, etc.). If unrecognized, this will return null.


getMethodName

public static String getMethodName(int method)
Returns a String indicator of the provided serialization method, as provided by the org.apache.xml.serialize.Method class. Methods include "SERIALIZE_XML", "SERIALIZE_XHTML", etc. Note that XTM returns "xml". If the value is unrecognized, this will return "xml".


getMethodForMIME

public static int getMethodForMIME(MIME mime)
Returns an XMLUtils.SERIALIZE_* value based upon the provided MIME type. This returns the plain text method for MIME types of "text/plain", "text/wiki", "text/x-ltm" and "text/html". If unrecognized, this will return -1.


getSerializationMethod

public static String getSerializationMethod(String mimetype)
Returns the serialization method (as a String, but using the org.apache.xml.serialize.Method class as the source) based on a String comparison with mimetype when matched against several common MIME types. These include:


isXML

public static boolean isXML(int method)
Returns true if the provided serialization method is a non-XHTML form of XML, including XTM and generic XML. (XHTML is excluded because this method is used as a filter for non-XHTML XML.)


entify

public static final String entify(String s,
                                  boolean asNumeric)
Return a string with markup-sensitive characters (LT,GT,AMP,APOS and QUOT) expressed as either numeric or named character entities, depending on the boolean asNumeric. Since either are intrinsically part of XML, no additional DTD declarations are necessary.

Note that use of numeric entities allows for more "entification" than simply XML's five built-in entities. While not currently supported, use of numeric entities may in the future mean support for characters not included in the current encoding.

If the provided String is either null or empty, an empty String is returned.
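
The behaviour described above might be sketched as follows. This is not the Ceryle source: EntifySketch is a hypothetical name, the entifyApos flag is passed as a parameter here instead of being read from the static field, and the use of decimal numeric references is an assumption.

```java
// Hypothetical stand-in, not the actual XMLUtils implementation.
class EntifySketch {

    /** Escapes markup-sensitive characters as named or (decimal) numeric references. */
    static String entify(String s, boolean asNumeric, boolean entifyApos) {
        if (s == null || s.isEmpty()) {
            return ""; // documented behaviour for null or empty input
        }
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append(asNumeric ? "&#60;" : "&lt;");   break;
                case '>':  sb.append(asNumeric ? "&#62;" : "&gt;");   break;
                case '&':  sb.append(asNumeric ? "&#38;" : "&amp;");  break;
                case '"':  sb.append(asNumeric ? "&#34;" : "&quot;"); break;
                case '\'':
                    // Apostrophes are escaped only when requested.
                    if (entifyApos) {
                        sb.append(asNumeric ? "&#39;" : "&apos;");
                    } else {
                        sb.append(c);
                    }
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```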


deentify

public static final String deentify(String s)
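Return a string with any of the XML-defined numeric or named character entities replaced by their character equivalents (as normal text). This covers the numeric and named forms of XML's five predefined entities: apostrophe ("&apos;"), quotation mark ("&quot;"), less-than ("&lt;"), greater-than ("&gt;"), and ampersand ("&amp;").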

If the provided String is either null or empty, an empty String is returned.
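
A single-pass decoder along these lines could look like the sketch below. It is an illustration only: DeentifySketch is a hypothetical name, and recognizing just the decimal numeric forms is an assumption. Matching every reference in one pass avoids decoding "&amp;lt;" twice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class DeentifySketch {

    // Named and decimal forms of XML's five predefined entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(lt|gt|amp|apos|quot|#60|#62|#38|#39|#34);");

    /** Replaces each recognized reference with its character, scanning the input once. */
    static String deentify(String s) {
        if (s == null || s.isEmpty()) {
            return ""; // documented behaviour for null or empty input
        }
        Matcher m = ENTITY.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(decode(m.group(1))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    private static String decode(String name) {
        switch (name) {
            case "lt":   case "#60": return "<";
            case "gt":   case "#62": return ">";
            case "amp":  case "#38": return "&";
            case "apos": case "#39": return "'";
            default:                 return "\""; // "quot" and "#34"
        }
    }
}
```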


getElementWithAttribute

public static Element getElementWithAttribute(Document doc,
                                              String name,
                                              String attrname,
                                              String value,
                                              boolean ignoreCase)
Returns the first instance found of an element with the provided tag name containing an attribute attrname whose value is value. Returns null if no match is found. If ignoreCase is true, the value match is not case sensitive.


getElementsWithAttribute

public static List getElementsWithAttribute(Document doc,
                                            String name,
                                            String attrname,
                                            String value,
                                            boolean ignoreCase)
Returns the List of all instances found of an element with the provided tag name containing an attribute attrname whose value is value, in the order they are located. Returns null if no match is found. If ignoreCase is true, the value match is not case sensitive.
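
The two attribute lookups above can be approximated with plain DOM calls, as in this sketch. The names (AttributeLookupSketch, findAll, findFirst, parse) are hypothetical, and treating "order located" as document order is an assumption based on getElementsByTagName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class AttributeLookupSketch {

    /** Returns all matching elements in document order, or null if there are none. */
    static List<Element> findAll(Document doc, String name, String attrname,
                                 String value, boolean ignoreCase) {
        List<Element> matches = new ArrayList<>();
        NodeList candidates = doc.getElementsByTagName(name);
        for (int i = 0; i < candidates.getLength(); i++) {
            Element e = (Element) candidates.item(i);
            if (!e.hasAttribute(attrname)) {
                continue;
            }
            String actual = e.getAttribute(attrname);
            if (ignoreCase ? actual.equalsIgnoreCase(value) : actual.equals(value)) {
                matches.add(e);
            }
        }
        return matches.isEmpty() ? null : matches; // null mirrors the documented result
    }

    /** Returns the first matching element, or null if there is none. */
    static Element findFirst(Document doc, String name, String attrname,
                             String value, boolean ignoreCase) {
        List<Element> all = findAll(doc, name, attrname, value, ignoreCase);
        return all == null ? null : all.get(0);
    }

    /** Demo helper: parses a String with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because the list variant returns null when nothing matches, callers should check for null before iterating.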


removeNamespaceCruft

public static void removeNamespaceCruft(Node node)
Provided with a DOM Node node, iterates over its content, removing all namespace cruft, including all namespace declarations and prefixes.


getFirstDescendantElement

public static Element getFirstDescendantElement(Element element,
                                                String tagName)
Returns the first descendant element of element whose element type name (AKA "tag name") matches the provided String, null if no match.


getFirstChildElement

public static Element getFirstChildElement(Element element,
                                           String name)
Returns the first child element of element whose element type name (AKA "tag name") matches the provided String, null if no match.
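
The difference between the child and descendant lookups above is easiest to see side by side. The following sketch uses only standard DOM calls; the names (FirstElementSketch, firstChild, firstDescendant, parse) are hypothetical, and resolving "first descendant" as first in document order is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class FirstElementSketch {

    /** Looks only at direct children of parent. */
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Looks at the whole subtree; getElementsByTagName returns matches in document order. */
    static Element firstDescendant(Element ancestor, String tagName) {
        NodeList found = ancestor.getElementsByTagName(tagName);
        return found.getLength() > 0 ? (Element) found.item(0) : null;
    }

    /** Demo helper: parses a String with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For a tree in which a c element is nested inside b and another c follows b as a direct child of the root, firstDescendant finds the nested one while firstChild finds the direct child.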


getFirstChildElementNS

public static Element getFirstChildElementNS(Element element,
                                             String namespaceURI,
                                             String localName)
Returns the first child element of element whose XML Namespace URI and local name (AKA "element type name" or "tag name") matches the parameters, null if no match.
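
A namespace-aware variant of the child lookup might be sketched as below. It assumes the document was parsed with namespace awareness switched on, since otherwise getLocalName() returns null; FirstChildNSSketch and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class FirstChildNSSketch {

    /** Matches direct children on namespace URI and local name, ignoring the prefix. */
    static Element firstChildNS(Element parent, String namespaceURI, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && localName.equals(n.getLocalName())
                    && Objects.equals(namespaceURI, n.getNamespaceURI())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Demo helper: namespace-aware parse with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```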


getElementText

public static String getElementText(Element element,
                                    boolean normalizeWS)
Returns the text content of all Text node children of Element element, ignoring any descendant elements (grandchildren, etc.). This involves using the DOM normalize() method to consolidate sibling Text nodes prior to processing. Any internal markup will be ignored, and if normalizeWS is true, "proper" whitespace between Text nodes is maintained.

Note that currently this returns an empty String even for Elements that have no children. This behaviour should not be relied upon, and may be changed in the future (i.e., empty Elements may return null).
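
One way to read the description above is sketched here: consolidate adjacent Text nodes, then concatenate only the direct Text children. The exact meaning of "proper" whitespace is not specified, so inserting a single space between Text nodes and collapsing whitespace runs is an assumption, as are the names ElementTextSketch, elementText and parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class ElementTextSketch {

    /** Concatenates the Text children of element, skipping child elements entirely. */
    static String elementText(Element element, boolean normalizeWS) {
        element.normalize(); // merge adjacent Text nodes first
        StringBuilder sb = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.TEXT_NODE) {
                continue; // internal markup is ignored
            }
            if (normalizeWS && sb.length() > 0) {
                sb.append(' '); // assumed: keep words on either side of markup apart
            }
            sb.append(n.getNodeValue());
        }
        String text = sb.toString();
        return normalizeWS ? text.trim().replaceAll("\\s+", " ") : text;
    }

    /** Demo helper: parses a String with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Like the documented method, this sketch returns an empty String for an element with no children.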


harvestText

public static String harvestText(Node node,
                                 boolean useTreeWalker,
                                 boolean goDeep,
                                 boolean normalizeWS)
Provided with a DOM Node node, iterates over its content, returning a concatenation of all Text nodes, with normalized whitespace as necessary to keep words from erroneously merging if normalizeWS is true. This could be used to provide a text view of an XHTML document, for example.

If the boolean useTreeWalker is true, uses an org.w3c.dom.traversal.TreeWalker, otherwise an org.w3c.dom.traversal.NodeIterator. If the boolean goDeep is true, traverses nodes "forward" in a depth-first traversal; if false, visits only the children of the provided node. Since knowledge of placement in the tree is available only with the TreeWalker, goDeep is only relevant when using the TreeWalker; NodeIterators always go deep.
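
The interaction of useTreeWalker and goDeep can be sketched with the DOM Level 2 traversal API. This is an illustration, not the Ceryle source: the names are hypothetical, the cast to DocumentTraversal assumes a DOM implementation that supports traversal (the JDK's default parser does), and joining Text nodes with a single space is a simplification of the documented whitespace handling.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class HarvestTextSketch {

    static String harvest(Node node, boolean useTreeWalker, boolean goDeep, boolean normalizeWS) {
        Document owner = (node instanceof Document) ? (Document) node : node.getOwnerDocument();
        DocumentTraversal traversal = (DocumentTraversal) owner; // assumes traversal support
        List<String> pieces = new ArrayList<>();
        if (useTreeWalker) {
            // SHOW_ALL keeps every node visible, so firstChild/nextSibling see real children.
            TreeWalker walker = traversal.createTreeWalker(node, NodeFilter.SHOW_ALL, null, true);
            Node n = goDeep ? walker.nextNode() : walker.firstChild();
            while (n != null) {
                collect(pieces, n);
                n = goDeep ? walker.nextNode() : walker.nextSibling();
            }
        } else {
            // A NodeIterator presents a flat view of the subtree, so it always goes deep.
            NodeIterator it = traversal.createNodeIterator(node, NodeFilter.SHOW_TEXT, null, true);
            for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
                collect(pieces, n);
            }
            it.detach();
        }
        if (!normalizeWS) {
            return String.join("", pieces);
        }
        return String.join(" ", pieces).trim().replaceAll("\\s+", " ");
    }

    private static void collect(List<String> pieces, Node n) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            pieces.add(n.getNodeValue());
        }
    }

    /** Demo helper: parses a String with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```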


scanForTitle

public static String scanForTitle(Element doc,
                                  String content)
Provided with a DOM Element doc (expected to come from an XHTML document), iterates over its content, returning the first matching "DC.Title" content. If there is no match to the form template, the contents of the XHTML <title> are returned. If the String content is non-null, it is assumed to be some bastardized form of HTML, which will hopefully contain a text string beginning with "<title". This is used to make a last-gasp attempt at a title. If doc is null, the content is passed to scanTextForTitle(String,String), which then attempts to obtain the title.


scanTextForTitle

public static String scanTextForTitle(String s,
                                      String name)
A utility method that attempts to obtain the element content from the first instance in an HTML document provided as a String s of a given element type name. The resulting value will be trimmed of leading and trailing whitespace. If the value contains markup, it will be stripped. This returns null if it fails to obtain any content.


stripMarkup

public static String stripMarkup(String s)
Strips markup from the provided String s.
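
The two string-based helpers above can be approximated with regular expressions, as in this sketch. It is a guess at the general approach, not the Ceryle source: the names are hypothetical, case-insensitive matching of the element name is an assumption, and a regex tag stripper like this one does not handle comments, CDATA sections or ">" inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class ScanTextSketch {

    /** Returns the trimmed, markup-free content of the first <name> element, or null. */
    static String scanText(String s, String name) {
        if (s == null || name == null) {
            return null;
        }
        String quoted = Pattern.quote(name);
        Pattern element = Pattern.compile(
                "<" + quoted + "\\b[^>]*>(.*?)</" + quoted + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = element.matcher(s);
        if (!m.find()) {
            return null;
        }
        String content = stripMarkup(m.group(1)).trim();
        return content.isEmpty() ? null : content;
    }

    /** Naive tag stripper: removes anything that looks like <...>. */
    static String stripMarkup(String s) {
        return s == null ? "" : s.replaceAll("<[^>]*>", "");
    }
}
```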


getChildElementByTagName

public static Element getChildElementByTagName(Element element,
                                               String name,
                                               boolean exclusive)
                                        throws ProcessException
Returns the child Element of the Element provided whose element type ("tag") name matches the String name, throwing a ProcessException if there is more than one such child element when the boolean exclusive is true, returning the first instance otherwise. Returns null if there is no child element.

Throws:
ProcessException
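
The effect of the exclusive flag might be sketched as follows. IllegalStateException stands in for the project's ProcessException, and the names ChildByTagNameSketch, childByTagName and parse are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class ChildByTagNameSketch {

    static Element childByTagName(Element element, String name, boolean exclusive) {
        Element first = null;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !name.equals(n.getNodeName())) {
                continue;
            }
            if (first != null) {
                // Only reachable when exclusive is true: a second match is an error.
                throw new IllegalStateException("more than one <" + name + "> child");
            }
            first = (Element) n;
            if (!exclusive) {
                return first; // non-exclusive: the first match is enough
            }
        }
        return first; // null when there is no matching child
    }

    /** Demo helper: parses a String with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```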

setPCDATAContent

public static void setPCDATAContent(Element element,
                                    String content)
Sets the PCDATA content of the Element provided. This method will remove any current content of the Element and replace it with a single Text node containing the provided String. If the provided String is null, an empty String is used.


getPCDATAContent

public static String getPCDATAContent(Element element)
                               throws ProcessException
Returns the PCDATA content of the Element provided. This method assumes that the Element does not contain mixed content, only PCDATA, and will throw a ProcessException if Elements or other non-Text nodes are found.

Throws:
ProcessException
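
The PCDATA setter and getter described above could be sketched with plain DOM calls like this. IllegalStateException stands in for ProcessException, and PcdataSketch, setPcdata, getPcdata and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not the actual XMLUtils implementation.
class PcdataSketch {

    /** Replaces all current content with a single Text node. */
    static void setPcdata(Element element, String content) {
        while (element.hasChildNodes()) {
            element.removeChild(element.getFirstChild());
        }
        String text = (content == null) ? "" : content; // null becomes the empty String
        element.appendChild(element.getOwnerDocument().createTextNode(text));
    }

    /** Returns the text content, rejecting anything other than Text children. */
    static String getPcdata(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.TEXT_NODE) {
                throw new IllegalStateException("non-Text node found: " + n.getNodeName());
            }
            sb.append(n.getNodeValue());
        }
        return sb.toString();
    }

    /** Demo helper: parses a String with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```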


The Ceryle Project. Copyright ©2001-2007 Murray Altheim, All Rights Reserved. See LICENSE included with distribution.