org.ceryle.xml
Class XMLCharTok

java.lang.Object
  extended by org.ceryle.xml.XMLCharTok

public class XMLCharTok
extends Object

The XMLCharTok class provides static test methods for some of the fundamental character and token productions of XML, as described in the XML 1.0 (Second Edition) W3C Recommendation. See Appendix B of that Recommendation for detailed notes on character classes and other Unicode-related issues.

NOTE: This class has been amended to reflect the changes to name character and whitespace handling found in the XML 1.1 Working Draft (as of 25 April 2002). Methods affected include isNameStartChar(char), isNameChar(char), and isSpace(char), as well as any methods that depend on them. New methods include isLineBreakChar(char), isLineBreakSequence(char,char), isNameStartChar10(char), and isNameChar10(char). Not all changes are marked: unless "10" appears in the method name, assume that either (a) the method is not affected or (b) it incorporates the XML 1.1 changes.

Changed methods are marked CHANGED.
New methods are marked NEW.

[XML 1.0]:
Extensible Markup Language (XML) 1.0 (Second Edition)
W3C Recommendation 6 October 2000

This version: http://www.w3.org/TR/2000/REC-xml-20001006
Latest version: http://www.w3.org/TR/REC-xml
[XML 1.1]:
XML 1.1
W3C Working Draft 25 April 2002

This version: http://www.w3.org/TR/2002/WD-xml11-20020425
Latest version: http://www.w3.org/TR/xml11

Since:
JDK1.3
Version:
$Id: XMLCharTok.java,v 3.5 2007-06-15 12:10:29 altheim Exp $
Author:
Murray Altheim

Constructor Summary
XMLCharTok()
           
 
Method Summary
static boolean containsMarkup(String s)
          Returns true if the String s contains any XML markup characters, including <, >, &, or the String "]]>".
static boolean isBaseChar(char c)
          Returns true if char c is a member of BaseChar [XML 1.0 production 85].
static boolean isChar(char c)
          Returns true if char c is a member of Char [XML 1.0 production 2].
static boolean isCombiningChar(char c)
          Returns true if char c is a member of CombiningChar [XML 1.0 production 87].
static boolean isDigit(char c)
          Returns true if char c is a member of Digit [XML 1.0 production 88].
static boolean isExtender(char c)
          Returns true if char c is a member of Extender [XML 1.0 production 89].
static boolean isHexChar(char c)
          Returns true if the character is a hexadecimal character [0-9A-F].
static boolean isIdeographic(char c)
          Returns true if char c is a member of Ideographic [XML 1.0 production 86].
static boolean isLetter(char c)
          Returns true if char c is a member of Letter [XML 1.0 production 84].
static boolean isLineBreakChar(char c)
          Returns true if char c is a recognized line break character, as described in the XML 1.1 Working Draft.
static boolean isLineBreakSequence(char thisChar, char nextChar)
          Returns true if thisChar and the character that follows it, nextChar, form one of the patterns treated as a line break sequence, as described in the XML 1.1 Working Draft.
static boolean isName(String s)
          Returns true if String s conforms to Name [XML 1.1 production 5].
static boolean isName10(String s)
          Returns true if String s conforms to Name [XML 1.0 production 5].
static boolean isNameChar(char c)
          Returns true if char c is a member of NameChar [XML 1.1 production 4a].
static boolean isNameChar10(char c)
          Returns true if char c is a member of NameChar [XML 1.0 production 4].
static boolean isNames(String s)
          Returns true if String s conforms to Names [XML 1.0 production 6], a whitespace-delimited list of XML Names.
static boolean isNameStartChar(char c)
          Returns true if char c is an allowed first character of an XML 1.1 Name [XML 1.1 production 4].
static boolean isNameStartChar10(char c)
          Returns true if char c is an allowed first character of an XML 1.0 Name [XML 1.0 production 5].
static boolean isNmtoken(String s)
          Returns true if String s conforms to Nmtoken [XML 1.0 production 7].
static boolean isNmtokens(String s)
          Returns true if String s conforms to Nmtokens [XML 1.0 production 8], a whitespace-delimited list of XML Nmtokens (Name tokens).
static boolean isPubidChar(char c)
          Returns true if char c is a PubidChar [XML 1.0 production 13].
static boolean isPubidLiteral(String s)
          Returns true if the String s conforms to PubidLiteral [XML 1.0 production 12], including the single- or double-quote delimiters.
static boolean isSpace(char c)
          Returns true if char c is a member of S (space) [XML 1.1 production 3].
static boolean isSpace(String s)
          Returns true if the String s is completely composed of whitespace.
static boolean isSpace10(char c)
          Returns true if char c is a member of S (space) [XML 1.0 production 3].
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLCharTok

public XMLCharTok()
Method Detail

containsMarkup

public static boolean containsMarkup(String s)
Returns true if the String s contains any XML markup characters, including <, >, &, or the String "]]>". Since single and double quotes are safe within PCDATA, they are ignored.
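The check described above can be sketched as follows. This is a minimal illustration, not the Ceryle source: the class name MarkupCheckSketch and the treatment of null as "no markup" are assumptions made for the example.

```java
// Hypothetical stand-in for containsMarkup(String); not the Ceryle implementation.
final class MarkupCheckSketch {
    private MarkupCheckSketch() {}

    static boolean containsMarkup(String s) {
        if (s == null) {
            return false; // assumption: a null String is treated as containing no markup
        }
        // Any String containing "]]>" also contains '>', so testing the three
        // single characters covers the "]]>" case as well.
        return s.indexOf('<') >= 0 || s.indexOf('>') >= 0 || s.indexOf('&') >= 0;
    }
}
```

Under these assumptions, a String such as it's "fine" is reported as free of markup, because single and double quotes are not tested.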


isChar

public static boolean isChar(char c)
Returns true if char c is a member of Char [XML 1.0 production 2].
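For reference, XML 1.0 production 2 defines Char as #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. A minimal sketch of that test for a single Java char might look like the code below. The class name CharSketch is hypothetical, and the code is an illustration rather than the Ceryle source.

```java
// Hypothetical sketch of XML 1.0 production 2 (Char) for a single Java char.
final class CharSketch {
    private CharSketch() {}

    static boolean isChar(char c) {
        // A Java char is a 16-bit code unit, so the supplementary range
        // [#x10000-#x10FFFF] cannot be represented here and is not tested.
        return c == 0x09 || c == 0x0A || c == 0x0D
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }
}
```

Note that surrogate code units (0xD800 to 0xDFFF) fall outside both ranges, so a caller that needs to accept supplementary characters would have to work with int code points instead.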


isSpace

public static boolean isSpace(char c)
Returns true if char c is a member of S (space) [XML 1.1 production 3].
CHANGED.
See note on XML 1.1 changes.


isSpace10

public static boolean isSpace10(char c)
Returns true if char c is a member of S (space) [XML 1.0 production 3].
NEW.
See note on XML 1.1 changes.


isLineBreakChar

public static boolean isLineBreakChar(char c)
Returns true if char c is a recognized line break character, as described in the XML 1.1 Working Draft.
NEW.
See note on XML 1.1 changes.


isLineBreakSequence

public static boolean isLineBreakSequence(char thisChar,
                                          char nextChar)
Returns true if thisChar and the character that follows it, nextChar, form one of the patterns treated as a line break sequence, as described in the XML 1.1 Working Draft. nextChar may also be -1 (as a Java char, (char) -1, i.e. 0xFFFF), since a line break character is often the last character in a file.

NOTE: Line break handling is substantially the same as in XML 1.0, except for changes that come from Unicode and from the use of XML on IBM mainframes. Any sequence of characters for which this method returns true is to be normalized to a single 0x0A (linefeed) character. Because the method also returns true when nextChar is part of the line break sequence, the caller must test nextChar and remove it when it is 0x0A or 0x85.
NEW.
See note on XML 1.1 changes.
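The normalization described in the note can be sketched as a small loop. This is an illustration under stated assumptions, not the Ceryle source: the class LineBreakSketch and its methods are hypothetical, and the set of line break characters (0x0A, 0x0D, 0x85, 0x2028) follows the end-of-line rules of the final XML 1.1 Recommendation, which may differ in detail from the 25 April 2002 Working Draft.

```java
// Hypothetical sketch of XML 1.1 style line break normalization.
final class LineBreakSketch {
    private LineBreakSketch() {}

    // Assumption: these four characters are the recognized line break characters.
    static boolean isLineBreakChar(char c) {
        return c == 0x0A || c == 0x0D || c == 0x85 || c == 0x2028;
    }

    // Replaces every line break sequence with a single 0x0A (linefeed).
    static String normalize(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (!isLineBreakChar(c)) {
                out.append(c);
                continue;
            }
            out.append((char) 0x0A);
            // A carriage return followed by 0x0A or 0x85 is a two-character
            // sequence: the second character is consumed, not emitted.
            if (c == 0x0D && i + 1 < in.length()) {
                char next = in.charAt(i + 1);
                if (next == 0x0A || next == 0x85) {
                    i++;
                }
            }
        }
        return out.toString();
    }
}
```

The bounds check on i + 1 plays the role that the -1 value for nextChar plays in the method above: it covers a line break character that is the last character of the input.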


isNameChar

public static boolean isNameChar(char c)
Returns true if char c is a member of NameChar [XML 1.1 production 4a].
CHANGED.
See note on XML 1.1 changes.


isNameChar10

public static boolean isNameChar10(char c)
Returns true if char c is a member of NameChar [XML 1.0 production 4].
NEW.
See note on XML 1.1 changes.


isNameStartChar10

public static boolean isNameStartChar10(char c)
Returns true if char c is an allowed first character of an XML 1.0 Name [XML 1.0 production 5]. This method is maintained for backward compatibility with XML 1.0.
NEW.
See note on XML 1.1 changes.


isNameStartChar

public static boolean isNameStartChar(char c)
Returns true if char c is an allowed first character of an XML 1.1 Name [XML 1.1 production 4].
CHANGED.
See note on XML 1.1 changes.


isName

public static boolean isName(String s)
Returns true if String s conforms to Name [XML 1.1 production 5].
See note on XML 1.1 changes.


isName10

public static boolean isName10(String s)
Returns true if String s conforms to Name [XML 1.0 production 5].
See note on XML 1.1 changes.


isNames

public static boolean isNames(String s)
Returns true if String s conforms to Names [XML 1.0 production 6], a whitespace-delimited list of XML Names.
See note on XML 1.1 changes.
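In XML 1.0 (Second Edition), Names is defined as Name (S Name)* and Nmtokens as Nmtoken (S Nmtoken)*, so both list checks share one shape. The sketch below shows that shape with a pluggable token test. It is an illustration only: the class TokenListSketch is hypothetical, and the isAsciiName and isAsciiNmtoken helpers are deliberately simplified to ASCII and do not cover the full Letter, Digit, CombiningChar, and Extender classes that the real productions require.

```java
import java.util.function.Predicate;

// Hypothetical sketch of a whitespace-delimited token list check.
final class TokenListSketch {
    private TokenListSketch() {}

    // S in XML 1.0 [production 3]: space, tab, carriage return, linefeed.
    static boolean isSpace10(char c) {
        return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
    }

    // Simplification: ASCII subset of NameChar only.
    static boolean isAsciiNameChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '.' || c == '-' || c == '_' || c == ':';
    }

    static boolean isAsciiNmtoken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isAsciiNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // A Name is an Nmtoken whose first character is a letter, '_' or ':'.
    static boolean isAsciiName(String s) {
        if (!isAsciiNmtoken(s)) {
            return false;
        }
        char first = s.charAt(0);
        return !((first >= '0' && first <= '9') || first == '.' || first == '-');
    }

    // Token (S Token)*: no leading or trailing whitespace, one or more
    // whitespace characters between tokens.
    static boolean isList(String s, Predicate<String> isToken) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        int n = s.length();
        while (true) {
            int start = i;
            while (i < n && !isSpace10(s.charAt(i))) {
                i++;
            }
            if (i == start || !isToken.test(s.substring(start, i))) {
                return false;
            }
            if (i == n) {
                return true;
            }
            while (i < n && isSpace10(s.charAt(i))) {
                i++;
            }
            if (i == n) {
                return false; // trailing whitespace is not part of the production
            }
        }
    }
}
```

With these helpers, isList(s, TokenListSketch::isAsciiName) approximates a Names check and isList(s, TokenListSketch::isAsciiNmtoken) approximates an Nmtokens check. A token such as 1abc passes the second but not the first.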


isNmtoken

public static boolean isNmtoken(String s)
Returns true if String s conforms to Nmtoken [XML 1.0 production 7].
See note on XML 1.1 changes.


isNmtokens

public static boolean isNmtokens(String s)
Returns true if String s conforms to Nmtokens [XML 1.0 production 8], a whitespace-delimited list of XML Nmtokens (Name tokens).
See note on XML 1.1 changes.


isSpace

public static boolean isSpace(String s)
Returns true if the String s is completely composed of whitespace.
See note on XML 1.1 changes.


isLetter

public static boolean isLetter(char c)
Returns true if char c is a member of Letter [XML 1.0 production 84].


isBaseChar

public static boolean isBaseChar(char c)
Returns true if char c is a member of BaseChar [XML 1.0 production 85].


isIdeographic

public static boolean isIdeographic(char c)
Returns true if char c is a member of Ideographic [XML 1.0 production 86].


isCombiningChar

public static boolean isCombiningChar(char c)
Returns true if char c is a member of CombiningChar [XML 1.0 production 87].


isDigit

public static boolean isDigit(char c)
Returns true if char c is a member of Digit [XML 1.0 production 88].


isExtender

public static boolean isExtender(char c)
Returns true if char c is a member of Extender [XML 1.0 production 89].


isPubidChar

public static boolean isPubidChar(char c)
Returns true if char c is a PubidChar [XML 1.0 production 13]. The colon is optional as per section 2.3 of the XML specification.


isPubidLiteral

public static boolean isPubidLiteral(String s)
Returns true if the String s conforms to PubidLiteral [XML 1.0 production 12], including the single- or double-quote delimiters.
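XML 1.0 defines PubidChar as #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%], and PubidLiteral as a double-quoted run of PubidChar or a single-quoted run of PubidChar that excludes the single quote. A minimal sketch of both tests follows. The class name PubidSketch is hypothetical, rejecting null is an assumption, and the code is not the Ceryle source.

```java
// Hypothetical sketch of XML 1.0 productions 12 (PubidLiteral) and 13 (PubidChar).
final class PubidSketch {
    private PubidSketch() {}

    static boolean isPubidChar(char c) {
        if (c == 0x20 || c == 0x0D || c == 0x0A) {
            return true;
        }
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
            return true;
        }
        return "-'()+,./:=?;!*#@$_%".indexOf(c) >= 0;
    }

    static boolean isPubidLiteral(String s) {
        // Assumption: null and Strings too short to hold both delimiters are rejected.
        if (s == null || s.length() < 2) {
            return false;
        }
        char quote = s.charAt(0);
        if ((quote != '"' && quote != '\'') || s.charAt(s.length() - 1) != quote) {
            return false;
        }
        for (int i = 1; i < s.length() - 1; i++) {
            char c = s.charAt(i);
            // Inside single quotes the single quote itself is excluded; the double
            // quote is never a PubidChar, so it is rejected in either form.
            if (!isPubidChar(c) || (quote == '\'' && c == '\'')) {
                return false;
            }
        }
        return true;
    }
}
```

A quoted public identifier such as "-//W3C//DTD XHTML 1.0 Strict//EN" passes this check with its delimiters included, while the same text without quotes does not.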


isHexChar

public static boolean isHexChar(char c)
Returns true if the character is a hexadecimal character [0-9A-F].
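Read literally, the documented range [0-9A-F] covers digits and uppercase letters only. The sketch below follows that reading. Whether the actual method also accepts lowercase a-f is not stated here, so rejecting lowercase is an assumption of this example, and the class name HexSketch is hypothetical.

```java
// Hypothetical sketch of a hexadecimal character test for the range [0-9A-F].
final class HexSketch {
    private HexSketch() {}

    static boolean isHexChar(char c) {
        // Assumption: lowercase a-f is rejected, following the documented range.
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
    }
}
```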



The Ceryle Project. Copyright ©2001-2007 Murray Altheim, All Rights Reserved. See LICENSE included with distribution.