org.ceryle.convert
Class LtmParser

java.lang.Object
  extended by org.ceryle.convert.LtmParser
All Implemented Interfaces:
Constants

public class LtmParser
extends Object
implements Constants

Implements a parser for the Linear Topic Map Notation (LTM) as described in the Ontopia paper of the same name by Lars Marius Garshol. This implementation uses Ceryle's XtmProcessor (not TM4J) and is not entirely conformant with Ontopia's LTM; deviations are marked by the phrase "LTM Extension". In particular, it adds several new directives, several of which have since been incorporated into the LTM specification, and it uses a different syntax for occurrence resource data.

Note: This class is not deprecated, but is not intended for general consumption, being the older supported method of creating PSI topic maps from LTM files. It is superseded by LTMProcessor.

Since:
JDK1.3
Version:
$Id: LtmParser.java,v 3.6 2007-06-15 12:09:22 altheim Exp $
Author:
Murray Altheim

Field Summary
 boolean strictParsing
          Variable controlling whether strict XML parsing is active.
 
Fields inherited from interface org.ceryle.util.Constants
Amp, AMP, Apos, APOS, APP_NAME, APP_VERSION_NAME, APP_VERSION_NUMBER, AT, BANG, BAR, BIT_dev, BIT_http, BIT_ignoreCom, BIT_merge, BIT_none, BIT_noPreload, BIT_simple, BIT_ui, BIT_unused1, BIT_unused2, BIT_unused3, BIT_validating, BIT_verbose, BIT_xcatalog, BIT_xlink, BIT_xmlnsAware, BIT_xsd, BROKB, BSlash, BSLASH, CERYLE_CREDITS_FILE, CERYLE_DATA_DIR, CERYLE_HIST_FILE, CERYLE_LICENSE_FILE, CERYLE_PREF_FILE, CERYLE_PROP, CERYLE_PROP_BASE, CERYLE_PROP_FILE, CERYLE_RSRC_DIR, CERYLE_THANKS_FILE, CFLEX, Colon, COLON, Comma, COMMA, CR, CRet, Dash, DASH, DEFAULT_DIRECTORY, DEVELOPER, DOLR, Dot, DOT, EIGHT, Ellip, EOF, EqQuot, EQUAL, Equals, False, FALSE, FileProt, Filesep, FileURL, Five, FIVE, FONTSIZES, Four, FOUR, FSchar, GCOL_PROP, GRAPH_PROP, GRAVE, GT, Hash, HASH, HOME_DIRECTORY, HOME_DIRECTORY_PATH, HOME_DIRECTORY_URL, HTML, HttpProt, HttpURL, INDENT, INDENT_0, INDENT_1, INDENT_10, INDENT_2, INDENT_3, INDENT_4, INDENT_5, INDENT_6, INDENT_7, INDENT_8, INDENT_9, INIT, LCURL, LCurly, LF, LFS, Localhost, LPAR, LParen, LS, LSBrkt, LSBrkt2, LSQB, LT, MT, NBSP, NEL, NINE, NL, NL20x, NLchar, No, Null, NULL, NULL_STATE, NumParams, One, ONE, Pathsep, Percent, PERO, PLUS, Prcnt, PS, QMark, QMARK, Quot, QUOT, RCURL, RCurly, RESOURCE_BUNDLE, RPAR, RParen, RSBrkt, RSBrkt2, RSQB, Semi, SEMI, SEVEN, SIX, Slash, SLASH, SP, SPACE, Stago, Star, STAR, Tab, TAB, Tee, Three, THREE, Tilde, TILDE, TM_PROP, True, TRUE, Two, TWO, Under, UNDER, URI, URL, VBar, WIKI_PROP, XNodePfx, XNodeURL, XtmExt, Yes, Zero, ZERO
 
Constructor Summary
LtmParser()
          Default constructor.
LtmParser(XtmProcessor xtmproc)
          Constructor with a preexisting XtmProcessor xtmproc.
 
Method Summary
protected  boolean forward(Reader r)
          Read one character into thisChar, then read ahead nextChar.
 Reader getReader(String pathname)
          Instantiates a BufferedReader to process the pathname as a URL or file reference, returning the Reader.
 XtmTopicMap getTopicMap()
          Returns the XtmTopicMap generated as a result of parsing, or null if not available.
 XtmDocument getXtmDocument()
          Return the current XtmDocument object, null if undefined.
 void mergeMap(TypedInputSource source, String syntax)
          Merges the LTM file indicated by the TypedInputSource source, populating the provided Document.
 void parse(TypedInputSource source)
          Parses the source indicated by the TypedInputSource source, populating the provided Document.
 void require(Reader r, String string, String errMsg)
          Parse a required string.
 XtmAssociation scanAssociation(Reader r)
          Scans until an association end delimiter (")"), returning the XtmAssociation object.
 XtmComment scanComment(Reader r)
          Scans a C++ style comment, returns upon finding a comment end delimiter.
protected  void scanDescription(Reader r, XtmTopic topic)
          Scans forward for published Subject Indicator (PSI).
protected  void scanDirective(Reader r)
          Scans an LTM directive.
protected  String scanLiteral(Reader r, boolean normWS)
          Scans forward until the closing literal delimiter, returning a String whose LT, GT, QUOT, APOS and AMP characters have been converted to entities.
protected  String scanName(Reader r)
          Scans a Name [XML production 5].
 XtmOccurrence scanOccurrence(Reader r)
          Scans until an occurrence end delimiter ("}"), returning the XtmOccurrence object.
 XtmResourceData scanResourceData(Reader r)
          Scans the '[[' and ']]' delimited resource data of an inline data Occurrence, returning an XtmResourceData object.
protected  String scanSpace(Reader r, boolean required)
          Scans space characters, reading forward until a non-space character is encountered.
protected  void scanSubjectIdentifier(Reader r, XtmTopic topic)
          Scans forward for Published Subject Indicator (PSI).
 void scanSuperclass(Reader r, XtmTopic subclass)
          Creates a superclass-subclass association between the provided topic and the scanned token representing a reference to the superclass Topic.
protected  String scanToEndOfLine(Reader r)
          Scans forward until an end-of-line character, returning the string.
 XtmTopic scanTopic(Reader r)
          Scans until a topic end delimiter ("]"), returning the XtmTopic object.
protected  void scanTopicName(Reader r, XtmTopic topic)
          Scans forward for base name, sort name, and/or display name.
protected  XtmReference scanType(Reader r)
          Scans forward for topic, occurrence or association type (roleSpec), returning an XtmReference object that serves as the object's type, i.e., the reference in its <instanceOf> or <roleSpec> element.
protected  String scanURI(Reader r)
          Scans a URI, rather loosely.
protected  String scanWhitespace(Reader r, boolean required)
          Scans white space [XML production 3], reading forward until a non-whitespace character is encountered.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

strictParsing

public boolean strictParsing
Variable controlling whether strict XML parsing is active. This mostly affects constraints on the content of XML Names.

Constructor Detail

LtmParser

public LtmParser()
Default constructor.


LtmParser

public LtmParser(XtmProcessor xtmproc)
Constructor with a preexisting XtmProcessor xtmproc.

Method Detail

mergeMap

public void mergeMap(TypedInputSource source,
                     String syntax)
              throws ProcessException
Merges the LTM file indicated by the TypedInputSource source, populating the provided Document. This spawns a new parse() process. This processor does not support any syntaxes except LTM, and will throw an exception if the value of syntax is not "ltm".

LTM Extension (now part of LTM 1.2).

Parameters:
source - the TypedInputSource source to be parsed.
syntax - the syntax of the incoming document (only "ltm" is supported).
Throws:
ProcessException
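
The syntax restriction described for mergeMap can be illustrated with a minimal sketch. The class name MergeSyntaxGuard and its method are hypothetical, IllegalArgumentException stands in for ProcessException, and an exact, case-sensitive match on "ltm" is an assumption; this page does not say how the real comparison is made.

```java
final class MergeSyntaxGuard {
    // The only syntax value accepted, per the mergeMap description.
    static final String LTM = "ltm";

    // Throws if the syntax is anything other than "ltm".
    // Assumption: the comparison is exact and case-sensitive.
    static void checkSyntax(String syntax) {
        if (!LTM.equals(syntax)) {
            throw new IllegalArgumentException("unsupported merge syntax: " + syntax);
        }
    }
}
```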

parse

public void parse(TypedInputSource source)
           throws ProcessException
Parses the source indicated by the TypedInputSource source, populating the provided Document.

Parameters:
source - the TypedInputSource source to be parsed
Throws:
ProcessException - if any errors are encountered during parsing

getXtmDocument

public XtmDocument getXtmDocument()
Return the current XtmDocument object, null if undefined.

Returns:
the current XtmDocument object.

getTopicMap

public XtmTopicMap getTopicMap()
Returns the XtmTopicMap generated as a result of parsing, or null if not available.

Returns:
the current topic map.

scanTopic

public XtmTopic scanTopic(Reader r)
                   throws XtmException
Scans until a topic end delimiter ("]"), returning the XtmTopic object. If the scan results in a String containing significant content, an XtmTopic object will be sent to the XTMFactory for further processing. Note that at this level, no checking for existing XtmTopic objects is done.

LTM Extension: a semicolon following the topic ID (rather than a colon) indicates the superclass of this topic.

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException

scanSuperclass

public void scanSuperclass(Reader r,
                           XtmTopic subclass)
                    throws XtmException
Creates a superclass-subclass association between the provided topic and the scanned token representing a reference to the superclass Topic.

Throws:
XtmException

scanTopicName

protected void scanTopicName(Reader r,
                             XtmTopic topic)
                      throws XtmException
Scans forward for base name, sort name, and/or display name. Adds any associated markup to the provided XtmTopic object. Note that display and sort name variants will always be attached to the last base name encountered.
   [id : type = / scope "basename" / scope "basename" 
                // scope "variantName" ; "sortname" ; "dispname"]
 

Parameters:
r - a Reader providing access to the document content.
topic - the topic to decorate with name information.
Throws:
XtmException

scanOccurrence

public XtmOccurrence scanOccurrence(Reader r)
                             throws XtmException
Scans until an occurrence end delimiter ("}"), returning the XtmOccurrence object. If the scan results in a String containing significant content, an XtmOccurrence object will be sent to the XTMFactory for further processing.

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException

scanResourceData

public XtmResourceData scanResourceData(Reader r)
                                 throws XtmException
Scans the '[[' and ']]' delimited resource data of an inline data Occurrence, returning an XtmResourceData object.

Throws:
XtmException
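
As an illustration of scanning '[[' and ']]' delimited data, here is a minimal standalone sketch. It is not the Ceryle implementation: the class ResourceDataScanSketch is hypothetical, it returns a plain String rather than an XtmResourceData, and it reports errors with IOException instead of XtmException.

```java
import java.io.IOException;
import java.io.Reader;

final class ResourceDataScanSketch {
    // Expects the reader to be positioned at "[[" and returns the text
    // up to (but not including) the first "]]" that follows.
    static String scanResourceData(Reader r) throws IOException {
        if (r.read() != '[' || r.read() != '[') {
            throw new IOException("expected '[[' at start of resource data");
        }
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
            int n = sb.length();
            // Stop as soon as the buffer ends with the closing delimiter.
            if (n >= 2 && sb.charAt(n - 1) == ']' && sb.charAt(n - 2) == ']') {
                return sb.substring(0, n - 2);
            }
        }
        throw new IOException("unterminated resource data: missing ']]'");
    }
}
```

Single brackets inside the data pass through unchanged; only two consecutive ']' characters end the scan.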

scanDirective

protected void scanDirective(Reader r)
                      throws XtmException
Scans an LTM directive. These are custom keywords designed to provide extended functionality, consisting of name/value pairs. Currently accepted directives include:
#TMID [whitespace]* "id"
Sets the ID value of the <topicMap> element.
#BASEURI [whitespace]* "URI"
Sets the xml:base attribute value on the <topicMap> element.
#MERGEMAP [whitespace]* "URI" [whitespace] STRING
Merges the LTM document located at URI, of syntax STRING, with the current document.

LTM Extension.

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException
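
The three directive forms listed above can be recognized with a single regular expression. The sketch below is one way to do it, not the parser's own code: DirectiveSketch and parseDirective are hypothetical names, the directive is assumed to sit on one line, and the STRING argument of #MERGEMAP is assumed to be a double-quoted literal such as "ltm".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DirectiveSketch {
    // '#' + keyword, optional whitespace, a quoted value, and an optional
    // second quoted value (assumed to be the #MERGEMAP syntax argument).
    private static final Pattern DIRECTIVE = Pattern.compile(
            "#(TMID|BASEURI|MERGEMAP)\\s*\"([^\"]*)\"(?:\\s+\"([^\"]*)\")?");

    // Returns {name, value} for #TMID and #BASEURI,
    // or {name, uri, syntax} for #MERGEMAP.
    static String[] parseDirective(String line) {
        Matcher m = DIRECTIVE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized directive: " + line);
        }
        String name = m.group(1);
        boolean isMerge = name.equals("MERGEMAP");
        boolean hasSecond = m.group(3) != null;
        // Only #MERGEMAP takes a second argument, and it must have one.
        if (isMerge != hasSecond) {
            throw new IllegalArgumentException("wrong number of arguments for #" + name);
        }
        return isMerge
                ? new String[] {name, m.group(2), m.group(3)}
                : new String[] {name, m.group(2)};
    }
}
```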

scanType

protected XtmReference scanType(Reader r)
                         throws XtmException
Scans forward for topic, occurrence or association type (roleSpec), returning an XtmReference object that serves as the object's type, i.e., the reference in its <instanceOf> or <roleSpec> element.

Note 1: This uses a rather stupid algorithm for determining the interpretation of the scanned URI reference. If the String is an XML Name, doesn't contain a period ('.') character, and doesn't begin with either 'http://' or 'file:/', it is interpreted as a reference to an ID, and prepended with a hash ('#') character in the created reference.

Note 2: If the type is recognized as the PSI for instance, display, description, or sort, a static XtmReference for that PSI (provided by the XtmProcessor) is returned.

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException
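
The interpretation rule in Note 1 can be written out as a short sketch. TypeReferenceSketch and both method names are hypothetical, the result is a plain String rather than an XtmReference, and looksLikeXmlName is a simplified stand-in for a full XML Name check.

```java
final class TypeReferenceSketch {
    // Simplified approximation of an XML Name: a letter, '_' or ':' first,
    // then letters, digits, '_', ':', '-' or '.'. The real rule is broader.
    static boolean looksLikeXmlName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        char first = s.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = Character.isLetterOrDigit(c)
                    || c == '_' || c == ':' || c == '-' || c == '.';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Applies Note 1: a Name with no '.' that does not start with
    // "http://" or "file:/" is treated as an ID and gets a '#' prefix.
    static String toReference(String scanned) {
        boolean isId = looksLikeXmlName(scanned)
                && scanned.indexOf('.') < 0
                && !scanned.startsWith("http://")
                && !scanned.startsWith("file:/");
        return isId ? "#" + scanned : scanned;
    }
}
```

Under this rule a relative file reference such as topics.xtm is left alone because of its period, while a bare ID such as person becomes #person.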

scanAssociation

public XtmAssociation scanAssociation(Reader r)
                               throws XtmException
Scans until an association end delimiter (")"), returning the XtmAssociation object. This will include zero or more XtmMember elements, with possible role types and XtmTopics as members. Note that XtmTopics parsed during this process will be added to the XtmDocument if they do not already exist.
    typingTopicId( [topic1Id] : roleSpec , [topic2Id] : roleSpec ) 
 

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException

scanDescription

protected void scanDescription(Reader r,
                               XtmTopic topic)
                        throws XtmException
Scans forward for published Subject Indicator (PSI). Adds any associated markup to the provided topic element.

LTM Extension.

Throws:
XtmException

scanSubjectIdentifier

protected void scanSubjectIdentifier(Reader r,
                                     XtmTopic topic)
                              throws XtmException
Scans forward for Published Subject Indicator (PSI). Adds any associated markup to the provided topic element. Uses a <topicRef> if the PSI is a local fragment identifier (i.e., begins with #).

Throws:
XtmException

scanLiteral

protected String scanLiteral(Reader r,
                             boolean normWS)
                      throws XtmException
Scans forward until the closing literal delimiter, returning a String whose LT, GT, QUOT, APOS and AMP characters have been converted to entities.

Throws:
XtmException
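
The entity conversion mentioned above might look like the following sketch, which works on an already-scanned String. LiteralEscapeSketch is a hypothetical name, and the meaning of normWS (trim, then collapse runs of whitespace to a single space) is an assumption; this page does not define it.

```java
final class LiteralEscapeSketch {
    // Converts <, >, ", ' and & to their predefined XML entities.
    // Assumption: normWS trims the text and collapses whitespace runs.
    static String escape(String raw, boolean normWS) {
        String text = normWS ? raw.trim().replaceAll("\\s+", " ") : raw;
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                case '&':  sb.append("&amp;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Each input character is examined exactly once, so an ampersand introduced by an entity is never escaped a second time.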

require

public void require(Reader r,
                    String string,
                    String errMsg)
             throws XtmException
Parse a required string. If it doesn't occur, throw an XtmException.

Throws:
XtmException
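
A minimal sketch of the "required string" idea follows. RequireSketch is a hypothetical class, and IOException stands in for XtmException so the example has no Ceryle dependencies.

```java
import java.io.IOException;
import java.io.Reader;

final class RequireSketch {
    // Consumes expected.length() characters from the reader and fails with
    // errMsg at the first character that differs (or at end of input).
    static void require(Reader r, String expected, String errMsg) throws IOException {
        for (int i = 0; i < expected.length(); i++) {
            int c = r.read();
            if (c != expected.charAt(i)) {
                throw new IOException(errMsg);
            }
        }
    }
}
```

On success the reader is left positioned just past the required string, so the caller can continue scanning from there.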

scanToEndOfLine

protected String scanToEndOfLine(Reader r)
                          throws XtmException
Scans forward until an end-of-line character, returning the string. This is provided for error recovery.

Throws:
XtmException
See Also:
XMLCharTok

scanComment

public XtmComment scanComment(Reader r)
                       throws XtmException
Scans a C++ style comment, returns upon finding a comment end delimiter.

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException
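
For illustration, the sketch below scans a block comment, assuming it opens with "/*" and closes with "*/"; this page does not spell out the delimiters. CommentScanSketch is a hypothetical class that returns the comment text as a String (not an XtmComment) and uses IOException in place of XtmException.

```java
import java.io.IOException;
import java.io.Reader;

final class CommentScanSketch {
    // Expects the reader to be positioned at the opening "/*" and returns
    // the text between the opening and closing delimiters.
    static String scanComment(Reader r) throws IOException {
        if (r.read() != '/' || r.read() != '*') {
            throw new IOException("expected comment start delimiter");
        }
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        int c;
        while ((c = r.read()) != -1) {
            if (prev == '*' && c == '/') {
                // Drop the '*' that turned out to begin the end delimiter.
                sb.setLength(sb.length() - 1);
                return sb.toString();
            }
            sb.append((char) c);
            prev = c;
        }
        throw new IOException("unterminated comment");
    }
}
```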

scanURI

protected String scanURI(Reader r)
                  throws XtmException
Scans a URI, rather loosely. Reads forward until a whitespace character, returning the String. This can be used to scan URIs or other Strings that have no literal delimiters.

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException
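
The loose, whitespace-terminated scan can be sketched as follows. UriScanSketch is a hypothetical class; using Character.isWhitespace as the stop test and consuming the terminating character are assumptions, and IOException replaces XtmException.

```java
import java.io.IOException;
import java.io.Reader;

final class UriScanSketch {
    // Reads characters until whitespace or end of input. The terminating
    // whitespace character is consumed and is not part of the result.
    static String scanURI(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
        }
        return sb.toString();
    }
}
```

No URI syntax is validated here, which is why the same loop also serves for any other undelimited token.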

scanName

protected String scanName(Reader r)
                   throws XtmException
Scans a Name [XML production 5]. Reads forward until a non-name character, returning the String.

Parameters:
r - the Reader used to provide parsed content.
Throws:
XtmException
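
A sketch of reading forward until a non-name character follows. NameScanSketch is hypothetical, the character classes are a simplified approximation of the XML Name production, and a PushbackReader is assumed so that the terminating character can be returned to the stream; the real method takes a plain Reader and may handle lookahead differently.

```java
import java.io.IOException;
import java.io.PushbackReader;

final class NameScanSketch {
    // Simplified name-start test: a letter, '_' or ':'.
    static boolean isNameStart(int c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    // Simplified name-character test: name-start plus digits, '.' and '-'.
    static boolean isNameChar(int c) {
        return isNameStart(c) || Character.isDigit(c) || c == '.' || c == '-';
    }

    // Reads a Name and leaves the first non-name character unread.
    static String scanName(PushbackReader r) throws IOException {
        int c = r.read();
        if (c == -1 || !isNameStart(c)) {
            if (c != -1) {
                r.unread(c);
            }
            throw new IOException("expected a Name");
        }
        StringBuilder sb = new StringBuilder();
        sb.append((char) c);
        while ((c = r.read()) != -1 && isNameChar(c)) {
            sb.append((char) c);
        }
        if (c != -1) {
            r.unread(c); // put the terminator back for the next scan step
        }
        return sb.toString();
    }
}
```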

scanWhitespace

protected String scanWhitespace(Reader r,
                                boolean required)
                         throws XtmException
Scans white space [XML production 3], reading forward until a non-whitespace character is encountered. If required is true, then at least one character of whitespace is required or an error will be generated.

Returns:
the whitespace as a String.
Throws:
XtmException
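
XML production 3 defines white space as one or more of the characters #x20, #x9, #xD and #xA. The sketch below applies that definition together with the required flag; WhitespaceScanSketch is a hypothetical class, a PushbackReader is assumed for returning the first non-whitespace character, and IOException stands in for XtmException.

```java
import java.io.IOException;
import java.io.PushbackReader;

final class WhitespaceScanSketch {
    // The four white space characters of XML production 3.
    static boolean isXmlSpace(int c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    // Collects consecutive white space and leaves the next character unread.
    // If required is true and nothing was collected, an error is raised.
    static String scanWhitespace(PushbackReader r, boolean required) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1 && isXmlSpace(c)) {
            sb.append((char) c);
        }
        if (c != -1) {
            r.unread(c);
        }
        if (required && sb.length() == 0) {
            throw new IOException("whitespace required");
        }
        return sb.toString();
    }
}
```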

scanSpace

protected String scanSpace(Reader r,
                           boolean required)
                    throws XtmException
Scans space characters, reading forward until a non-space character is encountered. If required is true, then at least one space character is required or an error will be generated.

Returns:
the spaces as a String.
Throws:
XtmException

forward

protected boolean forward(Reader r)
                   throws XtmException
Read one character into thisChar, then read ahead nextChar. Returns true if thisChar is whitespace.

Parameters:
r - the Reader used to provide parsed content.
Returns:
true if the character read is whitespace.
Throws:
XtmException
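
The one-character lookahead described here can be sketched with two fields. The names thisChar and nextChar come from the description above; the LookaheadSketch class, the priming flag, and the use of -1 for end of input are assumptions.

```java
import java.io.IOException;
import java.io.Reader;

final class LookaheadSketch {
    int thisChar = -1;   // the character just consumed, or -1 at end of input
    int nextChar = -1;   // the character that the next call will consume
    private boolean primed = false;

    // Advances by one character, keeping one character of lookahead.
    // Returns true if the character now in thisChar is whitespace.
    boolean forward(Reader r) throws IOException {
        if (!primed) {
            nextChar = r.read(); // first call: fill the lookahead slot
            primed = true;
        }
        thisChar = nextChar;
        nextChar = r.read();
        return thisChar != -1 && Character.isWhitespace(thisChar);
    }
}
```

With this arrangement a scanner can decide what to do with thisChar based on nextChar, for example telling a single ']' from the ']]' that closes resource data.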

getReader

public Reader getReader(String pathname)
                 throws FileNotFoundException,
                        IOException,
                        UnknownHostException
Instantiates a BufferedReader to process the pathname as a URL or file reference, returning the Reader.

Parameters:
pathname - the pathname of the file intended to be read.
Returns:
a Reader to the file.
Throws:
FileNotFoundException
IOException
UnknownHostException
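
One way to open a pathname as either a URL or a file reference is sketched below. ReaderFactorySketch is a hypothetical class; choosing the URL branch by the "http://" or "file:/" prefix and decoding as UTF-8 are both assumptions, since this page does not say how the real method decides or which encoding it uses.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class ReaderFactorySketch {
    // Returns a buffered reader over the URL or file named by pathname.
    // FileNotFoundException and UnknownHostException are IOException subclasses.
    static Reader getReader(String pathname) throws IOException {
        InputStream in;
        if (pathname.startsWith("http://") || pathname.startsWith("file:/")) {
            // Assumption: these prefixes mark the pathname as a URL.
            in = URI.create(pathname).toURL().openStream();
        } else {
            in = new FileInputStream(pathname);
        }
        // Assumption: the content is UTF-8 encoded.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```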


The Ceryle Project. Copyright ©2001-2007 Murray Altheim, All Rights Reserved. See LICENSE included with distribution.