org.ceryle.tm
Class LTMParser

java.lang.Object
  extended by org.ceryle.tm.LTMParser

public class LTMParser
extends Object

Implements a parser for a customized version of the Linear Topic Map Notation. This implementation is used with TM4J, and extends the LTM 1.2 functionality. This parser is intended to be functionally identical to LTMProcessor, but it has no dependencies on core Ceryle application code and does not implement TopicMapBuilder.

It will accept Occurrences with either a quote-delimited value or the new LTM 1.2 "[[" and "]]" delimited value for resource data. Values that are URLs are automatically created as resource references (there is currently no way to store a URL as a String).
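How the parser decides that a value is a URL is not stated on this page. The following is a minimal sketch of one plausible check, assuming that a value counts as a URL when it parses as an absolute, hierarchical URI. The class and method names are hypothetical and are not part of LTMParser.

```java
import java.net.URI;
import java.net.URISyntaxException;

class OccurrenceValueSketch {

    /**
     * Returns true if the value looks like an absolute hierarchical URI,
     * for example "http://example.org/page".
     */
    static boolean looksLikeUrl(String value) {
        if (value == null) {
            return false;
        }
        try {
            URI uri = new URI(value.trim());
            // Assumption: only absolute, non-opaque URIs become resource references.
            return uri.isAbsolute() && !uri.isOpaque();
        } catch (URISyntaxException e) {
            // Text containing spaces or other illegal characters is treated as resource data.
            return false;
        }
    }
}
```

Under this assumption, a value such as "A plain note" would be stored as resource data, while "http://example.org/page" would become a resource reference.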

Because of #MERGEMAP directives, this LTMParser may end up processing multiple LTM files during the course of a single build. The base and resource locators of the initial LTM document are used for the resulting TopicMap. If a #BASEURI directive is encountered, this takes precedence.

Since:
JDK1.3
Version:
$Id: LTMParser.java,v 3.3 2007-06-15 12:09:32 altheim Exp $
Author:
Murray Altheim
See Also:
LTMProcessor

Field Summary
protected  Locator m_baseLoc
          The base Locator of the current TopicMap, as set by its source Locator or its own #BASEURI directive.
protected  Locator m_srcLoc
          The source Locator of the current TopicMap, as provided by its source Locator.
 boolean strictParsing
          Variable controlling whether strict XML parsing is active.
 Topic topicDescription
          A PSI-based Topic identifying the subject of "topic description".
 
Constructor Summary
LTMParser(TopicMapProvider tmprovider, IDGenerator idgenerator)
          Constructor with a TopicMapProvider.
 
Method Summary
 void build(InputStream source, Locator srcLoc, TopicMap topicmap)
          Parses the InputStream source, populating the provided TopicMap.
 void build(InputStream source, Locator srcLoc, TopicMap topicmap, Topic[] addedThemes)
          Parses the InputStream source, populating the provided TopicMap.
 void build(InputStream source, TopicMap topicmap)
          Parses the InputStream source, populating the provided TopicMap.
 void build(Reader reader, Locator srcLoc, TopicMap topicmap)
          Parses the content supplied by the Reader, populating the provided TopicMap.
 void build(Reader reader, Locator srcLoc, TopicMap topicmap, Topic[] addedThemes)
          Parses the content supplied by the Reader, populating the provided TopicMap.
protected  boolean forward(Reader r)
          Read one character into thisChar, then read ahead nextChar.
 String generateID()
          Uses the existing IDGenerator to return a unique ID within the current TopicMap.
 String generateID(String prefix)
          Uses the existing IDGenerator to return a unique ID prefixed with the String prefix.
 TopicMap getTopicMap()
          Returns the TopicMap generated as a result of parsing, or null if not available.
 void mergeMap(TypedInputSource source)
          Merges the LTM file indicated by the TypedInputSource source into the current TopicMap.
 Topic psiDisplay(TopicMap topicmap)
          Returns a Topic using the XTM 1.0 "Display" PSI as its subject indicator.
 Topic psiSort(TopicMap topicmap)
          Returns a Topic using the XTM 1.0 "Sort" PSI as its subject indicator.
 Topic psiSubclass(TopicMap topicmap)
          Returns a Topic using the XTM 1.0 "Subclass" PSI as its subject indicator.
 Topic psiSuperclass(TopicMap topicmap)
          Returns a Topic using the XTM 1.0 "Superclass" PSI as its subject indicator.
 Topic psiSuperclassSubclass(TopicMap topicmap)
          Returns a Topic using the XTM 1.0 "Superclass-Subclass" PSI as its subject indicator.
 void require(Reader r, String string)
          Parse a required string.
 Topic returnTopicByID(TopicMap topicmap, String id)
          Returns either the existing Topic whose ID is id, or a new Topic if a matching one is not already available.
 Association scanAssociation(Reader r)
          Scans until an association end delimiter (")"), returning the Association object.
 void scanComment(Reader r)
          Scans a C++ style comment.
protected  void scanDescription(Reader r, Topic topic)
          Scans forward for a topic description.
protected  void scanDirective(Reader r)
          Scans an LTM directive.
protected  String scanLiteral(Reader r, boolean normWS)
          Scans forward until the closing literal delimiter, returning a String whose LT, GT, QUOT, APOS and AMP characters have been converted to entities.
protected  String scanName(Reader r)
          Scans a Name [XML production 5].
 Occurrence scanOccurrence(Reader r)
          Scans until an occurrence end delimiter ("}"), returning the Occurrence object.
protected  String scanSpace(Reader r, boolean required)
          Scans space characters, reading forward until a non-space character is encountered.
protected  void scanSubjectIdentifier(Reader r, Topic topic)
          Scans forward for a Published Subject Indicator (PSI).
 void scanSuperclass(Reader r, Topic subclass)
          Creates a superclass-subclass Association between the provided Topic and the scanned token representing a reference to the superclass Topic.
protected  String scanToEndOfLine(Reader r)
          Scans forward until an end-of-line character, returning the string.
 Topic scanTopic(Reader r)
          Scans until a topic end delimiter ("]"), returning the Topic object.
protected  void scanTopicName(Reader r, Topic topic)
          Scans forward for base name, sort name, and/or display name.
protected  Topic scanType(Reader r)
          Scans forward for topic, occurrence or association type (roleSpec), returning a Topic that serves as the object's type, i.e., the referent in its <instanceOf> or <roleSpec> element.
protected  String scanURI(Reader r)
          Scans a URI, rather loosely.
protected  void scanWhitespace(Reader r, boolean required)
          Scans white space [XML production 3], reading forward until a non-whitespace character is encountered.
protected  void setBaseLocator(TopicMap tm, Locator loc, Locator from)
          Because during processing of an LTM file we might encounter #MERGEMAP directives, causing sub-modules to be parsed, and because these sub-modules may themselves include their own #MERGEMAP or #BASEURI directives, only the original LTM source is able to set the resulting base URI.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

topicDescription

public Topic topicDescription
A PSI-based Topic identifying the subject of "topic description".


strictParsing

public boolean strictParsing
Variable controlling whether strict XML parsing is active. This mostly affects constraints on the content of XML Names.


m_baseLoc

protected Locator m_baseLoc
The base Locator of the current TopicMap, as set by its source Locator or its own #BASEURI directive.


m_srcLoc

protected Locator m_srcLoc
The source Locator of the current TopicMap, as provided by its source Locator.

Constructor Detail

LTMParser

public LTMParser(TopicMapProvider tmprovider,
                 IDGenerator idgenerator)
Constructor with a TopicMapProvider.

Parameters:
tmprovider - the topic map provider used
idgenerator - the ID generator to be used
See Also:
Constants, TopicMapProcessor, TopicMap
Method Detail

mergeMap

public void mergeMap(TypedInputSource source)
              throws TopicMapException
Merges the LTM file indicated by the TypedInputSource source into the current TopicMap. This spawns a new parse() process.

LTM Extension.

Parameters:
source - the TypedInputSource source to be parsed.
Throws:
TopicMapException

build

public void build(InputStream source,
                  TopicMap topicmap)
           throws TopicMapException
Parses the InputStream source, populating the provided TopicMap.

Parameters:
source - the InputStream to be parsed
topicmap - the TopicMap to be populated
Throws:
TopicMapException - if any errors are encountered during parsing

build

public void build(InputStream source,
                  Locator srcLoc,
                  TopicMap topicmap)
           throws TopicMapException
Parses the InputStream source, populating the provided TopicMap.

Parameters:
source - the InputStream to be parsed
srcLoc - the base address of the source being parsed
topicmap - the TopicMap to be populated
Throws:
TopicMapException - if any errors are encountered during parsing

build

public void build(InputStream source,
                  Locator srcLoc,
                  TopicMap topicmap,
                  Topic[] addedThemes)
           throws TopicMapException
Parses the InputStream source, populating the provided TopicMap.

Parameters:
source - the InputStream to be parsed
srcLoc - the base address of the source being parsed
topicmap - the TopicMap to be populated
addedThemes - the themes to be added as the scope of validity for all objects in the TopicMap
Throws:
TopicMapException - if any errors are encountered during parsing

build

public void build(Reader reader,
                  Locator srcLoc,
                  TopicMap topicmap)
           throws TopicMapException
Parses the content supplied by the Reader, populating the provided TopicMap.

Parameters:
reader - the Reader to be parsed
srcLoc - the base address of the source being parsed
topicmap - the TopicMap to be populated
Throws:
TopicMapException - if any errors are encountered during parsing

build

public void build(Reader reader,
                  Locator srcLoc,
                  TopicMap topicmap,
                  Topic[] addedThemes)
           throws TopicMapException
Parses the content supplied by the Reader, populating the provided TopicMap.

API Differences

Note that because this builder expects to use only one provider, the provider parameter is used only to check that it matches the current provider; a TopicMapException is thrown if it does not.

Parameters:
reader - the Reader to be parsed
srcLoc - the base address of the source being parsed
topicmap - the TopicMap to be populated
addedThemes - the themes to be added as the scope of validity for all objects in the TopicMap
Throws:
TopicMapException - if any errors are encountered during parsing

setBaseLocator

protected void setBaseLocator(TopicMap tm,
                              Locator loc,
                              Locator from)
Because during processing of an LTM file we might encounter #MERGEMAP directives, causing sub-modules to be parsed, and because these sub-modules may themselves include their own #MERGEMAP or #BASEURI directives, only the original LTM source is able to set the resulting base URI. Routing every change to the base Locator through this method enforces that rule.

Parameters:
tm - the TopicMap whose base URI is to be set
loc - the Locator to use as a base URI
from - the base Locator of the current TopicMap
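The rule described above can be sketched as follows. This is a minimal illustration, not the actual implementation: it uses plain Strings in place of Locator objects, omits the TopicMap argument, and assumes that the from argument identifies the document issuing the request. The class name is hypothetical.

```java
class BaseLocatorSketch {

    private final String originalSource; // locator of the initial LTM document
    private String base;                 // resulting base URI of the topic map

    BaseLocatorSketch(String originalSource) {
        this.originalSource = originalSource;
        // By default the base is taken from the initial document's locator.
        this.base = originalSource;
    }

    /** Applies loc as the base only when the request comes from the original source. */
    void setBaseLocator(String loc, String from) {
        if (originalSource.equals(from)) {
            base = loc;
        }
        // Requests issued while parsing merged sub-modules are ignored.
    }

    String getBase() {
        return base;
    }
}
```

With this guard in place, a #BASEURI directive inside a merged file leaves the base untouched, while the same directive in the initial document replaces it.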

getTopicMap

public TopicMap getTopicMap()
Returns the TopicMap generated as a result of parsing, or null if not available.

Returns:
the current topic map.

generateID

public String generateID()
Uses the existing IDGenerator to return a unique ID within the current TopicMap.


generateID

public String generateID(String prefix)
Uses the existing IDGenerator to return a unique ID prefixed with the String prefix.


returnTopicByID

public Topic returnTopicByID(TopicMap topicmap,
                             String id)
                      throws TopicMapException
Returns either the existing Topic whose ID is id, or a new Topic if a matching one is not already available. This differs from org.tm4j.topicmap.TopicMap.getTopicByID(String) in that it creates and returns a new topic if one does not exist. If the provided String id is null, a unique value will be created. If the provided TopicMap topicmap is null, null is returned.

Parameters:
topicmap - a reference to the topic map to be queried
id - the ID to be searched for
Throws:
DuplicateObjectIDException - if the ID specifies an identifier already assigned to a non-Topic object
TopicMapException
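The get-or-create behaviour described above can be sketched with stand-in types. The nested Topic and TopicMap classes below are hypothetical placeholders for the TM4J interfaces, and the counter-based generateID() is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class TopicLookupSketch {

    /** Hypothetical stand-in for a TM4J Topic. */
    static final class Topic {
        final String id;
        Topic(String id) { this.id = id; }
    }

    /** Hypothetical stand-in for a TM4J TopicMap: just an index of topics by ID. */
    static final class TopicMap {
        final Map<String, Topic> topicsById = new HashMap<>();
    }

    private int counter = 0;

    /** Assumed ID scheme: a simple counter with a fixed prefix. */
    String generateID() {
        return "id" + (++counter);
    }

    Topic returnTopicByID(TopicMap topicmap, String id) {
        if (topicmap == null) {
            return null; // documented behaviour: a null map yields null
        }
        String key = id;
        if (key == null) {
            // Documented behaviour: a null ID is replaced by a unique generated value.
            do {
                key = generateID();
            } while (topicmap.topicsById.containsKey(key));
        }
        // Return the existing topic, or create and register a new one.
        return topicmap.topicsById.computeIfAbsent(key, Topic::new);
    }
}
```

Calling the method twice with the same ID returns the same Topic instance, which is what lets forward references in an LTM file resolve to a single topic.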

psiSuperclassSubclass

public Topic psiSuperclassSubclass(TopicMap topicmap)
                            throws TopicMapException
Returns a Topic using the XTM 1.0 "Superclass-Subclass" PSI as its subject indicator. This is a convenience method.

Throws:
TopicMapException

psiSuperclass

public Topic psiSuperclass(TopicMap topicmap)
                    throws TopicMapException
Returns a Topic using the XTM 1.0 "Superclass" PSI as its subject indicator. This is a convenience method.

Throws:
TopicMapException

psiSubclass

public Topic psiSubclass(TopicMap topicmap)
                  throws TopicMapException
Returns a Topic using the XTM 1.0 "Subclass" PSI as its subject indicator. This is a convenience method.

Throws:
TopicMapException

psiSort

public Topic psiSort(TopicMap topicmap)
              throws TopicMapException
Returns a Topic using the XTM 1.0 "Sort" PSI as its subject indicator. This is a convenience method.

Throws:
TopicMapException

psiDisplay

public Topic psiDisplay(TopicMap topicmap)
                 throws TopicMapException
Returns a Topic using the XTM 1.0 "Display" PSI as its subject indicator. This is a convenience method.

Throws:
TopicMapException

scanTopic

public Topic scanTopic(Reader r)
                throws TopicMapException
Scans until a topic end delimiter ("]"), returning the Topic object. If the scan results in a String containing significant content, a Topic object will be sent to the XTMFactory for further processing. Note that at this level, no checking for existing Topic objects is done.

LTM Extension: a semicolon following the topic ID (rather than a colon) indicates the superclass of this topic.

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException

scanSuperclass

public void scanSuperclass(Reader r,
                           Topic subclass)
                    throws TopicMapException
Creates a superclass-subclass Association between the provided Topic and the scanned token representing a reference to the superclass Topic.

NOTE: This is an extension to LTM 1.0.

Throws:
TopicMapException

scanTopicName

protected void scanTopicName(Reader r,
                             Topic topic)
                      throws TopicMapException
Scans forward for base name, sort name, and/or display name. Adds any associated markup to the provided Topic object. Note that display and sort name variants will always be attached to the last base name encountered.
   [id : type = "baseName" ; "sortname" ; "dispname" / scope [NAME+]
              = "baseName2" ; "sortname2" ; "dispname2" / scope2 [NAME+]

 

Parameters:
r - a Reader providing access to the document content.
topic - the topic to decorate with name information.
Throws:
TopicMapException

scanOccurrence

public Occurrence scanOccurrence(Reader r)
                          throws TopicMapException
Scans until an occurrence end delimiter ("}"), returning the Occurrence object. If the scan results in a String containing significant content, an Occurrence object will be sent to the XTMFactory for further processing.

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException

scanDirective

protected void scanDirective(Reader r)
                      throws TopicMapException
Scans an LTM directive. These are custom keywords designed to provide extended functionality, consisting of name/value pairs. Currently accepted directives include:
#TOPICMAP [whitespace]* "id"
Sets the ID value of the <topicMap> element.

Note that the double quotes surrounding the ID value are optional.

#BASEURI [whitespace]* "URI"
Sets the xml:base attribute value on the <topicMap> element.
#MERGEMAP [whitespace]* "URI" [whitespace] STRING
Merges the document located at URI, whose syntax is named by STRING, with the current document.

LTM 1.0 Extension (now part of LTM 1.2).

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException
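As an illustration of the directive forms listed above, the following sketch splits a directive into its keyword and arguments. It works on a whole line with a regular expression rather than character by character from a Reader, so it is not how scanDirective itself is written; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DirectiveSketch {

    // A token is either a double-quoted string (group 1)
    // or a run of characters containing no whitespace or quotes (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|([^\\s\"]+)");

    /** Returns the keyword (without '#') followed by its arguments, with quotes removed. */
    static List<String> parse(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("#")) {
            throw new IllegalArgumentException("not a directive: " + line);
        }
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(trimmed.substring(1));
        while (m.find()) {
            parts.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return parts;
    }
}
```

Because quoted and bare tokens are both accepted, #TOPICMAP "my-map" and #TOPICMAP my-map yield the same result, matching the note that the quotes around the ID are optional.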

scanType

protected Topic scanType(Reader r)
                  throws TopicMapException
Scans forward for topic, occurrence or association type (roleSpec), returning a Topic that serves as the object's type, i.e., the referent in its <instanceOf> or <roleSpec> element. This is scanned as an XML Name (ID value), though it does allow an initial '#' character (which is ignored).

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException

scanAssociation

public Association scanAssociation(Reader r)
                            throws TopicMapException
Scans until an association end delimiter (")"), returning the Association object. This will include zero or more Member elements, with possible role types and Topics as members. Note that Topics parsed during this process will be added to the TopicMap if they do not already exist (as in the second example, where the members are written in topic brackets).
    typingTopicId( topic1Id : roleSpec , topic2Id : roleSpec ) 
    typingTopicId( [topic1Id] : roleSpec , [topic2Id] : roleSpec ) 
 

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException
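The two example forms above can be decomposed as in the following sketch. It is a simplified, string-based illustration that handles only bare topic IDs, optionally wrapped in brackets, as in the examples; full inline topic declarations containing commas or colons are out of scope. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AssociationSketch {

    /** One association member: the player topic, its role, and whether it was bracketed. */
    static final class Member {
        final String topicId;
        final String roleSpec;
        final boolean inline;
        Member(String topicId, String roleSpec, boolean inline) {
            this.topicId = topicId;
            this.roleSpec = roleSpec;
            this.inline = inline;
        }
    }

    /** Returns the typing topic ID that precedes the opening parenthesis. */
    static String parseType(String text) {
        return text.substring(0, text.indexOf('(')).trim();
    }

    /** Returns the members listed between the parentheses (possibly none). */
    static List<Member> parseMembers(String text) {
        int open = text.indexOf('(');
        int close = text.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("missing association delimiters: " + text);
        }
        List<Member> members = new ArrayList<>();
        for (String part : text.substring(open + 1, close).split(",")) {
            if (part.trim().isEmpty()) {
                continue; // an empty member list is allowed
            }
            String[] pair = part.split(":", 2);
            String player = pair[0].trim();
            boolean inline = player.startsWith("[") && player.endsWith("]");
            if (inline) {
                player = player.substring(1, player.length() - 1).trim();
            }
            String role = pair.length > 1 ? pair[1].trim() : null;
            members.add(new Member(player, role, inline));
        }
        return members;
    }
}
```

A caller could use the inline flag to decide whether a member should be created in the topic map when it does not already exist.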

scanDescription

protected void scanDescription(Reader r,
                               Topic topic)
                        throws TopicMapException
Scans forward for a topic description. Adds this literal to the provided Topic as a typed occurrence.

LTM Extension.

Throws:
TopicMapException

scanSubjectIdentifier

protected void scanSubjectIdentifier(Reader r,
                                     Topic topic)
                              throws TopicMapException
Scans forward for a Published Subject Indicator (PSI). Adds any associated markup to the provided topic element. Uses a <topicRef> if the PSI is a local fragment identifier (i.e., begins with #).

Throws:
TopicMapException

scanLiteral

protected String scanLiteral(Reader r,
                             boolean normWS)
                      throws TopicMapException
Scans forward until the closing literal delimiter, returning a String whose LT, GT, QUOT, APOS and AMP characters have been converted to entities.

Throws:
TopicMapException
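The entity conversion mentioned above can be sketched as a simple character mapping. This is an illustration only: it operates on an already extracted String rather than on the Reader, and it does not cover the normWS flag, whose behaviour is not described here. The class name is hypothetical.

```java
class LiteralEscapeSketch {

    /** Replaces the five XML-special characters with their predefined entities. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                case '&':  sb.append("&amp;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Handling each character in a single pass avoids double escaping, which chained String.replace calls can cause if the ampersand is not replaced first.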

require

public void require(Reader r,
                    String string)
             throws TopicMapException
Parse a required string. If it does not occur, a TopicMapException is thrown.

Throws:
TopicMapException
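A minimal sketch of this kind of check, assuming the required string is compared character by character against the Reader. IllegalStateException stands in for TopicMapException so that the example has no TM4J dependency; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class RequireSketch {

    /** Consumes expected from the reader, failing on the first mismatch or on end of input. */
    static void require(Reader r, String expected) {
        try {
            for (int i = 0; i < expected.length(); i++) {
                int c = r.read(); // -1 at end of input never equals a char
                if (c != expected.charAt(i)) {
                    throw new IllegalStateException(
                            "expected \"" + expected + "\" but input differs at offset " + i);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Failing at the first mismatching character keeps the error position close to the actual problem in the source file.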

scanToEndOfLine

protected String scanToEndOfLine(Reader r)
                          throws TopicMapException
Scans forward until an end-of-line character, returning the string. This is provided for error recovery.

Throws:
TopicMapException
See Also:
XMLCharTok

scanComment

public void scanComment(Reader r)
                 throws TopicMapException
Scans a C++ style comment. Currently nothing is returned.

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException

scanURI

protected String scanURI(Reader r)
                  throws TopicMapException
Scans a URI, rather loosely. Reads forward until a whitespace character, returning the String. This can be used to scan URIs or other Strings that have no literal delimiters.

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException

scanName

protected String scanName(Reader r)
                   throws TopicMapException
Scans a Name [XML production 5]. Reads forward until a non-name character, returning the String.

Parameters:
r - the Reader used to provide parsed content.
Throws:
TopicMapException
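For reference, production 5 of XML 1.0 defines a Name as a letter, underscore or colon followed by any number of name characters. The sketch below approximates that rule with Character.isLetter and Character.isDigit, which do not match the XML character classes exactly, and uses a PushbackReader so that the first non-name character stays available. The class name and the use of IllegalStateException are assumptions for illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.UncheckedIOException;

class NameScanSketch {

    static boolean isNameStart(int c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    static boolean isNameChar(int c) {
        return isNameStart(c) || Character.isDigit(c) || c == '.' || c == '-';
    }

    /** Reads a Name and pushes the first non-name character back onto the reader. */
    static String scanName(PushbackReader r) {
        StringBuilder sb = new StringBuilder();
        try {
            int c = r.read();
            if (c == -1 || !isNameStart(c)) {
                if (c != -1) {
                    r.unread(c);
                }
                throw new IllegalStateException("name expected");
            }
            sb.append((char) c);
            while ((c = r.read()) != -1 && isNameChar(c)) {
                sb.append((char) c);
            }
            if (c != -1) {
                r.unread(c); // leave the delimiter for the caller
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The strictParsing field documented above suggests the real parser can relax these constraints; this sketch always applies them.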

scanWhitespace

protected void scanWhitespace(Reader r,
                              boolean required)
                       throws TopicMapException
Scans white space [XML production 3], reading forward until a non-whitespace character is encountered. If required is true, then at least one character of whitespace is required or an error will be generated.

Throws:
TopicMapException
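Production 3 of XML 1.0 defines white space as one or more of the characters #x20, #x9, #xD and #xA. A minimal sketch of a scanner with the required flag follows. Unlike the real method it returns the number of characters skipped, purely so the behaviour is observable, and it throws IllegalStateException in place of TopicMapException. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.UncheckedIOException;

class WhitespaceScanSketch {

    /** True for the four characters allowed by XML production 3. */
    static boolean isXmlSpace(int c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    /** Skips white space and returns how many characters were skipped. */
    static int scanWhitespace(PushbackReader r, boolean required) {
        int count = 0;
        try {
            int c;
            while ((c = r.read()) != -1 && isXmlSpace(c)) {
                count++;
            }
            if (c != -1) {
                r.unread(c); // keep the first non-whitespace character
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (required && count == 0) {
            throw new IllegalStateException("whitespace required");
        }
        return count;
    }
}
```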

scanSpace

protected String scanSpace(Reader r,
                           boolean required)
                    throws TopicMapException
Scans space characters, reading forward until a non-space character is encountered. If required is true, then at least one space character is required or an error will be generated.

Returns:
the spaces as a String.
Throws:
TopicMapException

forward

protected boolean forward(Reader r)
                   throws TopicMapException
Read one character into thisChar, then read ahead nextChar. Returns true if thisChar is whitespace.

Parameters:
r - the Reader used to provide parsed content.
Returns:
true if the character read is whitespace.
Throws:
TopicMapException
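The one-character lookahead described above can be sketched as follows. The field names thisChar and nextChar come from the description; the priming step on the first call, the use of -1 for end of input, and the class name are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class LookaheadSketch {

    int thisChar = -1; // the character currently being examined
    int nextChar = -1; // the character that will be examined next
    private boolean primed = false;

    /** Advances by one character and reports whether the new thisChar is whitespace. */
    boolean forward(Reader r) {
        try {
            if (!primed) {
                nextChar = r.read(); // fill the lookahead slot on the first call
                primed = true;
            }
            thisChar = nextChar;
            nextChar = r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return thisChar != -1 && Character.isWhitespace(thisChar);
    }
}
```

Keeping nextChar available lets the scanning methods recognise two-character delimiters such as "[[" and "]]" without pushing characters back onto the Reader.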


The Ceryle Project. Copyright ©2001-2007 Murray Altheim, All Rights Reserved. See LICENSE included with distribution.