Augmented Plain Text (APT) Version 1.0
Neocortext.Net Note 17 May 2002
Copyright 2001-2002 Murray Altheim. All Rights Reserved.
The Augmented Plain Text (APT) specification is a design for a simple set of keyword tokens that, when added to a plain text document, enable an APT processor to autogenerate valid XHTML documents. APT can be used for authoring or for Web conversion of existing plain text sources.
This document is intended for review and comment by interested parties. It is a “work in progress,” currently has no formal status, and its publication should not be construed as endorsement by any corporate or academic body. This document may be updated, replaced, rendered obsolete by other documents, or removed from circulation at any time. It is inappropriate to use this document as reference material, or cite it as anything other than a “work in progress.” Distribution of this document is unlimited.
The 'APT' notation is a simple, augmented notation designed to simplify creation of XHTML documents from existing text sources such as email messages. Most current editing software has some facility to generate plain text output, with some claiming to generate HTML. Unfortunately, the "HTML" generated by nearly every known product is far from adhering to any known HTML specification. In the case of MS Word (singled out here because it is the worst), the output is so obtuse and convoluted that a slew of translators have been written to turn its "HTML" into something akin to HTML, though not without substantial losses of content in many cases. If you have some familiarity with HTML, I suggest you look at MS Word's "HTML". It is really unbelievable, especially considering that HTML is generally a pretty simple syntax.
APT was designed to fill a different niche, namely for those wishing to author in plain text, those who have existing text sources, or those who'd like an XHTML-valid document with an autogenerated table of contents and hierarchically-numbered sections from what is ostensibly a plain text document (with a few simple codes added). Yes, APT is simple. It doesn't have every known Web feature, doesn't create JavaScript buttons or fry your bacon for you. It does simplify web authoring for those who think web authoring should be simple and straightforward. You can spend some time playing with the CSS stylesheet if you really want your output to look different from the default, or take the output document as input for further processing (perhaps adding your own JavaScript buttons and bacon).
An APT document looks something like this:
#APT V1.0
#AUTHOR Tim Bunwich
#TITLE Not a Normal Day at the Park
Today I went to
http://centralpark.org/home.html #LINK Central Park
to feed some squirrels, a thing I do most everyday. Well, for some reason
the squirrels seemed agitated, a bit put off at simply accepting the nuts
I was handing out.
I was at the point of positioning a piece of pecan in front of this big
black squirrel's nose when suddenly he ran up my arm and stood on the top
of my head, then began squealing wildly. I froze in place, not knowing
quite what to do. These little buggers have very sharp claws and teeth,
and me with no hair... well, I was afraid for my scalp.
All around me were squirrels, all just a few feet from my Berkenstock'd
toes. I knew my life was about to change for the worse.
Pretty simple?
APT uses keywords that start with a hash symbol (e.g., #AUTHOR) that occur in column 1 (i.e., the beginning of a line) to denote an APT statement. APT parsers should ignore keywords occurring elsewhere, or unknown keywords (perhaps emitting a warning when this occurs), so that other instances of hash characters followed by unknown tokens have no effect (other than to be replicated in the output file).
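The column-one rule might be sketched in Python roughly as follows. This is only an illustration, not the reference implementation: the names classify_line and KNOWN_KEYWORDS are hypothetical, the keyword set lists only keywords that appear in this document's examples, and the assumption that keywords consist of uppercase letters is mine. #LINK is left out because it follows a URL rather than starting a line.

```python
import re

# Hypothetical keyword set: only keywords seen in this document's examples.
KNOWN_KEYWORDS = {"APT", "AUTHOR", "TITLE", "EMAIL", "SETHEAD", "HEAD", "DFN", "INCLUDE"}

# Assumption: a keyword is a hash in column 1 followed by uppercase letters,
# optionally followed by whitespace and an argument running to end of line.
_KEYWORD_RE = re.compile(r"^#([A-Z]+)(?:[ \t]+(.*))?$")

def classify_line(line):
    """Return ("statement", keyword, argument) or ("text", None, line)."""
    m = _KEYWORD_RE.match(line)
    if m and m.group(1) in KNOWN_KEYWORDS:
        return ("statement", m.group(1), m.group(2) or "")
    # Unknown keywords and hashes not in column 1 pass through as plain text.
    return ("text", None, line)
```

A real processor would probably also emit a warning for the unknown-keyword case; the sketch simply passes the line through unchanged.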
The APT syntax is designed according to implementation levels, to allow for varying levels of support. Level 1 is quite simple, Level 2 provides general link support, and Level 3 provides inclusions and other features.
Level 3 APT processors may optionally preserve HTML markup occurring inline, but Level 1 and 2 processors should autogenerate the document title, headings, divisions and paragraphs, as well as a table of contents using heading titles. Future optional features will include autogeneration of hierarchical section numbers. Note that, if necessary, a hash character can be escaped using its XML character reference ("&#35;"). A processor should be labeled according to its implementation level.
All APT statements start with an APT keyword beginning in column one and continue to the end of the line. Lines may be continued with a backslash onto the next line.
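Backslash continuation could be handled in a pre-pass before keywords are examined. The Python sketch below is one possible reading: the function name join_continued_lines is hypothetical, and joining the two pieces with a single space is an assumption, since this note does not say how the pieces are joined.

```python
def join_continued_lines(lines):
    """Merge each line ending in a backslash with the line that follows it."""
    joined = []
    pending = ""
    for line in lines:
        if line.endswith("\\"):
            # Assumption: drop the backslash, trim trailing whitespace,
            # and join to the next line with a single space.
            pending += line[:-1].rstrip() + " "
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        # A dangling continuation on the last line is kept as its own line.
        joined.append(pending.rstrip())
    return joined
```

Lines are expected without their trailing newline, for example as produced by str.splitlines().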
#LINK is a bit of a special case. APT processors will note the beginning of http: and ftp: URLs (scanning until the first whitespace or end-of-line) and autogenerate XHTML links where they exist, using the URL itself as the link text, unless the token following the URL is #LINK, in which case the link text is the content following #LINK to the end of that line.
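The URL scan and the #LINK rule can be illustrated with a short Python sketch. It is a guess at one workable approach rather than how any existing APT processor does it: the name linkify is hypothetical, and for brevity the sketch escapes only the URL and the link text, not the surrounding line.

```python
import re
from html import escape

# An http: or ftp: URL runs to the first whitespace; it may be followed
# by "#LINK" and link text that extends to the end of the line.
_URL_RE = re.compile(r"((?:http|ftp):\S+)(?:[ \t]+#LINK[ \t]+(.*))?")

def linkify(line):
    """Replace URLs in one line with XHTML anchors."""
    def repl(m):
        url, text = m.group(1), m.group(2)
        # Without #LINK, the URL itself is the link text.
        label = text.strip() if text else url
        return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(label))
    return _URL_RE.sub(repl, line)
```

Applied to the second line of the squirrel example above, linkify returns an anchor whose href is the Central Park URL and whose text is "Central Park".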
The following keywords should be supported in all APT Level 1 processors:
("WS" stands for whitespace: space or tab characters)
The following keywords should be supported in all APT Level 2 processors:
The following keywords should be supported in all APT Level 3 processors:
Additionally, Level 3 processors should allow for a subset of existing inline HTML markup to be normalized and preserved in output.
Examples:
#APT V1.0
#AUTHOR Igor Rostropovich
#EMAIL igor@rostropovich.org
#SETHEAD 2
#HEAD It Takes a Virtual Village \
To Make a Virtual Village Idiot
Whitespace between blocks of plaintext will automatically create paragraph breaks. The document will use #TITLE as both the document title and its first displayed <h1> heading, the remainder of headings being <h2> headings. The TOC is autocreated from headings.
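Paragraph generation might look something like the following Python sketch. It assumes that "whitespace between blocks" means one or more blank lines, and that line breaks inside a block are folded into single spaces; neither detail is spelled out in this note, and the name paragraphs_to_xhtml is hypothetical.

```python
import re
from html import escape

def paragraphs_to_xhtml(text):
    """Wrap each blank-line-separated block of text in a <p> element."""
    # Assumption: a paragraph break is a line that is empty or holds only
    # spaces and tabs.
    blocks = [b for b in re.split(r"\n[ \t]*\n", text) if b.strip()]
    # Fold internal line breaks and runs of whitespace into single spaces.
    return "\n".join("<p>%s</p>" % escape(" ".join(b.split())) for b in blocks)
```

Heading and table-of-contents generation are left out, since they depend on keyword details not reproduced here.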
Transclusions: Note that filenames or URLs must be in double quotes. For example, to include an external file as an answer to a question:
#DFN My first question is how many monkeys?
#INCLUDE "answer1.html"
NOTE: #INCLUDE and backslash line continuation are currently unsupported.
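The double-quote requirement for #INCLUDE could be checked along these lines. This Python sketch only extracts the quoted filename or URL; it does not fetch or insert anything, and the name parse_include is hypothetical.

```python
import re

# "#INCLUDE" in column 1, whitespace, then a filename or URL in double quotes.
_INCLUDE_RE = re.compile(r'^#INCLUDE[ \t]+"([^"]+)"[ \t]*$')

def parse_include(line):
    """Return the quoted target of an #INCLUDE statement, or None."""
    m = _INCLUDE_RE.match(line)
    return m.group(1) if m else None
```

An unquoted target yields None, which a processor might report as a warning before copying the line to the output unchanged.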
A sample of an APT source and the generated output from Ceryle:
An APT source document must be a text document. This source document undergoes a number of transformations, enumerated below: