java.lang.Object
  org.ceryle.xml.Sniffer
public class Sniffer
Determines the MIME type of a document by "sniffing" the beginning of the file. Rather than a large number of set/get methods, this uses a lot of public member variables, which are reset prior to each sniff.
This uses the MIME types found in MIME.
Note 1: The rules followed here regarding XHTML are not strictly correct: this class does not require a DOCTYPE to classify something as XHTML, only that the document be well-formed and declare the XHTML XML namespace. If strict conformance is desired, the static variable STRICT_XHTML may be set to true.
Note 2: Because the heuristic for determining file type is somewhat prone to error, this will not always return the correct result. In cases where it doesn't, the document type may be some sort of valid or invalid hybrid. Despite the years and significant time spent arguing about these issues, the rules for this sort of thing have yet to be satisfactorily standardized. This has only gotten worse given the proliferation of XML markup languages with no reasonable architecture for interoperability and interchange.
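The relaxed and strict XHTML rules from Note 1 can be sketched as a small predicate. This is an illustration only, not Sniffer's actual code: the class XhtmlRuleSketch, the method claimsXhtml, and its parameters are hypothetical names, with the strict parameter standing in for STRICT_XHTML.

```java
// Hypothetical sketch of the rule described in Note 1; not the Sniffer source.
class XhtmlRuleSketch {

    /**
     * @param strict          stand-in for Sniffer.STRICT_XHTML
     * @param wellFormed      the document parsed as well-formed XML
     * @param declaresXhtmlNs the XHTML namespace is declared
     * @param hasXhtmlDoctype a recognized XHTML DOCTYPE is present
     */
    static boolean claimsXhtml(boolean strict, boolean wellFormed,
                               boolean declaresXhtmlNs, boolean hasXhtmlDoctype) {
        // Both modes require well-formed XML that declares the XHTML namespace.
        if (!wellFormed || !declaresXhtmlNs) {
            return false;
        }
        // Strict mode additionally requires a recognized DOCTYPE.
        return !strict || hasXhtmlDoctype;
    }

    public static void main(String[] args) {
        System.out.println(claimsXhtml(false, true, true, false)); // true
        System.out.println(claimsXhtml(true, true, true, false));  // false
    }
}
```

With the relaxed default, a namespaced, well-formed document without a DOCTYPE still counts as XHTML; in strict mode it does not.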
See Also: MIME

| Field Summary | |
|---|---|
| boolean | claimsHTML: True if the sniffed document claims to be HTML, by having an <html> document element. |
| boolean | claimsXHTML: True if the sniffed document claims to be some form of XHTML, by containing a combination of factors (noting STRICT_XHTML). |
| boolean | claimsXHTML1F: True if the sniffed document claims to be XHTML Frameset, by including its public identifier in its DOCTYPE declaration. |
| boolean | claimsXHTML1S: True if the sniffed document claims to be XHTML Strict, by including its public identifier in its DOCTYPE declaration. |
| boolean | claimsXHTML1T: True if the sniffed document claims to be XHTML Transitional, by including its public identifier in its DOCTYPE declaration. |
| boolean | claimsXHTMLns: True if the sniffed document claims to be XHTML, by containing a declaration for the XHTML namespace. |
| boolean | claimsXML: True if the sniffed document claims to be XML, by containing an XML declaration. |
| boolean | claimsXTM: True if the sniffed document claims to be XTM, by containing a <topicMap> document element. |
| boolean | claimsXTMns: True if the sniffed document claims to be XTM, by containing a declaration for the XTM namespace. |
| boolean | m_verbose: Message verbosity: set to false for no messages while sniffing. |
| static int | sniffLength: The number of characters to sniff, following the XML parsing. |
| static boolean | STRICT_XHTML: A flag that, when true, requires XHTML documents not only to be well-formed XML and declare the XHTML namespace, but also to contain a recognized DOCTYPE declaration. |
| boolean | valid: True if the sniffed document is valid XML, based upon a parse of its content. |
| boolean | wellFormed: True if the sniffed document is well-formed XML, based upon a parse of its content. |

| Constructor Summary | |
|---|---|
| Sniffer() | Default constructor. |

| Method Summary | |
|---|---|
| String | getDescription(): Returns a text description of the status of the last sniff. |
| String | getHTMLTitle(): If the previous sniff indicated HTML, then the Java text Document is still available. |
| int | getMethod(): Returns an int indicating the serialization method of the last sniff, using the org.ceryle.xml.XMLUtils constants. |
| MIME | getMIMEtype(): Returns the MIME type of the last sniff. Returns null if no sniff has been performed yet. |
| Set | getXHTMLMetadata(): If the previous sniff indicated XHTML, then the DOM Document is still available. |
| String | getXHTMLTitle(): If the previous sniff indicated XHTML, then the DOM Document is still available. |
| static boolean | hasWikiTag(String s): Returns true if the String s starts with the wiki tag. |
| static String | head(Document doc): Returns the first part of the Document doc, up to sniffLength characters, or less if the Document isn't that long. |
| static String | head(String s): Returns the first part of the String s, up to sniffLength characters, or less if the String isn't that long. |
| boolean | isReset(): Returns true if this Sniffer has been reset or has never been used (i.e., its nose is clean). |
| String | sniff(Document doc): Sniffs the media (MIME) type of the provided javax.swing.text.Document doc, setting the type and description, as well as any appropriate booleans. |
| String | sniff(File file): Sniffs the media (MIME) type of the provided File file, setting the type and description, as well as any appropriate booleans. |
| String | sniff(String content): Sniffs the media (MIME) type of the provided String content, setting the type and description, as well as any appropriate booleans. |
| boolean | sniffLTM(String s): Returns true if the percentage of left and right square brackets in the provided text passes a certain threshold and some known key strings are present. |
| boolean | sniffWiki(String s): Returns true if the provided content matches a regex for either the wiki tag or a square-bracketed wiki link of the form "[abc|abc]". |

| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static boolean STRICT_XHTML
public static int sniffLength
public boolean claimsXML
public boolean claimsXTM
public boolean claimsXTMns
public boolean claimsHTML
public boolean claimsXHTML
public boolean claimsXHTMLns
public boolean claimsXHTML1S
public boolean claimsXHTML1T
public boolean claimsXHTML1F
public boolean wellFormed
public boolean valid
public boolean m_verbose
| Constructor Detail |
|---|
public Sniffer()
| Method Detail |
|---|
public String getDescription()
public int getMethod()
public MIME getMIMEtype()
public String sniff(File file)
public String sniff(Document doc)
public String sniff(String content)
public boolean sniffWiki(String s)
public static boolean hasWikiTag(String s)
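As a rough illustration of the two wiki checks, the sketch below pairs a prefix test with a regex for links of the form "[abc|abc]". The tag value "#!wiki", the class name WikiSketch, and the exact pattern are assumptions made for this example; this page does not state the real wiki tag or the regex that Sniffer uses.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the real wiki tag and regex are not documented here.
class WikiSketch {

    // Assumed placeholder for the wiki tag.
    static final String WIKI_TAG = "#!wiki";

    // A "[label|target]" link: non-empty label and target, no nested brackets.
    static final Pattern WIKI_LINK =
            Pattern.compile("\\[[^\\[\\]|]+\\|[^\\[\\]|]+\\]");

    /** True if the text starts with the (assumed) wiki tag. */
    static boolean hasWikiTag(String s) {
        return s.startsWith(WIKI_TAG);
    }

    /** True if the text carries the tag or contains a wiki-style link. */
    static boolean sniffWiki(String s) {
        return hasWikiTag(s) || WIKI_LINK.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(sniffWiki("See [Home|HomePage] for details."));
        System.out.println(sniffWiki("array[0] = 1"));
    }
}
```

Requiring the pipe inside the brackets keeps ordinary bracketed text, such as array indexing, from being mistaken for a wiki link.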
public boolean sniffLTM(String s)
Note: This might also return true when provided with wiki text (which uses square brackets for links), but wiki text stored within Ceryle is expected to have the wiki declaration. Also, this class sniffs for wiki text prior to sniffing for LTM.
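A minimal sketch of a bracket-density heuristic of this kind follows. The threshold value, the key strings, and the names LtmSketch, bracketRatio, and looksLikeLtm are all assumptions chosen for illustration; the actual values used by sniffLTM are not given on this page.

```java
// Hypothetical sketch of a bracket-ratio heuristic; not the Sniffer source.
class LtmSketch {

    // Assumed threshold: at least 2% of characters are square brackets.
    static final double BRACKET_THRESHOLD = 0.02;

    // Assumed key strings (LTM directives); the real list is not documented.
    static final String[] KEY_STRINGS = { "#TOPICMAP", "#VERSION", "#INCLUDE" };

    /** Fraction of characters in s that are '[' or ']'. */
    static double bracketRatio(String s) {
        if (s.isEmpty()) {
            return 0.0;
        }
        long brackets = s.chars().filter(c -> c == '[' || c == ']').count();
        return (double) brackets / s.length();
    }

    /** True if the bracket ratio passes the threshold and a key string is present. */
    static boolean looksLikeLtm(String s) {
        if (bracketRatio(s) < BRACKET_THRESHOLD) {
            return false;
        }
        for (String key : KEY_STRINGS) {
            if (s.contains(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String ltm = "#VERSION \"1.3\"\n[ceryle : software = \"Ceryle\"]\n";
        System.out.println(looksLikeLtm(ltm));
        System.out.println(looksLikeLtm("See [Home|HomePage] for details."));
    }
}
```

In this sketch the key-string check is what separates LTM from bracket-heavy wiki text, which is why checking for wiki text first, as the note describes, reduces false positives further.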
public Set getXHTMLMetadata()
Note that calling sniff again while this method is active may be problematic.
Despite this operating on (at least in theory) XHTML, this method ignores case on the <meta> element as well as the attribute names, erring on the 'safe' side in trying to capture any metadata in the document.
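The case-insensitive matching described above might look roughly like the following, using the standard DOM API. This is a sketch under assumptions: the class MetaSketch, its helper methods, and the "name=content" string format of the returned Set are invented for the example, since this page does not say what the Set returned by getXHTMLMetadata contains.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of case-insensitive <meta> extraction; not the Sniffer source.
class MetaSketch {

    /** Looks up an attribute value, ignoring the case of the attribute name. */
    static String attrIgnoreCase(Element e, String wanted) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            if (a.getNodeName().equalsIgnoreCase(wanted)) {
                return a.getNodeValue();
            }
        }
        return null;
    }

    /** Collects "name=content" pairs (assumed format) from meta/META elements. */
    static Set<String> metadata(Document doc) {
        Set<String> result = new LinkedHashSet<>();
        NodeList all = doc.getElementsByTagName("*"); // every element in the document
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (!e.getNodeName().equalsIgnoreCase("meta")) {
                continue;
            }
            String name = attrIgnoreCase(e, "name");
            String content = attrIgnoreCase(e, "content");
            if (name != null && content != null) {
                result.add(name + "=" + content);
            }
        }
        return result;
    }

    /** Parses well-formed XML text into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<META NAME=\"author\" content=\"A\"/>"
                + "<meta name=\"k\" CONTENT=\"v\"/></head><body/></html>";
        System.out.println(metadata(parse(xml)));
    }
}
```

Strict XHTML would only use lowercase names, so the loose comparison never rejects conforming input; it only widens what is captured from sloppier documents.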
public String getXHTMLTitle()
public String getHTMLTitle()
public boolean isReset()
public static String head(String s)
Returns the first part of the String s, up to sniffLength characters, or less if the String isn't that long. Throws a NullPointerException if the parameter is null.
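The truncation behavior of head(String) can be sketched in a few lines. The names HeadSketch and maxLength are hypothetical; maxLength stands in for the static sniffLength field, whose default value is not given on this page.

```java
// Hypothetical sketch of the head(String) behavior; not the Sniffer source.
class HeadSketch {

    /**
     * Returns at most maxLength leading characters of s.
     * Throws NullPointerException if s is null (from the call to s.length()).
     */
    static String head(String s, int maxLength) {
        return s.substring(0, Math.min(s.length(), maxLength));
    }

    public static void main(String[] args) {
        System.out.println(head("<?xml version=\"1.0\"?><html/>", 5)); // <?xml
        System.out.println(head("ab", 5));                              // ab
    }
}
```

Clamping with Math.min avoids the StringIndexOutOfBoundsException that substring would throw for input shorter than the limit.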
public static String head(Document doc)
throws BadLocationException
Returns the first part of the Document doc, up to sniffLength characters, or less if the Document isn't that long. Throws a NullPointerException if the parameter is null.
Throws: BadLocationException - if unable to extract the Document's content