sunlabs.brazil.util
Class LexHTML

java.lang.Object
  extended by sunlabs.brazil.util.LexML
      extended by sunlabs.brazil.util.LexHTML

public class LexHTML
extends LexML

This class breaks up HTML into tokens.

This class differs slightly from LexML as follows: after certain tags, such as the <script> tag, the body that follows is uninterpreted data. That body ends only at the matching closing tag (in this case </script>), not at the next "<" or ">" character. This is one way in which HTML is not fully compliant with XML.

The default set of tags that receive this special processing is <script>, <style>, and <xmp>. The user can change this set by retrieving the Vector of special tags with getClosingTags and modifying it as needed.
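The raw-body rule can be illustrated with a small, self-contained tokenizer. The sketch below is hypothetical and is not Brazil's implementation: the class RawTextSketch, its "TYPE:text" token strings, and the choice to let an unterminated body run to the end of the input are assumptions, and quoted attribute values containing ">" are not handled. It does show the key behavior: once a start tag named in the special set is seen, everything up to the matching closing tag (found case-insensitively) becomes a single STRING token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the HTML "raw body" rule; not Brazil's LexHTML code.
 * Tokens are returned as "TYPE:text" strings; quoted attributes are not handled.
 */
public class RawTextSketch {

    /** Default special tags, matching the set listed in the documentation. */
    public static final Set<String> DEFAULT_CLOSING_TAGS =
            Collections.unmodifiableSet(new TreeSet<>(Arrays.asList("script", "style", "xmp")));

    public static List<String> tokenize(String html, Set<String> closingTags) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int n = html.length();
        while (pos < n) {
            if (html.startsWith("<!--", pos)) {
                // Comment: runs to the next "-->" (or to end of input if unterminated).
                int end = html.indexOf("-->", pos + 4);
                int stop = (end < 0) ? n : end + 3;
                tokens.add("COMMENT:" + html.substring(pos, stop));
                pos = stop;
            } else if (html.charAt(pos) == '<') {
                // Ordinary tag: runs to the next '>'.
                int end = html.indexOf('>', pos);
                int stop = (end < 0) ? n : end + 1;
                String tag = html.substring(pos, stop);
                tokens.add("TAG:" + tag);
                pos = stop;
                String name = startTagName(tag);
                if (closingTags.contains(name)) {
                    // Special tag: everything up to the matching "</name" is one
                    // STRING token, even if it contains "<", ">", tags, or comments.
                    int close = indexOfIgnoreCase(html, "</" + name, pos);
                    int bodyEnd = (close < 0) ? n : close;
                    if (bodyEnd > pos) {
                        tokens.add("STRING:" + html.substring(pos, bodyEnd));
                    }
                    pos = bodyEnd;
                }
            } else {
                // Text: runs to the next '<'.
                int end = html.indexOf('<', pos);
                int stop = (end < 0) ? n : end;
                tokens.add("STRING:" + html.substring(pos, stop));
                pos = stop;
            }
        }
        return tokens;
    }

    /** Lower-cased name of a start tag such as "<SCRIPT src=x>"; "" for end tags. */
    static String startTagName(String tag) {
        int i = 1; // skip '<'
        while (i < tag.length()) {
            char c = tag.charAt(i);
            if (Character.isWhitespace(c) || c == '>' || c == '/') break;
            i++;
        }
        return tag.substring(1, i).toLowerCase(Locale.ROOT);
    }

    /** Case-insensitive search that compares in place, so indices stay valid. */
    static int indexOfIgnoreCase(String s, String target, int from) {
        for (int i = from; i + target.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, target, 0, target.length())) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String html = "<p>Hi<SCRIPT>if (a < b) { x = \"<b>\"; } <!-- c --></script>done";
        for (String t : tokenize(html, DEFAULT_CLOSING_TAGS)) {
            System.out.println(t);
        }
    }
}
```

With the default set, the comment inside the <script> body stays part of one STRING token. Passing an empty set instead makes the same input split at every "<" and ">" and report <!-- c --> as a separate COMMENT token.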


Field Summary
 
Fields inherited from class sunlabs.brazil.util.LexML
COMMENT, STRING, TAG
 
Constructor Summary
LexHTML(String str)
          Creates a new HTML parser, which can be used to iterate over the tokens in the given string.
 
Method Summary
 Vector getClosingTags()
          Gets the set of HTML tags that have the special body-processing behavior mentioned above.
 String getTag()
          Gets the tag name at the beginning of the current tag.
 boolean nextToken()
          Advances to the next token, correctly handling HTML tags that have the special body-processing behavior mentioned above.
 void replace(String str)
          Changes the string that this LexHTML is parsing.
 
Methods inherited from class sunlabs.brazil.util.LexML
getArgs, getAttributes, getBody, getLocation, getString, getToken, getType, isSingleton, rest
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LexHTML

public LexHTML(String str)
Creates a new HTML parser, which can be used to iterate over the tokens in the given string.

Parameters:
str - The HTML to parse.
Method Detail

getClosingTags

public Vector getClosingTags()
Gets the set of HTML tags that have the special body-processing behavior described above. The Vector itself is returned, not a copy, so any changes the caller makes to it after this call affect this parser's settings.
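Because getClosingTags hands back the parser's own Vector, edits made through it are seen by later parsing. The sketch below illustrates that aliasing with a hypothetical LiveVectorSketch class that is not part of Brazil. It uses a generic Vector<String>, whereas the real method returns a raw Vector, and storing plain lowercase names without angle brackets is an assumption, since the documentation does not say in what form the tags are kept.

```java
import java.util.Vector;

/** Hypothetical mini-parser that exposes its live configuration Vector. */
public class LiveVectorSketch {
    private final Vector<String> closingTags = new Vector<>();

    public LiveVectorSketch() {
        // Assumed storage format: lowercase tag names without "<" or ">".
        closingTags.add("script");
        closingTags.add("style");
        closingTags.add("xmp");
    }

    /** Returns the internal Vector itself, not a copy. */
    public Vector<String> getClosingTags() {
        return closingTags;
    }

    /** True if the body after this start tag would be treated as raw text. */
    public boolean isSpecial(String tagName) {
        return closingTags.contains(tagName);
    }

    public static void main(String[] args) {
        LiveVectorSketch parser = new LiveVectorSketch();
        System.out.println(parser.isSpecial("textarea")); // false
        parser.getClosingTags().add("textarea");          // edit through the returned Vector
        System.out.println(parser.isSpecial("textarea")); // true
        parser.getClosingTags().remove("xmp");
        System.out.println(parser.isSpecial("xmp"));      // false
    }
}
```

Returning the live collection keeps configuration cheap, but it also means two callers sharing one parser see each other's changes; a caller that wants private settings would need its own parser instance.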


nextToken

public boolean nextToken()
Advances to the next token, correctly handling HTML tags that have the special body-processing behavior mentioned above. The user can then call the other methods in this class to get information about the new current token.

This method returns the uninterpreted data making up the body of a special HTML tag as a token of type LexML.STRING, even if the body was actually a comment or another tag.

Overrides:
nextToken in class LexML
Returns:
true if a token was found, false if there were no more tokens left.

getTag

public String getTag()
Gets the tag name at the beginning of the current tag. HTML tag names are case-insensitive, so the returned name is converted to lower case for the caller's convenience.

Overrides:
getTag in class LexML
Returns:
The lower-cased tag name, or null if the current token does not have a tag name.
See Also:
LexML.getTag()
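The lower-casing step can be sketched on its own. TagNameSketch and its tagName helper are hypothetical and only approximate getTag: they read the name after "<" up to whitespace, "/", or ">", and return null when no name is present. Returning null for end tags such as "</B>" is a simplification of this sketch, not documented LexHTML behavior.

```java
import java.util.Locale;

/** Hypothetical helper illustrating case-insensitive tag-name normalization. */
public class TagNameSketch {

    /**
     * Returns the lower-cased name of a start tag such as "<A HREF=x>", or null
     * if no name follows the "<". End tags like "</B>" also yield null here.
     */
    public static String tagName(String rawTag) {
        int i = rawTag.startsWith("<") ? 1 : 0;
        int start = i;
        while (i < rawTag.length()) {
            char c = rawTag.charAt(i);
            if (Character.isWhitespace(c) || c == '>' || c == '/') break;
            i++;
        }
        if (i == start) return null;
        // Locale.ROOT keeps the result independent of the JVM's default locale.
        return rawTag.substring(start, i).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(tagName("<SCRIPT type=\"text/javascript\">")); // script
        System.out.println(tagName("<Br/>"));                             // br
        System.out.println(tagName("</B>"));                              // null
    }
}
```

Passing Locale.ROOT matters because String.toLowerCase() with no argument uses the default locale: under a Turkish locale, "TITLE".toLowerCase() yields "tıtle" with a dotless ı, which would no longer match "title".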

replace

public void replace(String str)
Changes the string that this LexHTML is parsing.

Overrides:
replace in class LexML
Parameters:
str - The string that this LexHTML should now parse.
See Also:
LexML.rest()

Version Kenai-svn-r24, Generated 08/18/09
Copyright (c) 2001-2009, Sun Microsystems.