java.lang.Object
  extended by sunlabs.brazil.handler.HtmlRewriter
public class HtmlRewriter
This class helps with parsing and rewriting an HTML document. The source document is not changed; a new HTML document is built.
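The user can sequentially examine and rewrite each token in the source HTML document. As each token in the document is seen, the user has two choices: modify the current token (for example, change its tag name or attributes, replace it with new content, or delete it), or leave it unmodified, in which case it is copied through unchanged to the new HTML document.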
Parsing is implemented lazily, meaning, for example, that unless the user actually asks for attributes of an HTML tag, this parser does not have to spend the time breaking up the attributes.
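To show what lazy parsing buys, here is a minimal Java sketch of a tag whose attribute text is split up only when an attribute is first looked up. The class LazyTag, its methods, and its naive split on whitespace and "=" are hypothetical and are not Brazil's LexHTML. Only the idea of deferring the work comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not Brazil's code): a tag token that keeps its raw
// attribute text and splits it up only when an attribute is first requested.
final class LazyTag {
    private final String name;         // e.g. "table"
    private final String rawArgs;      // e.g. "border rows=2"
    private Map<String, String> attrs; // stays null until first needed
    private int parseCount = 0;        // how often rawArgs has been parsed

    LazyTag(String name, String rawArgs) {
        this.name = name;
        this.rawArgs = rawArgs;
    }

    // Cheap: returning the tag name never touches the attributes.
    String name() {
        return name;
    }

    String get(String key) {
        if (attrs == null) {            // first request: parse once and cache
            attrs = parse(rawArgs);
            parseCount++;
        }
        return attrs.get(key.toLowerCase(Locale.ROOT));
    }

    int parseCount() {
        return parseCount;
    }

    // Deliberately naive split on whitespace and "="; no quote handling.
    private static Map<String, String> parse(String args) {
        Map<String, String> m = new LinkedHashMap<>();
        for (String part : args.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            int eq = part.indexOf('=');
            if (eq < 0) {
                m.put(part.toLowerCase(Locale.ROOT), "");   // valueless attribute
            } else {
                m.put(part.substring(0, eq).toLowerCase(Locale.ROOT), part.substring(eq + 1));
            }
        }
        return m;
    }
}
```

A caller that only asks for name() never pays for the split. The first get() parses the arguments, and later calls reuse the cached map.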
This class is used by HTML filters to maintain the state of the document and allow the filters to perform arbitrary rewriting.
Field Summary | |
---|---|
LexHTML | lex: The parser for the source HTML document. |
StringBuffer | sb: Storage holding the resultant HTML document. |

Constructor Summary |
---|
HtmlRewriter(LexHTML lex): Creates a new HtmlRewriter from the given HTML parser. |
HtmlRewriter(String str): Creates a new HtmlRewriter that will operate on the given string. |

Method Summary | |
---|---|
boolean | accumulate(boolean accumulate): Turns on or off the automatic accumulation of each token. |
void | append(String str): Instead of modifying an existing token, this method allows the user to completely replace the current token with arbitrary new content. |
void | appendToken(): Appends the current token to the resultant HTML document. |
String | get(String key): Returns the value that the specified case-insensitive key maps to in the attributes for the current tag. |
String | getArgs(): Gets the arguments of the current token as a string. |
String | getBody(): Gets the body of the current token as a string. |
StringMap | getMap(): Returns a copy of the StringMap of attributes. |
String | getTag(): Gets the current tag's name. |
String | getToken(): Gets the raw string making up the entire current token, including the angle brackets or comment delimiters, if applicable. |
int | getType(): Gets the type of the current token. |
boolean | isSingleton(): Checks whether the current tag is a singleton. |
Enumeration | keys(): Returns an enumeration of the keys in the current tag's attributes. |
void | killToken(): Tells this HtmlRewriter not to append the current token to the resultant HTML document. |
boolean | nextTag(): A convenience method built on top of nextToken. |
boolean | nextToken(): Advances to the next token in the source HTML document. |
void | pushback(): Puts the current token back. |
void | put(String key, String value): Maps the given case-insensitive key to the specified value in the current tag's attributes. |
static String | quote(String str): Helper method to quote an attribute's value when the value is being written to the resultant HTML document. |
void | remove(String key): Removes the given case-insensitive key and its corresponding value from the current tag's attributes. |
void | reset(): Forgets all the tokens that have been appended to the resultant HTML document so far, including the current token. |
void | setSingleton(boolean singleton): Makes the current tag a singleton. |
void | setTag(String tag): Changes the current tag's name. |
void | setType(int type): Sets the type of the current token. |
int | tagCount(): Returns the count of tags seen so far. |
int | tokenCount(): Returns the count of tokens seen so far. |
String | toString(): Returns the "new" rewritten HTML document. |

Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public LexHTML lex
The parser for the source HTML document.

public StringBuffer sb
Storage holding the resultant HTML document.
Constructor Detail |
---|
public HtmlRewriter(LexHTML lex)
Creates a new HtmlRewriter from the given HTML parser.
Parameters: lex - The HTML parser.

public HtmlRewriter(String str)
Creates a new HtmlRewriter that will operate on the given string.
Parameters: str - The HTML document.

Method Detail |
---|
public String toString()
Returns the "new" rewritten HTML document. At any time, this method can be called to return the current state of the HTML document. The return value is the result of processing the source document up to this point in time; the unprocessed remainder of the source document is not considered.
Due to the implementation, calling this method may be expensive. Specifically, calling this method a second (or further) time for a given HtmlRewriter may involve copying temporary strings around. The pessimal case would be to call this method every time a new token is appended.
Overrides: toString in class Object
public boolean nextToken()
Advances to the next token in the source HTML document. The other purpose of this method is to "do the right thing", which is to append the token just processed to the resultant HTML document, unless the user has already appended something else.
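The "do the right thing" rule can be modeled with a single state flag. The sketch below is a hypothetical toy: ToyRewriter is not part of Brazil, and its tokens are plain strings rather than LexHTML output. Each call to nextToken first copies the previous token through, unless append, appendToken, or killToken has already dealt with it.

```java
import java.util.List;

// Hypothetical toy model of the "append unless the user already appended"
// rule. Not Brazil's code; tokens are plain strings, no real HTML parsing.
final class ToyRewriter {
    private final List<String> tokens;
    private final StringBuilder out = new StringBuilder();
    private int pos = -1;            // index of the current token
    private boolean handled = true;  // true once the current token is dealt with

    ToyRewriter(List<String> tokens) {
        this.tokens = tokens;
    }

    // Finish the previous token (copy it through if untouched), then advance.
    boolean nextToken() {
        if (!handled) {
            out.append(tokens.get(pos));
            handled = true;
        }
        if (pos + 1 >= tokens.size()) {
            pos = tokens.size();
            return false;
        }
        pos++;
        handled = false;
        return true;
    }

    String getToken() {
        return tokens.get(pos);
    }

    // Replace the current token; it will no longer be copied automatically.
    void append(String s) {
        if (s != null) {
            out.append(s);
        }
        handled = true;
    }

    // Explicitly copy the current token (may be called more than once).
    void appendToken() {
        out.append(tokens.get(pos));
        handled = true;
    }

    // Drop the current token even though nothing else was appended.
    void killToken() {
        handled = true;
    }

    @Override
    public String toString() {
        return out.toString();
    }
}
```

Suppose the tokens are "<p>", "hi", "<table>", "</p>". The caller replaces "hi" with "<b>HI</b>" and kills "<table>". The toy then produces "<p><b>HI</b></p>": the untouched "<p>" and "</p>" are copied through automatically.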
A sample program follows. This program changes all <img> tags to <form> tags, deletes all <table> tags, capitalizes and bolds each string token, and passes all other tokens through unchanged, to illustrate how nextToken interacts with some of the other methods in this class.

    HtmlRewriter hr = new HtmlRewriter(str);
    while (hr.nextToken()) {
        switch (hr.getType()) {
            case LexHTML.TAG:
                if (hr.getTag().equals("img")) {
                    // Change the tag name w/o affecting the attributes.
                    hr.setTag("form");
                } else if (hr.getTag().equals("table")) {
                    // Eliminate the entire "table" token.
                    hr.killToken();
                }
                break;
            case LexHTML.STRING:
                // Append a new sequence in place of the existing token.
                hr.append("<b>" + hr.getToken().toUpperCase() + "</b>");
                break;
        }
        // Any tokens we didn't modify get copied through unchanged.
    }

Returns: true if there are tokens left to process, false otherwise.

public boolean nextTag()
A convenience method built on top of nextToken.
Advances to the next HTML tag. All intervening strings and comments
between the last tag and the new current tag are copied through
unchanged. This method can be used when the caller wants to process
only HTML tags, without having to manually check the type of each
token to see if it is actually a tag.
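Here is a sketch of how a nextTag-style method might be layered on a nextToken-style method. It assumes tokens are plain strings and that a tag is any token starting with "<" that is not a "<!--" comment. TagWalker and its tag test are hypothetical. In this toy, only the intervening text and comments are copied through; the tags themselves are left for the caller to handle.

```java
import java.util.List;

// Hypothetical sketch of nextTag() layered on nextToken(): non-tag tokens
// (text and comments) are copied to the output unchanged.
final class TagWalker {
    private final List<String> tokens;
    private final StringBuilder out = new StringBuilder();
    private int pos = -1;

    TagWalker(List<String> tokens) {
        this.tokens = tokens;
    }

    boolean nextToken() {
        pos++;
        return pos < tokens.size();
    }

    String current() {
        return tokens.get(pos);
    }

    // Assumption: a token is a tag if it starts with "<" but is not a comment.
    boolean isTag() {
        String t = current();
        return t.startsWith("<") && !t.startsWith("<!--");
    }

    // Skip forward to the next tag, copying everything in between.
    boolean nextTag() {
        while (nextToken()) {
            if (isTag()) {
                return true;
            }
            out.append(current());
        }
        return false;
    }

    String copied() {
        return out.toString();
    }
}
```

The caller's loop then sees only tags. Everything between them ends up in the output without any type checks on the caller's side.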
Returns: true if there are tokens left to process, false otherwise.

public int getType()
Gets the type of the current token.
See Also: LexML.getType()
public void setType(int type)
Sets the type of the current token.
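public boolean isSingleton()
Checks whether the current tag is a singleton, that is, a self-closing tag of the form <tag ... />.

public void setSingleton(boolean singleton)
Makes the current tag a singleton, that is, a self-closing tag of the form <tag ... />.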
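public String getToken()
Gets the raw string making up the entire current token, including the angle brackets or comment delimiters, if applicable.
See Also: LexML.getToken()

public String getTag()
Gets the current tag's name.
Returns: The tag name, or null if the current token does not have a tag name.
See Also: LexHTML.getTag()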
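public void setTag(String tag)
Changes the current tag's name. The tag's attributes are not affected.
Parameters: tag - The new tag name.

public String getBody()
Gets the body of the current token as a string.
See Also: LexML.getBody()

public String getArgs()
Gets the arguments of the current token as a string.
See Also: LexML.getArgs()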
public String get(String key)
Returns the value that the specified case-insensitive key maps to in the attributes for the current tag. An attribute that appears without a value maps to the empty string. For example, for the tag <table border rows=2>, get("border") returns the empty string "" and get("rows") returns 2.
Surrounding single and double quote marks that occur in the literal tag are removed from the values reported. So, for the tag <a href="/foo.html" target=_top onclick='alert("hello")'>:
- get("href") returns /foo.html.
- get("target") returns _top.
- get("onclick") returns alert("hello").
Parameters: key - The key to look up in the current tag's attributes.
Returns: The attribute's value, or null if the key was not in the attributes.
See Also: LexML.getAttributes()
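The value rules described for get can be sketched with a small hand-written scanner. Those rules are: one pair of surrounding quotes is stripped, valueless attributes map to the empty string, and keys match case-insensitively. AttrParser is hypothetical and handles only simple cases like the ones above. It is not how LexHTML is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not Brazil's LexHTML): parse a tag's argument text
// into a map with lowercased keys, one pair of surrounding quotes stripped
// from each value, and valueless attributes mapped to "".
final class AttrParser {
    static Map<String, String> parse(String args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int i = 0;
        int n = args.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(args.charAt(i))) i++;
            if (i >= n) break;
            // Read the key up to whitespace or "=".
            int keyStart = i;
            while (i < n && !Character.isWhitespace(args.charAt(i)) && args.charAt(i) != '=') i++;
            String key = args.substring(keyStart, i).toLowerCase(Locale.ROOT);
            // Look for "=", possibly preceded by whitespace.
            int j = i;
            while (j < n && Character.isWhitespace(args.charAt(j))) j++;
            if (j >= n || args.charAt(j) != '=') {
                attrs.put(key, "");              // e.g. "border"
                continue;
            }
            i = j + 1;
            while (i < n && Character.isWhitespace(args.charAt(i))) i++;
            String value;
            if (i < n && (args.charAt(i) == '"' || args.charAt(i) == '\'')) {
                char q = args.charAt(i);         // quoted: strip the quote pair
                int close = args.indexOf(q, i + 1);
                if (close < 0) close = n;        // unterminated: take the rest
                value = args.substring(i + 1, close);
                i = Math.min(close + 1, n);
            } else {
                int start = i;                   // unquoted: up to whitespace
                while (i < n && !Character.isWhitespace(args.charAt(i))) i++;
                value = args.substring(start, i);
            }
            attrs.put(key, value);
        }
        return attrs;
    }
}
```

Because each value is scanned to its own matching quote, the double quotes inside onclick='alert("hello")' survive in the result.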
public void put(String key, String value)
Maps the given case-insensitive key to the specified value in the current tag's attributes. The value can be retrieved by calling get with a key that is case-insensitively equal to the given key.
If the attributes already contain a mapping for the given key, the old value is forgotten and the new value is used; in that case, the case of the prior key is retained. Otherwise, a new mapping is made using the case of the new key.
Parameters:
key - The new key. May not be null.
value - The new value. May not be null.

public void remove(String key)
Removes the given case-insensitive key and its corresponding value from the current tag's attributes.
Parameters: key - The key to be removed. Must not be null.

public Enumeration keys()
Returns an enumeration of the keys in the current tag's attributes. Each key can be passed to get to retrieve the corresponding attribute value.
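The key-case rules for put, get, remove, and keys can be sketched by indexing entries on a lowercased key while remembering each key's original spelling. CaseInsensitiveAttrs is a hypothetical stand-in, not Brazil's StringMap. For brevity it returns a List rather than an Enumeration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the put()/get()/remove()/keys() rules: lookups
// ignore case, and overwriting an existing key keeps the case the key was
// first stored with. Not Brazil's StringMap.
final class CaseInsensitiveAttrs {
    // lowercased key -> {key as first stored, value}
    private final Map<String, String[]> entries = new LinkedHashMap<>();

    void put(String key, String value) {
        if (key == null || value == null) {
            throw new NullPointerException("key and value must not be null");
        }
        String lower = key.toLowerCase(Locale.ROOT);
        String[] existing = entries.get(lower);
        if (existing != null) {
            existing[1] = value;                     // keep the prior key's case
        } else {
            entries.put(lower, new String[] {key, value});
        }
    }

    String get(String key) {
        String[] e = entries.get(key.toLowerCase(Locale.ROOT));
        return e == null ? null : e[1];
    }

    void remove(String key) {
        entries.remove(key.toLowerCase(Locale.ROOT));
    }

    // Keys in their stored case, in insertion order.
    List<String> keys() {
        List<String> result = new ArrayList<>();
        for (String[] e : entries.values()) {
            result.add(e[0]);
        }
        return result;
    }
}
```

After put("HREF", "/a.html") and then put("href", "/b.html"), get("Href") yields "/b.html" and keys() still reports "HREF".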
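public void append(String str)
Instead of modifying an existing token, this method allows the user to completely replace the current token with arbitrary new content.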
This method may be called multiple times while processing the current
token to add more and more data to the resultant HTML document.
Before and/or after calling this method, the appendToken
method may also be called explicitly in order to add the current token
to the resultant HTML document.
Following is sample code illustrating how to use this method
to put bold tags around all the <a>
tags.

    HtmlRewriter hr = new HtmlRewriter(str);
    while (hr.nextTag()) {
        if (hr.getTag().equals("a")) {
            hr.append("<b>");
            hr.appendToken();
        } else if (hr.getTag().equals("/a")) {
            hr.appendToken();
            hr.append("</b>");
        }
    }

The calls to appendToken are necessary. Otherwise, the HtmlRewriter could not know where and when to append the existing token in addition to the new content provided by the user.
Parameters: str - The new content to append. May be null, in which case no new content is appended (the equivalent of appending "").
See Also: appendToken(), killToken()
public void appendToken()
Appends the current token to the resultant HTML document. If the user has changed the current token via the setTag, put, or remove methods, those changes will be reflected.
By default, this method is automatically called after each token is processed unless the user has already appended something to the resultant HTML document. Therefore, if the user appends something and also wants to append the current token, or wants to append the current token a number of times, this method must be called explicitly.
See Also: append(java.lang.String), killToken()
public void killToken()
Tells this HtmlRewriter not to append the current token to the resultant HTML document. Even if the user hasn't appended anything else, the current token will be ignored rather than appended.
See Also: append(java.lang.String), appendToken()
public boolean accumulate(boolean accumulate)
Turns on or off the automatic accumulation of each token. After each token is processed, the current token is appended to the resultant HTML document unless the user has already appended something else. Setting accumulate to false turns this behavior off; the user must then explicitly call appendToken to cause the current token to be appended.
Turning off accumulation takes effect immediately, while turning on accumulation takes effect on the next token. In other words, whether the user turns this setting off or on, the current token will not be added to the resultant HTML document unless the user explicitly calls appendToken.
Following is sample code that illustrates how to use this method to extract the contents of the <head> of the source HTML document.

    HtmlRewriter hr = new HtmlRewriter(str);
    // Don't accumulate tokens until we see the <head> below.
    hr.accumulate(false);
    while (hr.nextTag()) {
        if (hr.getTag().equals("head")) {
            // Start remembering the contents of the HTML document,
            // not including the <head> tag itself.
            hr.accumulate(true);
        } else if (hr.getTag().equals("/head")) {
            // Return everything accumulated so far.
            return hr.toString();
        }
    }

This method can be called any number of times while processing the source HTML document.
Parameters: accumulate - true to automatically accumulate tokens in the resultant HTML document, false to require that the user explicitly accumulate them.
See Also: reset()
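The timing rule for accumulate can be modeled with two flags: turning it off applies to the current token, while turning it on applies from the next token. AccumulatingRewriter is a hypothetical toy over plain string tokens, not Brazil's implementation. Its accumulate method returns nothing because the meaning of the real method's boolean result is not described here.

```java
import java.util.List;

// Hypothetical toy model of accumulate(). Not Brazil's code; tokens are
// plain strings with no real HTML parsing.
final class AccumulatingRewriter {
    private final List<String> tokens;
    private final StringBuilder out = new StringBuilder();
    private int pos = -1;                 // index of the current token
    private boolean accumulate = true;    // the user's current setting
    private boolean copyCurrent = false;  // auto-copy the current token on advance?

    AccumulatingRewriter(List<String> tokens) {
        this.tokens = tokens;
    }

    void accumulate(boolean on) {
        accumulate = on;
        if (!on) {
            copyCurrent = false;  // turning off applies to the current token too
        }
        // Turning on leaves copyCurrent alone: it applies from the next token.
    }

    void appendToken() {
        out.append(tokens.get(pos));
        copyCurrent = false;      // already appended explicitly
    }

    boolean nextToken() {
        if (copyCurrent) {
            out.append(tokens.get(pos));
        }
        copyCurrent = false;
        if (pos + 1 >= tokens.size()) {
            pos = tokens.size();
            return false;
        }
        pos++;
        copyCurrent = accumulate; // decided when the token becomes current
        return true;
    }

    String current() {
        return tokens.get(pos);
    }

    @Override
    public String toString() {
        return out.toString();
    }
}
```

This mirrors the <head> example. With accumulation off at the start and turned on at "<head>", neither "<head>" nor "</head>" ends up in the output, only the tokens between them.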
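public void reset()
Forgets all the tokens that have been appended to the resultant HTML document so far, including the current token.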
public void pushback()
Puts the current token back. The next time nextToken is called, it will be the current token again, rather than advancing to the next token in the source HTML document.
This is useful when a code fragment needs to read an indefinite number of tokens but, once some distinguished token is found, needs to push that token back so that normal processing can occur on that token.
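Here is a minimal sketch of pushback over a plain list of string tokens. It assumes a single level of pushback, so calling it twice before advancing has the same effect as calling it once. PushbackCursor is hypothetical. The usage follows the "read until a distinguished token, then push it back" pattern described above.

```java
import java.util.List;

// Hypothetical sketch of pushback(): the next call to nextToken() re-delivers
// the current token instead of advancing. Single level of pushback only.
final class PushbackCursor {
    private final List<String> tokens;
    private int pos = -1;
    private boolean pushedBack = false;

    PushbackCursor(List<String> tokens) {
        this.tokens = tokens;
    }

    boolean nextToken() {
        if (pushedBack) {
            pushedBack = false;   // stay on the same token this time
            return true;
        }
        if (pos + 1 >= tokens.size()) {
            pos = tokens.size();
            return false;
        }
        pos++;
        return true;
    }

    String current() {
        return tokens.get(pos);
    }

    void pushback() {
        if (pos >= 0 && pos < tokens.size()) {  // only a real token can be pushed back
            pushedBack = true;
        }
    }
}
```

A helper can collect text tokens until it meets a tag, push the tag back, and return. The main loop's next nextToken() call then sees that same tag.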
public int tokenCount()
Returns the count of tokens seen so far.

public int tagCount()
Returns the count of tags seen so far.
public static String quote(String str)
Helper method to quote an attribute's value when the value is being written to the resultant HTML document. Values set by the put method are automatically quoted as needed. This method is provided in case the user is dynamically constructing a new tag to be appended with append and needs to quote some arbitrary values.
The quoting algorithm is as follows:
- If the string contains double quotes, put single quotes around it.
- If the string contains any "special" characters, put double quotes around it.
This algorithm is, of course, insufficient for complicated strings that include both single and double quotes. In that case, it is the user's responsibility to escape the special characters in the string using HTML character references such as &quot; or &#34;.
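Here is a sketch of the two-step quoting rule in Java. The text does not specify which characters count as "special". The SPECIAL string used here (whitespace, =, <, >, ', and backtick) is therefore an assumption, as is the choice to quote the empty string. AttrQuote is hypothetical, not Brazil's quote.

```java
// Hypothetical sketch of the quoting rule described above. The SPECIAL set
// and the handling of "" are assumptions, not taken from Brazil.
final class AttrQuote {
    private static final String SPECIAL = " \t\r\n=<>'`";

    static String quote(String str) {
        if (str.indexOf('"') >= 0) {
            return "'" + str + "'";          // contains " : use single quotes
        }
        if (str.isEmpty()) {
            return "\"\"";                   // assumption: empty values are quoted
        }
        for (int i = 0; i < str.length(); i++) {
            if (SPECIAL.indexOf(str.charAt(i)) >= 0) {
                return "\"" + str + "\"";    // special character: use double quotes
            }
        }
        return str;                          // safe to leave unquoted
    }
}
```

As the text warns, a value containing both kinds of quote comes out broken. For the value it's "x", this sketch produces 'it's "x"', which an HTML parser would cut short at the second single quote.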
public StringMap getMap()
Returns a copy of the StringMap of attributes.
Version Kenai-svn-r24, Generated 08/18/09 Copyright (c) 2001-2009, Sun Microsystems.