sunlabs.brazil.util.http
Class HttpRequest

java.lang.Object
  extended by sunlabs.brazil.util.http.HttpRequest

public class HttpRequest
extends Object

Sends an HTTP request to some target host and gets the answer back. Similar to the URLConnection class.

Caches connections to hosts, and reuses them if possible. Talks HTTP/1.1 to the hosts, in order to keep connections alive as much as possible.

The sequence of events for using an HttpRequest is similar to how URLConnection is used:

  1. A new HttpRequest object is constructed.
  2. The setup parameters are modified.
  3. The host (or proxy) is contacted and the HTTP request is issued.
  4. The response headers and body are examined.
  5. The connection is closed.

In the common case, all the setup parameters are initialized to sensible values and won't need to be modified. Most users will only need to construct a new HttpRequest object and then call getInputStream to read the contents. The rest of the member variables and methods are only needed for advanced behavior.

The HttpRequest class is intended to be a replacement for the URLConnection class. It operates at a lower level and makes fewer decisions on behavior. Some differences between the HttpRequest class and the URLConnection class follow:

A number of the fields in the HttpRequest object are public, by design. Most of the methods mentioned above are convenience methods; the underlying data fields are meant to be accessed for more complicated operations, such as changing the socket factory or accessing the raw HTTP response line. Note however, that the order of the methods described above is important. For instance, the user cannot examine the response headers (by calling getResponseHeader or by examining the variable responseHeaders) without first having connected to the host.

However, if the user wants to modify the default behavior, the HttpRequest uses the value of a number of variables and automatically sets some HTTP headers when sending the request. The user can change these settings up until the time connect is called, as follows:

variable version
By default, the HttpRequest issues HTTP/1.1 requests. The user can set version to change this to HTTP/1.0.
variable method
If method is null (the default), the HttpRequest decides what the HTTP request method should be as follows: If the user has called getOutputStream, then the method will be "POST", otherwise the method will be "GET".
variable proxyHost
If the proxy host is specified, the HTTP request will be sent via the specified proxy. Otherwise, the HTTP request will go directly to the host.
header "Connection" or "Proxy-Connection"
The HttpRequest sets the appropriate connection header to "Keep-Alive" to keep alive the connection to the host or proxy (respectively). By setting the appropriate connection header, the user can control whether the HttpRequest tries to use Keep-Alives.
header "Host"
The HTTP/1.1 protocol requires that the "Host" header be set to the name of the machine being contacted. By default, this is derived from the URL used to construct the HttpRequest, and is set automatically if the user does not set it.
header "Content-Length"
If the user calls getOutputStream and writes some data to it, the "Content-Length" header will be set to the amount of data that has been written at the time that connect is called.
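
The defaults listed above can be summarized in a small sketch. This is not the Brazil implementation; it uses only JDK classes, and the names RequestDefaultsSketch, chooseMethod and applyDefaults are hypothetical. It also assumes a plain Map for headers, which is case-sensitive, whereas HTTP header names are not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: mirrors the defaults described above, not Brazil's code. */
final class RequestDefaultsSketch {

    /** An explicit method wins; otherwise POST if an output stream was used, else GET. */
    static String chooseMethod(String method, boolean outputStreamUsed) {
        if (method != null) {
            return method;
        }
        return outputStreamUsed ? "POST" : "GET";
    }

    /** Fills in headers the caller left unset; "Host" and the connection header are kept if present. */
    static Map<String, String> applyDefaults(Map<String, String> userHeaders,
                                             String host, boolean viaProxy, byte[] body) {
        Map<String, String> headers = new LinkedHashMap<>(userHeaders);
        headers.putIfAbsent("Host", host);
        // The connection header depends on whether the peer is the host or a proxy.
        headers.putIfAbsent(viaProxy ? "Proxy-Connection" : "Connection", "Keep-Alive");
        // Content-Length reflects the data written before the request is sent.
        if (body != null && body.length > 0) {
            headers.put("Content-Length", Integer.toString(body.length));
        }
        return headers;
    }
}
```

With an empty header map, a proxy in use and a three-byte body, this sketch yields "Host", "Proxy-Connection: Keep-Alive" and "Content-Length: 3", and no "Connection" header.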

Once all data has been read from the remote host, the underlying socket may be automatically recycled and used again for subsequent requests to the same remote host. If the user is not planning on reading all the data from the remote host, the user should call close to release the socket. Although it happens under the covers, the user should be aware that if an IOException occurs or once data has been read normally from the remote host, close is called automatically. This is to ensure that the minimal number of sockets are left open at any time.

The input stream that getInputStream provides automatically hides whether the remote host is providing HTTP/1.1 "chunked" encoding or regular streaming data. The user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. Currently, no access is provided to the underlying raw input stream.
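
To make the de-chunking step concrete, here is a minimal sketch of how a "chunked" body can be turned back into plain bytes. It is an illustration written against the JDK only, not the code behind getInputStream; the class and method names are hypothetical, and trailer lines are simply skipped.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class DechunkSketch {

    /** Reads a chunked body from in and returns the de-chunked bytes; trailers are skipped. */
    static byte[] dechunk(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');            // drop any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);        // chunk sizes are hexadecimal
            if (size < 0) {
                throw new IOException("negative chunk size");
            }
            if (size == 0) {
                break;                                   // last chunk
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n < 0) {
                    throw new EOFException("truncated chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                                // CRLF that follows the chunk data
        }
        while (!readLine(in).isEmpty()) {
            // skip trailer lines up to the blank line that ends the body
        }
        return body.toByteArray();
    }

    /** Reads one CRLF- or LF-terminated line as ISO-8859-1, without the terminator. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                line.write(c);
            }
        }
        if (c == -1 && line.size() == 0) {
            throw new EOFException("unexpected end of stream");
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

The caller of such a routine sees only the payload bytes, which is the behavior the paragraph above describes for the stream returned by getInputStream.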


Nested Class Summary
static class HttpRequest.HttpSocket
          This class is used as the bag of information kept about an open, idle socket.
static interface HttpRequest.HttpSocketPool
          This interface represents a cache of idle sockets.
static class HttpRequest.SimpleHttpSocketPool
           
static class HttpRequest.TimeoutException
          Timeout occurred waiting for a socket response.
 
Field Summary
static String DEFAULT_CHARSET
           
static String defaultHTTPVersion
          The default HTTP version string to send to the remote host when issuing requests.
static boolean displayAllHeaders
          Setting this to "true" causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.
 boolean displayHeaders
           
static int DRAIN_TIMEOUT
          Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.
 String host
          The host extracted from the URL used to construct this HttpRequest.
static int LINE_LIMIT
          Maximum length of a line in the HTTP response headers (sanity check).
 String method
          The HTTP method, such as "GET", "POST", or "HEAD".
 int port
          The port extracted from the URL used to construct this HttpRequest.
 String proxyHost
          If non-null, sends this HTTP request via the specified proxy host and port.
 int proxyPort
          The proxy port.
 MimeHeaders requestHeaders
          The headers for the HTTP request.
 MimeHeaders responseHeaders
          The headers that were present in the HTTP response.
 MimeHeaders responseTrailers
          An artifact of HTTP/1.1 chunked encoding.
static Vector socketPools
          A list of socketPools.
 String status
          The status line from the HTTP response.
 URL url
          The URL used to construct this HttpRequest.
 String version
          The HTTP version string.
 
Constructor Summary
HttpRequest(String url)
          Deprecated. Use the static getRequest method instead.
HttpRequest(URL url)
          Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
 
Method Summary
 int addHeaders(String tokens, Properties props)
          Convenience method for adding request headers by looking them up in a properties object.
static void appendPool(HttpRequest.HttpSocketPool pool)
          Append a socket pool to the end of the list.
 void close()
          Gracefully closes this HTTP request when the user is done with it.
 void connect()
          Connect to the target host (or proxy), send the request, and read the response headers.
 void disconnect()
          Interrupts this HTTP request.
 String getContent()
          Return the content as a string.
 String getContent(String encoding)
          Get the content as a string.
 int getContentLength()
          Convenience method to get the "Content-Length" header from the HTTP response.
 String getEncoding()
          Get the ISO character encoding (if any) associated with this text stream, or the default HTTP charset if none found.
static String getEncoding(MimeHeaders headers)
           
 HttpInputStream getInputStream()
          Gets an input stream that can be used to read the body of the HTTP response.
 OutputStream getOutputStream()
          Gets an output stream that can be used for uploading data to the host.
static HttpRequest getRequest(String url)
           
static HttpRequest getRequest(URL url)
          Create an HttpRequest object.
 int getResponseCode()
          Gets the HTTP response status code.
 String getResponseHeader(String key)
          Gets the value associated with the given case-insensitive header name from the HTTP response.
static void main(String[] args)
          Grab HTTP document(s) and save them in the filesystem.
static void prependPool(HttpRequest.HttpSocketPool pool)
          Prepend a socket pool to the beginning of the list.
static void removePointToPointHeaders(MimeHeaders headers, boolean response)
          Removes all the point-to-point (hop-by-hop) headers from the given mime headers.
 void setMethod(String method)
          Sets the HTTP method to the specified value.
 void setProxy(String proxyHost, int proxyPort)
          Sets the proxy for this request.
 void setRequestHeader(String key, String value)
          Sets a request header in the HTTP request that will be issued.
 void setTimeout(int sec)
          Set the timeout for getting a remote response.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DRAIN_TIMEOUT

public static int DRAIN_TIMEOUT
Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.

If the user closes the HttpRequest before reading all of the data, but the remote host has agreed to keep this socket alive, we need to read and discard the rest of the response before issuing a new request. If it takes longer than DRAIN_TIMEOUT to read and discard the data, we will just forcefully close the connection to the remote host rather than waiting to read any more.

Default value is 10000.
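
The drain-or-close decision described above can be sketched as follows. This is a simplified illustration using only JDK classes, not Brazil's code: the name DrainSketch is hypothetical, the stream is assumed to report end-of-stream at the end of the response body, and the deadline is only checked between reads, so a single blocking read could still overrun it.

```java
import java.io.IOException;
import java.io.InputStream;

final class DrainSketch {

    /**
     * Reads and discards the rest of in. Returns true if the end of the stream was
     * reached before timeoutMillis elapsed (the socket could be reused), or false
     * if the deadline passed first (the caller should close the connection).
     */
    static boolean drain(InputStream in, long timeoutMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        byte[] scratch = new byte[4096];
        while (in.read(scratch) != -1) {
            if (System.nanoTime() - deadline > 0) {
                return false;   // taking too long; give up on reuse
            }
        }
        return true;
    }
}
```

A call such as drain(body, 10000) would correspond to the default DRAIN_TIMEOUT value given above.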


LINE_LIMIT

public static int LINE_LIMIT
Maximum length of a line in the HTTP response headers (sanity check).

If an HTTP response line is longer than this, the response is considered to be malformed.

Default value is 2000.


defaultHTTPVersion

public static String defaultHTTPVersion
The default HTTP version string to send to the remote host when issuing requests.

The default value can be overridden on a per-request basis by setting the version instance variable.

Default value is "HTTP/1.1".

See Also:
version

socketPools

public static Vector socketPools
A list of socketPools. Each socket pool is responsible for managing connections for a particular type of transport. The default pool handles standard TCP sockets. Additional transport providers may be added (see appendPool and prependPool below).


url

public URL url
The URL used to construct this HttpRequest.


host

public String host
The host extracted from the URL used to construct this HttpRequest.

See Also:
url

port

public int port
The port extracted from the URL used to construct this HttpRequest.

See Also:
url

proxyHost

public String proxyHost
If non-null, sends this HTTP request via the specified proxy host and port. May be changed by the user at any time up until the HTTP request is actually sent.

See Also:
proxyPort, setProxy(java.lang.String, int), connect()

proxyPort

public int proxyPort
The proxy port.

See Also:
proxyHost

method

public String method
The HTTP method, such as "GET", "POST", or "HEAD".

May be set by the user at any time up until the HTTP request is actually sent.


version

public String version
The HTTP version string.

Initialized from defaultHTTPVersion, but may be changed by the user at any time up until the HTTP request is actually sent.


requestHeaders

public MimeHeaders requestHeaders
The headers for the HTTP request. All of these headers will be sent when the connection is actually made.


displayAllHeaders

public static boolean displayAllHeaders
Setting this to "true" causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.


displayHeaders

public boolean displayHeaders

status

public String status
The status line from the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.


responseHeaders

public MimeHeaders responseHeaders
The headers that were present in the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.


responseTrailers

public MimeHeaders responseTrailers
An artifact of HTTP/1.1 chunked encoding. At the end of an HTTP/1.1 chunked response, there may be more MimeHeaders. It is only possible to access these MimeHeaders after all the data from the input stream returned by getInputStream has been read. At that point, this field will automatically be initialized to the set of any headers that were found. If not reading from an HTTP/1.1 chunked source, then this field is irrelevant and will remain null.


DEFAULT_CHARSET

public static final String DEFAULT_CHARSET
See Also:
Constant Field Values

Constructor Detail

HttpRequest

public HttpRequest(URL url)
Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.

The host specified by the URL is not contacted at this time.

Parameters:
url - A fully qualified "http:" URL.
Throws:
IllegalArgumentException - if url is not an "http:" URL.

HttpRequest

public HttpRequest(String url)
Deprecated. Use the static getRequest method instead.

Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.

The host specified by the URL is not contacted at this time.

Parameters:
url - A string representing a fully qualified "http:" URL.
Throws:
IllegalArgumentException - if url is not a well-formed "http:" URL.

Method Detail

getRequest

public static HttpRequest getRequest(URL url)
Create an HttpRequest object.

Parameters:
url - The URL to request.

getRequest

public static HttpRequest getRequest(String url)

setMethod

public void setMethod(String method)
Sets the HTTP method to the specified value. Some of the normal HTTP methods are "GET", "POST", "HEAD", "PUT", "DELETE", but the user can set the method to any value desired.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Parameters:
method - The string for the HTTP method, or null to allow this HttpRequest to pick the method for itself.

setProxy

public void setProxy(String proxyHost,
                     int proxyPort)
Sets the proxy for this request. The HTTP proxy request will be sent to the specified proxy host.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Parameters:
proxyHost - The proxy that will handle the request, or null to not use a proxy.
proxyPort - The port on the proxy, for the proxy request. Ignored if proxyHost is null.

setRequestHeader

public void setRequestHeader(String key,
                             String value)
Sets a request header in the HTTP request that will be issued. In order to do fancier things like appending a value to an existing request header, the user may directly access the requestHeaders variable.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Parameters:
key - The header name.
value - The value for the request header.
See Also:
requestHeaders

getOutputStream

public OutputStream getOutputStream()
                             throws IOException
Gets an output stream that can be used for uploading data to the host.

If this method is called, it must be called before connect is called. Otherwise it will have no effect.

Currently the implementation is not as good as it could be. The user should avoid uploading huge amounts of data, for some definition of huge.

Throws:
IOException

connect

public void connect()
             throws UnknownHostException,
                    IOException
Connect to the target host (or proxy), send the request, and read the response headers. Any setup routines must be called before the call to this method, and routines to examine the result must be called after this method.

Throws:
UnknownHostException - if the target host (or proxy) could not be contacted.
IOException - if there is a problem writing the HTTP request or reading the HTTP response headers.

setTimeout

public void setTimeout(int sec)
Set the timeout for getting a remote response. If the origin server hasn't responded with at least the response headers in this time, terminate the request. The timeout may be set at any time up until a call to connect(). A value of '0' turns off the timeout.

If a timeout occurs, a TimeoutException is thrown.

Parameters:
sec - timeout, in seconds.

getInputStream

public HttpInputStream getInputStream()
                               throws IOException
Gets an input stream that can be used to read the body of the HTTP response. Unlike the other convenience methods for accessing the HTTP response, this one automatically connects to the target host if not already connected.

The input stream that getInputStream provides automatically hides the differences between "Content-Length", no "Content-Length", and "chunked" for HTTP/1.0 and HTTP/1.1 responses. In all cases, the user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. (If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. There is no way to access the raw underlying stream that contains the HTTP/1.1 chunking packets.)

Throws:
IOException - if there is a problem connecting to the target.
See Also:
connect()

close

public void close()
Gracefully closes this HTTP request when the user is done with it.

The user can either call this method or close on the input stream obtained from the getInputStream method -- the results are the same.

When all the response data is read from the input stream, the input stream is automatically closed (recycled). If the user is not going to read all the response data from the input stream, the user must call close to release the resources associated with the open request. Otherwise the program may consume all available sockets, waiting forever for the user to finish reading.

Note that the input stream is automatically closed if the input stream throws an exception while reading.

In order to interrupt a pending I/O operation in another thread (for example, to stop a request that is taking too long), the user should call disconnect or interrupt the blocked thread. The user should not call close in this case because close will not interrupt the pending I/O operation.

Closing the request multiple times is allowed.

In order to make sure that open sockets are not left lying around the user should use code similar to the following:

 OutputStream out = ...
 HttpRequest http = HttpRequest.getRequest("http://bob.com/index.html");
 try {
     HttpInputStream in = http.getInputStream();
     in.copyTo(out);
 } finally {
     // Copying to "out" could have failed.  Close "http" in case
     // not all the data has been read from it yet.
     http.close();
 }
 


disconnect

public void disconnect()
Interrupts this HTTP request. Can be used to halt an in-progress HTTP request from another thread, by causing it to throw an InterruptedIOException during the connect or while reading from the input stream, depending upon what state this HTTP request is in when it is disconnected.

See Also:
close()

getResponseCode

public int getResponseCode()
Gets the HTTP response status code. From responses like:
 HTTP/1.0 200 OK
 HTTP/1.0 401 Unauthorized
 
this method extracts the integers 200 and 401 respectively. Returns -1 if the response status code was malformed.

If this method is called, it must be called after connect has been called. Otherwise the information is not yet available and this method will return -1.

For advanced features, the user can directly access the status variable.
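
As an illustration of the extraction described above, the following sketch pulls the integer code out of a status line and returns -1 when the line is missing or malformed. It is a stand-alone example, not the library's parser; StatusLineSketch and parseCode are hypothetical names.

```java
final class StatusLineSketch {

    /** Extracts the status code from a line such as "HTTP/1.0 200 OK", or -1 if absent or malformed. */
    static int parseCode(String statusLine) {
        if (statusLine == null) {
            return -1;                       // no response has been read yet
        }
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2) {
            return -1;                       // no status code field at all
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;                       // second field is not a number
        }
    }
}
```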

Returns:
The integer status code from the HTTP response.
See Also:
connect(), status

getResponseHeader

public String getResponseHeader(String key)
Gets the value associated with the given case-insensitive header name from the HTTP response.

If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return null.

For advanced features, such as enumerating over all response headers, the user should directly access the responseHeaders variable.

Parameters:
key - The case-insensitive name of the response header.
Returns:
The value associated with the given name, or null if there is no such header in the response.
See Also:
connect(), responseHeaders

getContentLength

public int getContentLength()
Convenience method to get the "Content-Length" header from the HTTP response.

If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return -1.

Returns:
The content length specified in the response headers, or -1 if the length was not specified or malformed (not a number).
See Also:
connect(), getResponseHeader(java.lang.String)

removePointToPointHeaders

public static void removePointToPointHeaders(MimeHeaders headers,
                                             boolean response)
Removes all the point-to-point (hop-by-hop) headers from the given mime headers.

Parameters:
headers - The mime headers to be modified.
response - true to remove the point-to-point response headers, false to remove the point-to-point request headers.
See Also:
RFC 2068
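
A rough sketch of hop-by-hop header removal follows. It is an assumption-laden illustration, not this method's implementation: it works on a plain Map rather than MimeHeaders, ignores the request/response distinction made by the response parameter, and uses the header list from RFC 2616 section 13.5.1 (the successor to the RFC 2068 cited above). Non-standard headers such as "Proxy-Connection" would have to be added to the set separately.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class HopByHopSketch {

    // Hop-by-hop header names as listed in RFC 2616 section 13.5.1, lower-cased.
    private static final Set<String> HOP_BY_HOP = new HashSet<>(Arrays.asList(
            "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
            "te", "trailers", "transfer-encoding", "upgrade"));

    /** Removes every hop-by-hop header from the given mutable map, ignoring case. */
    static void removeHopByHop(Map<String, String> headers) {
        headers.keySet().removeIf(name -> HOP_BY_HOP.contains(name.toLowerCase(Locale.ROOT)));
    }
}
```

A proxy would typically apply such a filter before forwarding headers, so that connection-level settings negotiated with one peer are not passed on to the next.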

addHeaders

public int addHeaders(String tokens,
                      Properties props)
Convenience method for adding request headers by looking them up in a properties object.

Parameters:
tokens - a white space delimited set of tokens that refer to headers that will be added to the HTTP request.
props - Keys of the form [token].name and [token].value are used to lookup additional HTTP headers to be added to the request.
Returns:
The number of headers added to the request
See Also:
setRequestHeader(java.lang.String, java.lang.String)
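
The lookup scheme can be pictured with the sketch below. It assumes that "[token]" in the parameter description is a placeholder for each token, so that a token "a" is resolved through the keys "a.name" and "a.value"; that reading, the class name AddHeadersSketch, and the use of a plain Map in place of the request headers are all assumptions.

```java
import java.util.Map;
import java.util.Properties;
import java.util.StringTokenizer;

final class AddHeadersSketch {

    /**
     * For each white-space delimited token, copies the header named by
     * props[token.name] with the value props[token.value] into headers.
     * Tokens with a missing name or value are skipped. Returns the number added.
     */
    static int addHeaders(Map<String, String> headers, String tokens, Properties props) {
        int added = 0;
        StringTokenizer st = new StringTokenizer(tokens);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            String name = props.getProperty(token + ".name");
            String value = props.getProperty(token + ".value");
            if (name != null && value != null) {
                headers.put(name, value);
                added++;
            }
        }
        return added;
    }
}
```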

getContent

public String getContent(String encoding)
                  throws IOException,
                         UnsupportedEncodingException
Get the content as a string. Uses the character encoding specified in "encoding", if specified, or the encoding implied by the http headers.

Parameters:
encoding - The ISO character encoding to use.
Returns:
The content as a string.
Throws:
IOException
UnsupportedEncodingException

getContent

public String getContent()
                  throws IOException,
                         UnsupportedEncodingException
Return the content as a string.

Throws:
IOException
UnsupportedEncodingException

getEncoding

public String getEncoding()
Get the ISO character encoding (if any) associated with this text stream, or the default HTTP charset if none found. Response headers must be available.
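
One plausible way to derive the encoding is to read the charset parameter of the "Content-Type" response header, as in the sketch below. This is an assumption about the approach, not a description of the actual method; the value of DEFAULT_CHARSET is not given on this page, so the fallback is passed in as a parameter, and CharsetSketch and charsetOf are hypothetical names.

```java
import java.util.Locale;

final class CharsetSketch {

    /** Returns the charset parameter of a Content-Type value, or fallback if there is none. */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] params = contentType.split(";");
        // params[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < params.length; i++) {
            String p = params[i].trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```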


getEncoding

public static String getEncoding(MimeHeaders headers)

main

public static void main(String[] args)
                 throws Exception
Grab HTTP document(s) and save them in the filesystem. This is a simple batch HTTP URL fetcher. Usage:
 java ... sunlabs.brazil.util.http.HttpRequest [-v(erbose)] [-h(eaders)] [-p] url...
 
-v
Verbose. Print the target URL and destination file on stderr
-h
Print all the HTTP headers on stderr
-phttp://proxyhost:port
The following URLs are to be fetched via a proxy.
The options and URLs may be given in any order. Use "-p" by itself to disable the proxy for all following requests.

There are many limitations: only HTTP GET requests are supported; the output filename is derived automatically from the URL and can't be overridden; and if a destination file already exists, it is overwritten.
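
Because the proxy option applies to the URLs that follow it, argument order matters. The sketch below shows one way such a command line could be interpreted; it is not the code of this main method, the names FetchArgsSketch and Job are hypothetical, and the -v and -h flags are recognized but otherwise ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class FetchArgsSketch {

    /** One URL to fetch, together with the proxy (or null) in effect when it was seen. */
    static final class Job {
        final String url;
        final String proxy;

        Job(String url, String proxy) {
            this.url = url;
            this.proxy = proxy;
        }
    }

    /** Walks the arguments left to right, tracking the current proxy setting. */
    static List<Job> parse(String[] args) {
        List<Job> jobs = new ArrayList<>();
        String proxy = null;
        for (String arg : args) {
            if (arg.equals("-v") || arg.equals("-h")) {
                continue;                               // diagnostics only; ignored here
            } else if (arg.startsWith("-p")) {
                String rest = arg.substring(2);
                proxy = rest.isEmpty() ? null : rest;   // bare "-p" disables the proxy
            } else {
                jobs.add(new Job(arg, proxy));
            }
        }
        return jobs;
    }
}
```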

Throws:
Exception

appendPool

public static void appendPool(HttpRequest.HttpSocketPool pool)
Append a socket pool to the end of the list. This pool is checked last for a matching URL.


prependPool

public static void prependPool(HttpRequest.HttpSocketPool pool)
Prepend a socket pool to the beginning of the list. This pool is checked first for a matching URL.
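
The effect of prepending versus appending can be shown with a small ordered registry. The methods of HttpSocketPool are not listed on this page, so the Pool interface below is a hypothetical stand-in that only answers whether it handles a URL; PoolOrderSketch is likewise an illustrative name, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

final class PoolOrderSketch {

    /** Hypothetical stand-in for a socket pool: it only says whether it handles a URL. */
    interface Pool {
        boolean handles(String url);
    }

    private final List<Pool> pools = new ArrayList<>();

    /** Adds a pool that is consulted before all existing pools. */
    void prepend(Pool pool) {
        pools.add(0, pool);
    }

    /** Adds a pool that is consulted after all existing pools. */
    void append(Pool pool) {
        pools.add(pool);
    }

    /** Returns the first pool, in list order, that handles url, or null if none does. */
    Pool find(String url) {
        for (Pool p : pools) {
            if (p.handles(url)) {
                return p;
            }
        }
        return null;
    }
}
```

In this model a prepended pool can take over URLs that a later pool would also accept, while an appended pool acts as a fallback.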


Version Kenai-svn-r24, Generated 08/18/09
Copyright (c) 2001-2009, Sun Microsystems.