Incoming
========

This documents the process that occurs when an HTTP request is made to
Webware. Most of this process is internal to Webware, but knowing it can
help you understand what's going on even if you don't modify any of that
code.

The First Connection
--------------------

The process begins when a browser asks the web server for a page. The
browser tells the server what page it wants to receive, and passes any
cookies that are marked for the server and any form variables (GET or
POST).

Because different web servers are supported, and under each server there
are several ways to interact with Webware, a variety of adapters can
handle the request at this point. With the exception of OneShot, they
generally package up the request and send it over a socket to the
AppServer. The AppServer has been started ahead of time, and is waiting
to respond.

AppServer
---------

The AppServer is generally the subclass AsyncThreadedAppServer; however, I
will refer to it simply as AppServer.

The AppServer listens for requests from an Adapter [#]_. When it receives
a request, it puts it in a queue and the next available thread handles
the request. A fixed number of threads are started on launch, and if that
pool of threads is exhausted the request will block until another request
has finished. *(@@ Correct?)*

.. [#] note_address_text_

The thread (which is an instance of the RequestHandler class) waits until
it has read all the data (in ``RequestHandler.handle_read``), placing the
data in ``RequestHandler.reqdata``. Then the
``RequestHandler.handleRequest`` method is called. The request that was
passed over the socket is then unmarshalled [#]_. If the request was not
properly packaged, you will get a marshalling error here; that is what
happens when you try to connect directly to the AppServer from your
browser.

.. [#] note_marshal_

RequestHandler handles STATUS and QUIT methods directly.
*(@@ how would these requests be made?)*
All other requests are handled by Application. AppServer keeps an
instance of Application, and ``Application.dispatchRawRequest`` is called
with the unmarshalled request [#]_.

.. [#] note_request_

OneShot
~~~~~~~

When you connect through OneShot.cgi you go through largely the same
process, except that there is no persistence. The OneShot adapter starts
a new AppServer (OneShotAppServer) for each request. This is inefficient
but at times convenient.

Application
-----------

Creating a Request
~~~~~~~~~~~~~~~~~~

``Application.dispatchRawRequest`` takes the dictionary that was passed
over the socket, and creates an HTTPRequest.

``HTTPRequest.__init__`` parses the dictionary. It parses fields,
cookies, and some internal values that are used by Application. (For
instance, the fields are passed in as a raw, URL-encoded string, but are
converted to a dictionary-like object.)

Creating a Transaction
~~~~~~~~~~~~~~~~~~~~~~

After the HTTPRequest is created, it is passed to
``Application.dispatchRequest``. This creates a Transaction with
``Application.createTransactionForRequest``.

Transaction simply acts as a container for the various pieces of a
transaction (request, response, session, servlet, and application), and
passes messages to them (through methods). Transaction is otherwise
stateless and has no logic. It is not the parent of these objects; in
particular, the session, servlet, and application will typically outlive
the transaction.

Creating a Response
~~~~~~~~~~~~~~~~~~~

A response is created with ``Application.createResponseInTransaction``.
An HTTPResponse is created, and again Transaction acts as a container.

Finding the Servlet file
~~~~~~~~~~~~~~~~~~~~~~~~

The Application asks for ``HTTPRequest.serverSidePath``, which in turn
calls ``Application.serverSidePathForRequest``. This then tries to find
the Servlet that corresponds to the requested URI.
Consider an example URI::

    http://www.server.com/cgi-bin/OneShot.cgi/Welcome

* Application keeps a cache of URLs and their matching files. If a
  cached filename matches, we use that. Otherwise:
* Remove the portion that relates to the adapter
  (``/cgi-bin/OneShot.cgi``).
* Inspect the first portion of the path (``/Welcome``):

  * Does it match a Context? (Contexts are listed in
    ``WebKit/Configs/Application.config``)
  * If so, look in this context. If not, consider it to be in the
    ``default`` context (the default is defined in
    ``Application.config``). In our example it wouldn't match a context,
    and so we'd treat it as though it was in the default context
    (``Examples``).

* Look in the directory that matches the value of the context entry in
  ``Application.config``. This directory is considered relative to the
  location of ``Application.py``, i.e., the ``Webware/WebKit`` directory.
  In our example, that is ``Webware/WebKit/Examples``.
* Follow the path until you find a file. In our simple example, all
  that's left of the path is ``/Welcome``.

  * The file can have any extension *(@@ I'm not really sure how this
    works)*
  * If you have ExtraPathInfo set to 1 in ``Application.config``, then
    anything that is left of the path will be available to your servlet
    through the method ``request.extraURLPath()`` *(@@ oh, I can't
    remember where; there also appears to be some sort of attempt to
    match this remaining path information to a file)*

With the filename of the servlet, Application continues.

Dispatching on the Result of serverSidePath
-------------------------------------------

When ``Application.dispatchRequest`` gets the resultant serverSidePath,
it calls one of several methods:

* If the result is None, then the page was not found:
  ``Application.handleBadURL``, which gives a 404 message.
* If the result is a directory, but the request didn't end in a slash:
  ``Application.handleDeficientDirectoryURL``, which gives a redirect to
  the same location with a "/" appended to the URL.
* If the session ID is invalid (doesn't exist or has timed out,
  ``Application.isSessionIdProblematic``):
  ``Application.handleInvalidSession``, which creates a new session ID,
  sets the cookie, and passes to ``Application.handleGoodURL``.
* Otherwise (all good): ``Application.handleGoodURL``.

Creating a Servlet
------------------

``Application.handleGoodURL`` calls
``Application.createServletInTransaction``. Like the path lookup, this
method first looks for a cached Servlet. If it's found, it checks the
timestamp on the cache and the source file, invalidating the cache if
necessary. If a cached Servlet wasn't found, or the cache was
invalidated, it creates a new cache entry for the Servlet [#]_.

.. [#] note_cachedServlet_

``Application.getServlet`` actually creates the Servlet. The cache keeps
a queue of available instances of the Servlet, which are reused when
possible. *(@@ what's up with the factories here? I know what they do,
but not how they get called)*

Waking the Transaction
----------------------

Once the Transaction has a Servlet to work with, it calls
``Transaction.awake``, ``Transaction.respond``, and finally
``Transaction.sleep``. Transaction in turn calls these methods on both
the Session and the Servlet.

``HTTPServlet.awake`` doesn't do anything unless you override it in a
subclass. Typically you would override it to set up resources and
instance variables for the servlet, or to do actions based on the
request.

``Session.awake`` sets the session's last access time and number of
accesses.

Responding
----------

``HTTPServlet.respond`` is called with the transaction as its only
argument. It calls a method based on the request type: 'GET', 'POST',
'PUT', 'DELETE', 'OPTIONS', 'TRACE'. 'GET' calls
``HTTPServlet.respondToGet``, 'POST' calls ``respondToPost``, etc., all
with the transaction as an argument. The actual Servlet must override
these methods to give the desired behavior.

``Session.respond`` does nothing.

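
The awake/respond/sleep cycle and the dispatch on request type described
above can be pictured with a minimal sketch. It is not Webware's code:
``SketchServlet``, ``Hello``, ``run``, and the plain dictionary standing in
for the transaction are all hypothetical names chosen for illustration.

```python
class SketchServlet:
    """Hypothetical stand-in for HTTPServlet; not Webware's actual code."""

    def awake(self, trans):
        pass  # subclasses may set up per-request state here

    def respond(self, trans):
        # Map the request method to a handler name,
        # e.g. 'GET' -> 'respondToGet'.
        method = trans["method"]
        handler = getattr(self, "respondTo" + method.capitalize(), None)
        if handler is None:
            raise NotImplementedError("no handler for method " + method)
        handler(trans)

    def sleep(self, trans):
        pass  # subclasses may release resources here


class Hello(SketchServlet):
    """A servlet that only overrides the GET handler."""

    def respondToGet(self, trans):
        trans["output"].append("Hello, world")


def run(servlet, trans):
    """Call the three lifecycle steps in the order the text describes."""
    servlet.awake(trans)
    servlet.respond(trans)
    servlet.sleep(trans)
    return trans["output"]
```

In this sketch a request whose method has no matching ``respondTo...``
method raises an error; how the real HTTPServlet reacts in that case is not
covered here.
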

Page
----

Page (a subclass of HTTPServlet) has more interesting behavior. It is
particularly directed towards generating HTML (HTTPServlet is entirely
content-neutral), and consolidates a number of things.

``Page.awake`` initializes a number of variables that will not change for
the entire transaction (but may change if the Page is reused for other
transactions).

``Page.respondToGet`` and ``Page.respondToPost`` both call
``Page._respond``, which looks for a field named ``'_action_'`` and
dispatches based on that. ``'_action_'`` is translated by
``Page.methodNameForAction``, and the result must be among the list
returned by ``Page.actions`` (cached by ``Page._actionSet``). If no
``'_action_'`` field is given, ``Page.writeHTML`` is run.

Page can also generate the HEAD, TITLE, and other elements of the HTML
page. You can override ``Page.writeBody`` or ``Page.writeContent`` to
generate content, and methods like ``Page.title`` to generate other
content. It's easiest just to look at ``Page.py`` to see these.

Writing a page
--------------

``Page.write`` calls ``HTTPResponse.write`` with its arguments.
``HTTPResponse.write`` holds these strings in a list until you are
finished with the transaction. You can also stream the output by calling
``HTTPResponse.flush``, which will start sending output directly. Once
this has occurred, you can send no more headers (such as cookies,
redirects, etc).

The Return Path
===============

Application
-----------

After having set up the request, we need to back out all the way to the
browser.

After ``Application.dispatchRequest`` has called
``Application.handleGoodURL`` (which calls awake/respond/sleep), it will
call ``HTTPResponse.deliver``, which basically marks the response as
committed (i.e., nothing more can be added).

Then ``Application.returnInstance`` is called, which returns the Servlet
instance back to the pool of cached servlets (to be reused for a later
request).

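
The buffering described under "Writing a page" and the commit step
performed by ``HTTPResponse.deliver`` can be pictured with a small sketch.
It only illustrates the behavior described in the text: ``SketchResponse``,
its attributes, and ``setHeader`` are hypothetical names, and the ``sent``
list stands in for the socket.

```python
class SketchResponse:
    """Hypothetical stand-in for HTTPResponse; not Webware's actual code."""

    def __init__(self):
        self.headers = []         # (name, value) pairs not yet sent
        self.chunks = []          # body strings held until flush/deliver
        self.sent = []            # stands in for what went over the socket
        self.headersSent = False
        self.committed = False

    def setHeader(self, name, value):
        if self.headersSent:
            raise RuntimeError("headers already sent; cannot add " + name)
        self.headers.append((name, value))

    def write(self, text):
        if self.committed:
            raise RuntimeError("response already delivered")
        self.chunks.append(text)  # buffered, nothing is sent yet

    def flush(self):
        # The first flush sends the headers; after that they are frozen.
        if not self.headersSent:
            self.sent.extend("%s: %s" % pair for pair in self.headers)
            self.sent.append("")
            self.headersSent = True
        self.sent.extend(self.chunks)
        self.chunks = []

    def deliver(self):
        # Send whatever is left and refuse any further writes.
        self.flush()
        self.committed = True
```

A servlet that never calls ``flush`` keeps everything buffered until
``deliver``, so in this sketch it could still add headers at any point
before then.
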

Response
--------

``RequestHandler.handleRequest`` calls ``HTTPResponse.rawResponse``,
which returns a dictionary containing the keys 'headers' and 'contents'.
Headers is a list of header/value pairs. For example::

    [('Content-Type', 'text/html'),
     ('Set-Cookie', 'foo=bar')]

``RequestHandler.handleRequest`` then turns this into a normal CGI-style
response, with ``header: value`` at the top, a blank line, and then
'contents'. It then deletes the transaction.

Adapter
-------

Having waited patiently, RequestHandler will finally send the string
constructed from the Response to the Adapter over the socket. The adapter
will deal with it as appropriate. E.g., the CGI adapter prints the result
to stdout.

Finished
========

The user sees the page, and it is good.

------------

.. _note_address_text:

The AppServer writes the hostname and port to a file ``address.text``.
The Adapter reads this file to determine where it can connect to the
AppServer.

.. _note_marshal:

Marshalling takes simple Python values (strings, lists, numbers, etc.)
and puts them into a string representation.

.. _note_request:

The request is a dictionary with the keys 'format', 'time', 'input', and
'environ':

'format':
    The only currently allowed value for 'format' is 'CGI'.
'time':
    A timestamp (seconds from the Unix Epoch).
'environ':
    A dictionary that looks like what ``os.environ`` would look like were
    this actually a CGI call; that is, with keys like REQUEST_METHOD,
    QUERY_STRING, etc.
'input':
    The request that the browser made. This would be something like
    ``GET /Examples/View?filename=Welcome.py`` *(@@ POST example too?)*

.. _note_cachedServlet:

The cache for the Servlet is used both for the file path lookup, and for
the Servlet cache (i.e., two caches keyed by URL/PATH_INFO and by
serverSidePath, but pointing to the same cached data). *(@@ maybe some
information on how the cache is stored)*
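
To make the notes on marshalling and the request dictionary more concrete,
along with the CGI-style response from the Response section, here is a
small sketch. It assumes Python's standard ``marshal`` module as the
serializer; ``make_request`` and ``to_cgi_response`` are hypothetical
helper names, and the ``environ`` contents are cut down to two keys for
illustration.

```python
import marshal
import time


def make_request(method, uri, query=""):
    """Build a dictionary shaped like the request described in the note."""
    return {
        "format": "CGI",
        "time": time.time(),
        "environ": {"REQUEST_METHOD": method, "QUERY_STRING": query},
        "input": "%s %s" % (method, uri),
    }


def to_cgi_response(raw):
    """Turn a {'headers': [...], 'contents': ...} dict into CGI-style text."""
    lines = ["%s: %s" % (name, value) for name, value in raw["headers"]]
    # Header lines, a blank line, then the contents.
    return "\n".join(lines) + "\n\n" + raw["contents"]


# An adapter-side round trip: serialize, then restore on the other end.
request = make_request("GET", "/Examples/View?filename=Welcome.py",
                       "filename=Welcome.py")
wire_bytes = marshal.dumps(request)
restored = marshal.loads(wire_bytes)
```

``restored`` compares equal to ``request``, which is all an AppServer-like
receiver would need before handing the dictionary on.
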