10 Dec 2002.

                The DELTA IM Module for SRE2003

Abstract: SRE2003's DELTA "instance manipulation" module provides
          support for delta encoding.

--------------------------------------------
Contents:

   1) Introduction
   2) Installing and Configuring the DELTA IM module
   3) Technical Notes

--------------------------------------------
1) Introduction

One of the most significant advances of the HTTP/1.1 standard is
increased support for caching. Caching is the act of using a locally
available version of a resource instead of re-obtaining the resource
from its originating server. To the extent that caching can be
enhanced, overall internet traffic will be reduced, with a concomitant
increase in delivery speeds for the traffic that remains.

Unfortunately, the improved (but still relatively simple) caching
schemes supported by HTTP/1.1 are not well suited to dynamic content.
Since a large (and probably growing) share of web resources are
dynamic (that is, they change from day to day), this weakness may
seriously undermine the potential advantages of caching.

One strategy for dealing with this problem is the use of "deltas".
Current cache schemes require a server to instruct a client to either
use a cached item as is, or not at all. Delta caching schemes
represent a compromise -- the server can tell a client to use its
cached version as a "base", and send a list of "differences". This
list of differences, which we also refer to as "deltas", may often be
much smaller than the full contents of the resource.

Technically speaking ...

If the server and client can ascertain that both have the same copy
of a prior instance of a request-URI, then the server can compute the
difference between this "commonly owned" prior instance and the
current instance. One can think of this difference as being an
"encoding" of the current instance, in much the same way that GZIP
compression is an encoding.
Upon receipt of this "difference", the client can create a duplicate
of the new version by combining the "difference" with its "prior
version" of the instance.

Note: SRE2003's support for delta encoding closely follows RFC 3229
      (http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc3229.html).

--------------------------------------------
2) Installing and Configuring the DELTA IM module

The DELTA IM module (DELTA.RXX) is packaged in DELTA.ZIP. DELTA.ZIP
also includes several support programs (and this documentation). The
INSTALL.CMD program included in DELTA.ZIP should be used to install
DELTA.

Before installing DELTA, you need the following:

  1) An HPFS drive for storing old instances (using long filenames).
     This should have at least 50M free. Ideally, this will be the
     same drive that SRE2003 is installed on. If not, you'll need to
     modify a few parameters in DELTA.RXX.

  2) The EMX (ver 0.9d) "runtime environment". EMX ver 0.9d can be
     found at Hobbes -- try
        http://hobbes.nmsu.edu/pub/os2/dev/emx/v0.9d/emxrt.zip
     or, you can get it from:
        http://www.srehttp.org/pubfiles/emxrt_09d_04.zip

     Installing EMX 0.9d is fairly easy:
       a) Copy the .ZIP file to the root of your boot drive
          (say, C:\).
       b) Unzip this file -- make sure you use an UNZIP that will
          automatically create subdirectories.
            * An EMX directory (with several subdirectories and
              about 50 files) will be created.
            * Nothing should be copied to the root directory.
       c) Modify the following in CONFIG.SYS:
            i) Add C:\EMX\DLL to your LIBPATH= line.
           ii) Add C:\EMX\BIN to your PATH= line.
          (Of course, C:\EMX should be changed to reflect where you
          installed EMX.)
       d) Reboot your computer.

     EMXREV is a short program (that comes with EMX) that reports
     which version of EMX you've installed -- you can run it (after
     rebooting) to check that the installation was successful.
     For example:
        c:>emxrev
     should return something like:
        EMX : revision = 61
        EMXIO : revision = 60
        EMXLIBC : revision = 63
        EMXLIBCM : revision = 64
        EMXLIBCS : revision = 64
        EMXWRAP : revision = 60

Given these prerequisites, you can install DELTA:

  1) Unzip DELTA.ZIP to an empty temporary directory.
  2) If SRE2003 is running, shut it down.
  3) From an OS/2 command prompt, go to this temporary directory
     and run INSTALL.
  4) You will be asked to provide the name of the SRE2003 "working"
     directory (where SRE2003 is installed). You will also be asked
     if you want DELTA to be the default IM.
  5) The installation program will create a few subdirectories, copy
     a few files, and modify the SRE2003.CFG configuration file.
  6) Restart SRE2003 -- DELTA is now installed!

Notes:

  * DELTA uses several GNU programs -- see the COPYING.GNU file in
    x:\sre2003\BIN\IM\DELTA for the use license (where x:\sre2003 is
    where you've installed SRE2003). You can get the full version of
    these programs at:
       http://archiv.leo.org/pub/comp/os/os2/leo/gnu/systools/
    (look for the GNUED and GNUDIFF entries).

  * DELTA also uses GDIFF.DLL. For further details on GDIFF, see
       http://www.srehttp.org/apps/gdiff

  * DoGET.CMD, which comes with the DELTA module, has a "delta
    encoding" option enabled. This instructs DOGET to include the
    necessary headers (such as A-IM and If-None-Match) in all
    requests. More importantly, DOGET will store responses
    (instances) in a "deltacache" subdirectory. These cached
    instances will then be used to automatically undo instance
    manipulations, and thereby regenerate the current
    "undifferenced" instance.

    Note that DOGET's "instance manipulation" awareness is good for
    any RFC 3229 compliant server, not just SRE2003.

  * CLEANUP.CMD is installed to your x:\sre2003\BIN\IM\DELTA
    subdirectory. This is a standalone utility that will clean up
    the delta-cache directory, removing least-recently used files.
    To run CLEANUP.CMD:
       i) From an OS/2 prompt, CD to x:\sre2003\BIN\IM\DELTA, where
          x:\sre2003 is the SRE2003 directory.
      ii) Run CLEANUP.CMD.
    There are no run-time parameters to set. However, there are a
    few parameters that you can modify in the user-configurable
    parameters section of CLEANUP.CMD.

--------------------------------------------
3) Technical Notes

      i) The delta-encoding standard: RFC3229
     ii) Description of IM Modules: IM_USE.HTM
    iii) The 226 IM Used response code
     iv) GDIFF and DIFF-E are the currently supported differencing
         algorithms
      v) Partial support for multiple range requests
     vi) Cases where DELTA will not be attempted
    vii) A-IM conditions that are ignored
   viii) Using RETAIN to specify how long to retain an instance
     ix) Delta not used if it's longer than the original
      x) Costs to using delta, and when it may not be worth it
     xi) Dynamic resources and creating/storing separate instances
         for all requests
    xii) Issues in specifying etags for dynamic resources that
         change infrequently
   xiii) Using LIFESPAN, MAX_BYTES, and MAX_MIN_FREE to specify
         retention time
    xiv) Delta not attempted if no etag specified
     xv) What to do if the base and current content-codings differ
    xvi) The delta storage directory
   xvii) Structure of instance files stored in the delta storage
         directory

----------------------------------------------

i) RFC3229.TXT (which is installed to x:\sre2003\bin\im\delta) is
   the Delta-Encoding draft standard. Of course we recommend reading
   it!

ii) IM_USE.HTM (in x:\sre2003\docs) contains more details on
    installing IM Modules. IM_USE.HTM also discusses "instance
    manipulation", a concept that is key to the operation of delta
    encoding.

iii) Note that if delta-encoding is done, then the response code is:
        HTTP/1.1 226 IM Used
     In addition, IM and Delta-Base headers are added, and retain
     and im tokens may be added to a Cache-Control header.

iv) Currently, the DELTA IM module supports the GDIFF and DIFF-E
    "delta encoding types".
    DIFF software is readily available, and free; a free OS/2
    version of GDIFF can be found at
       http://www.srehttp.org/apps/gdiff
    DIFF does NOT work with binary files, but tends to be better
    than GDIFF for non-binary (text) files.

    The delta encoding spec also mentions VCDIFF. If OS/2
    implementations of it (or of other common differencing
    algorithms) become available, we will be happy to add support
    (code donations are gratefully accepted!).

v) At this writing, delta is not fully supported for multiple byte
   ranges. More precisely, multiple differences (over multiple byte
   ranges) are not attempted, but multiple byte ranges over a single
   difference are supported.

   Technically speaking, this is NOT supported (where xdiff is
   either DIFFE or GDIFF):
       Range: bytes=1000-2000,3000-4000
       A-IM: range,xdiff
   but this IS supported:
       Range: bytes=1000-2000,3000-4000
       A-IM: xdiff,range

vi) There are several cases where DELTA will not be attempted, even
    though A-IM and If-None-Match request headers are provided:
      1) If A-IM does not contain a delta-token (the spec insists
         on this).
      2) If, in A-IM, a GZIP appears before a delta-token (it's
         fruitless to apply differencing to gzip'ped files).
      3) If the If-None-Match does not list a currently available
         instance. As a subset of this, if If-None-Match is empty,
         DELTA is not attempted.
      4) If there is no Host: request header (and a host is not
         specified in the request-line).
    Note that the "delta tokens" are GDIFF and DIFFE (case
    insensitive).

vii) Certain semi-pathological A-IM conditions are "liberally"
     ignored:
       1) If A-IM contains multiple delta-tokens, they must be
          adjacent. If not (if one or more non-delta tokens sit
          between delta-tokens), the later delta-tokens are ignored.
       2) Only one RANGE token can be in A-IM. Later RANGE tokens
          are ignored.
       3) Only one GZIP token can be in A-IM.
          Later GZIP tokens are ignored.

viii) You can specify how long SRE2003 should retain this instance
      (for use in future delta requests) by using one of:

        > Retain for nseconds seconds
             astat=SRE_COMMAND('IMFINFO ','Retain: nseconds')
        > Do NOT retain this instance
             astat=SRE_COMMAND('IMFINFO ','Retain: 0')
        > Retain this instance indefinitely (given the LIFESPAN and
          other parameters).
             astat=SRE_COMMAND('IMFINFO ','Retain: ')
          This is the default.

ix) If the best delta is longer than the current instance, a delta
    response will NOT be sent.

x) Delta encoding is a grand idea, but it comes at some cost to your
   server -- the time required to compute differences, and the space
   (and time) required to store "prior instances". In many cases,
   it's not worth it -- such as for resources that are rarely
   re-requested, for resources that rarely change, or for small
   resources that contain many possible changes.

   Thus, on many sites it may be sensible to NOT choose DELTA to be
   your "default IM". If this is so, then you'll need some other
   means of enabling delta encoding. Basically, this is done by
   using the IM option of SRE_COMMAND('FILE ...') and
   SRE_COMMAND('VAR ...'). For example:
      astat=sre_command('FILE type text/plain im delta name d:\www\plan9.txt')
   See SRE2PRC.HTM for the details.

   Note that the SREhttp/2 "filter" will offer configuration options
   that allow you to easily enable/disable DELTA on a
   request-specific basis.

xi) Keep in mind that for dynamic resources (say, a web page that
    contains a hit counter, or a clock), every single request
    results in a different instance, so each request results in the
    server retaining a copy of the response.

    Future versions of the DELTA IM module will support the DCLUSTER
    and DTEMPLATE extensions to the delta-encoding standard. These
    extensions (which are still being written) allow a server to
    specify instances that will be used for a variety of
    request-URIs, thereby simplifying the task of maintaining a
    useful cache of base instances.
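As an illustration of the A-IM rules in notes vi) and vii) above,
here is a minimal sketch written in Python. It is NOT taken from
DELTA.RXX: the function names (effective_aim, should_attempt_delta)
and the available_etags argument are hypothetical, and etag matching
is reduced to a plain string comparison.

```python
# Delta tokens named in note vi); matching is case insensitive.
DELTA_TOKENS = {"gdiff", "diffe"}


def effective_aim(a_im):
    """Return the A-IM tokens that would be honored, or None if no delta applies."""
    effective = []
    have_delta = False
    delta_run_closed = False
    for raw in a_im.split(","):
        # Drop any parameters (such as ";q=0.5") and normalize case.
        tok = raw.split(";")[0].strip().lower()
        if not tok:
            continue
        if tok in DELTA_TOKENS:
            if not have_delta and "gzip" in effective:
                return None          # vi.2: GZIP appears before a delta-token
            if delta_run_closed:
                continue             # vii.1: later, non-adjacent delta-tokens ignored
            have_delta = True
            effective.append(tok)
        else:
            if have_delta:
                delta_run_closed = True
            if tok in ("range", "gzip") and tok in effective:
                continue             # vii.2 and vii.3: later duplicates ignored
            effective.append(tok)
    return effective if have_delta else None   # vi.1: a delta-token is required


def should_attempt_delta(a_im, if_none_match, host, available_etags):
    """Apply the four checks of note vi) to one request."""
    if not host:
        return False                 # vi.4: no host known for this request
    requested = [e.strip() for e in if_none_match.split(",") if e.strip()]
    if not any(e in available_etags for e in requested):
        return False                 # vi.3: no listed instance is available
    return effective_aim(a_im) is not None
```

For example, under these assumptions an A-IM of
"gdiff, range, diffe, range" reduces to the tokens gdiff and range,
while "gzip, gdiff" means that delta encoding is not attempted.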
xii) Some dynamic resources are generated on the fly, but have
     contents that change infrequently. For example, HTML documents
     with server-side includes may be rebuilt on every request, but
     may have content that changes infrequently (say, the only
     server-side include is a date field).

     For such resources, the default SRE2003 "auto-etag" option will
     create a new etag for each request. This is unfortunate, since
     the server must now store separate instances for each request,
     even though the contents of these instances are identical.

     To get around this problem, you should:
        i) Use the ETAG_AUTO2 option in SRE_COMMAND('FILE ...') and
           SRE_COMMAND('VAR ...').
       ii) Set ENTITY_HEADERS_IGNORE to include LAST-MODIFIED and
           EXPIRES.
     Note that sreLite2, using the default values of
     ENTITY_HEADERS_IGNORE, uses these two options.

xiii) The LIFESPAN, MAX_BYTES, and MAX_MIN_FREE parameters (in
      DELTA.RXX) control the retention (and deletion) of old
      instances. Basically, an LRU algorithm is used to remove
      instances that have not been used recently, with this removal
      dictated by age and by how full your disk is.

xiv) Delta encoding will NOT be attempted if the response does not
     include an ETAG response header (as provided by the filter), or
     if there is
     Note that all of the SRE2003 filters (simple, sreLite2, and
     SREhttp/2) include ETAGs in all "normal" requests (requests for
     static files).

xv) If an otherwise usable base instance has a different
    content-encoding than the current instance, it will not be used.
    There is one exception to this rule:
      > if the only difference between the base instance's
        content-encoding and the current instance's content-encoding
        is a trailing GZIP in the base instance,
      > then the base instance will be unGZIPped,
      > and this unGZIPped variant will be used as the base instance
        when performing instance manipulations.
    For example:
        Current instance Content-Encoding: x,y,z
        Base instance Content-Encoding:    x,y,z,GZIP
    then use an unGZIPped version of the base instance.
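The content-coding rule in note xv) can be sketched as follows. This
is an illustrative Python sketch under stated assumptions, not the
code used by DELTA.RXX; the function names (parse_codings,
base_usable, prepare_base) are hypothetical, and only the literal
token "gzip" is recognized.

```python
import gzip


def parse_codings(header_value):
    """Split a Content-Encoding value into a list of lower-case codings."""
    return [c.strip().lower() for c in header_value.split(",") if c.strip()]


def base_usable(base_encoding, current_encoding):
    """Return (usable, needs_gunzip) for a candidate base instance."""
    base = parse_codings(base_encoding)
    current = parse_codings(current_encoding)
    if base == current:
        return (True, False)
    # The one exception: the base differs only by a trailing GZIP.
    if base and base[-1] == "gzip" and base[:-1] == current:
        return (True, True)
    return (False, False)


def prepare_base(base_bytes, base_encoding, current_encoding):
    """Return the bytes to difference against, or None if the base is unusable."""
    usable, needs_gunzip = base_usable(base_encoding, current_encoding)
    if not usable:
        return None
    return gzip.decompress(base_bytes) if needs_gunzip else base_bytes
```

With the example from note xv), base_usable("x,y,z,GZIP", "x,y,z")
reports that the base is usable once it has been unGZIPped, whereas a
base encoded as "x,y" would be rejected.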
xvi) DELTA uses a "delta_dir" to store old instances. By default,
     this will be the x:\sre2003\TEMP\DELTA directory (where
     x:\sre2003 is where you installed SRE2003). At any time you can
     delete the contents of this directory with no deleterious
     effect -- other than suppressing delta encoding against these
     instances. Note that this does not affect the storage (and use)
     of future instances.

xvii) Files in the "delta_dir" have the following structure:
         a1_a2.b.c
      where:
         c  : a hash of the servername
         b  : a hash of the request string
         a2 : a hash of the contents of the instance (the _a2 part
              is used only some of the time)
         a1 : a hash combining entity headers and date & size
              information
      The hashes are derived from CRCs of the respective items, and
      consist of between 6 and 9 upper case alphanumerics.
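
As an illustration of the naming scheme in note xvii), the following
Python sketch splits a delta_dir filename into its parts. It is only
a sketch: the pattern is inferred from the description above, the
names NAME_RE and parse_instance_name are hypothetical, and the
sample filename is made up. How the hashes themselves are computed
is not shown, since the details are not given here.

```python
import re

# Each hash is described as 6 to 9 upper case alphanumerics.
HASH = r"[A-Z0-9]{6,9}"

# a1, then an optional _a2, then .b and .c
NAME_RE = re.compile(
    rf"(?P<a1>{HASH})(?:_(?P<a2>{HASH}))?\.(?P<b>{HASH})\.(?P<c>{HASH})"
)


def parse_instance_name(filename):
    """Return a dict with keys a1, a2, b, c, or None if the name does not fit."""
    match = NAME_RE.fullmatch(filename)
    return match.groupdict() if match else None


# Made-up example name; a2 is None when the _a2 part is absent.
parts = parse_instance_name("ABC123_DEF4567.GHI890.JKL0123")
```

A name that does not match the pattern (for example, one with lower
case characters) yields None, so unrelated files in the directory
can be skipped.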