14 Nov 2003. Ver 1.19

            Using the SRE2003 Cache-Data Daemon

  1) Introduction
  2) The SRE_CACHE_DATA procedure
     2.1) SRE_CACHE_DATA ACTIONS
  3) The 7 character "special identifiers"
  4) Hint on saving non-text strings, or strings with tabs and CRLFS
  5) File structure

                ---------------------

1) Introduction

The SRE2003 data-cache daemon provides permanent storage of small to
moderate sized collections of small data records. Its strengths are
relatively quick lookup and recording of data, using a caching mechanism
to reduce disk access.

Basically, this data-caching daemon is used by:
   1) defining a data series. You can define where to store the data
      files. Later, you can ...
   2) open a data series (usually one created at an earlier date)
  3a) add records (in an opened data series). Each record can contain
      several string variables (you can store numbers as strings).
  3b) look up variables in previously defined records,
  3c) modify records (changing variable values)
   4) close the series

Although fairly complete, this daemon is not designed to be efficient
under complicated changes. For example:
   a) It is best used with relatively small records, and with variables
      whose size does not change.
   b) Although record removal is relatively simple, space is only
      recovered when explicitly instructed.
Moreover,
   c) it is not designed to work well with large datasets (say, over
      100,000 records).
   d) it has limited error recovery, hence should not be used with data
      for which guaranteed accuracy, and protection from accidental
      erasure, is crucial.

Features:
   * You can specify several different data series, each associated with
     a different permanent data file.
   * The data series files can be viewed, BUT NOT EDITED, with a text
     viewer.
   * Records can be identified by an arbitrarily long text string, or by
     a 7 character identifier.

Usage:
   Although you can use SRE2003 daemon procedures to communicate with the
   cache-data daemon, it is far more convenient to use the SRE_CACHE_DATA
   procedure.
   Hence, we now describe SRE_CACHE_DATA...

                ---------------------

2) The SRE_CACHE_DATA procedure:

Syntax:
   aresult=sre_cache_data(series,action [options],identifier,val1,..,val10)

where
   series: a data series.
   action: what to do. The following actions are available:
        DEFINE  : create a data series
        OPEN    : open a currently existing data series
        ADD     : add a new record
        MODIFY  : modify a record
        EXTRACT : extract one or several records, by record number
        FIND    : find a record, or records
        SEARCH  : search a variable, in all records, for a match
        REMOVE  : remove a record
        CLOSE   : close the data series (save cached information)
        INFO    : info about a data series
        SAVE    : save any cached information
        CLEANUP : cleanup data
        REWRITE : rewrite the data file (fix odd records, remove old junk)
        DELETE  : delete a data series
        ERROR   : return an error message, given an error code
   [options]     : space delimited list of optional action modifiers.
   identifier    : an identifier
   val1 .. val10 : up to 10 values, whose use depends on action
   aresult       : depends on action.

A note on action modifiers:
   The ADD, MODIFY, FIND, and REMOVE actions all can take the ID7 and
   WRITE action modifiers:

   ID7    When an ID7 action modifier is specified, then the IDENTIFIER
          must be a "special" 7 character identifier. The special
          identifiers are described in the 7 character "special
          identifiers" section below.

   WRITE  When WRITE is used as an action modifier, then the changes are
          immediately written to the data series file. Actually, all
          outstanding (not yet recorded) changes, for any open data
          series, are written to their respective data series files.
          Basically, this does the same thing as using a SAVE action
          after the ADD, etc. action.

2.1) SRE_CACHE_DATA ACTIONS

The following describes the usage of SRE_CACHE_DATA for each ACTION.

DEFINE: define a data series

   Define a new data series.
   SERIES     : series name
   ACTION     : DEFINE
   IDENTIFIER : Optional fully qualified file name
   VAL1       : Optional space delimited list of variable names
   VAL2       : Optional DESCRIPTION (60 characters or shorter)
   VAL3       : Optional "header" comment (can be arbitrarily long)
   aresult    : 0 for success, otherwise an error code

   When you DEFINE a data series, a new data file will be created.
   Unless otherwise specified, a "SRE2003 Data Series" file will be
   created in your x:\sre2003\DATA directory. Its name will be the SERIES
   variable, with a .SDS extension. To specify a file, set the IDENTIFIER
   variable to a fully qualified file name (if you don't give an
   extension, .SDS is appended).

   Notes:
     * SERIES names are case insensitive, and can contain A-Z, 0-9, $,
       and _
       Their maximum length is 40 characters
     * If the drive you are using is FAT, then the data series name can
       only be 8 characters long
     * If you use IDENTIFIER to assign a non-default filename, you'll
       have to provide this filename whenever you OPEN the data series

   Error values:
     The error values are:
       1  Could not open a data series file: file_name
       2  Data series name is too long (>8 characters on a FAT drive)
       3  A variable name was too long
       4  A variable name had unallowed characters
       5  A series name had unallowed characters or is too long
       6  A data series with this name is already DEFINEd (you can
          DELETE, and then DEFINE again)

   A note on variable names:
     The nth variable name (in the VAL1 argument of a DEFINE action) is
     associated with the nth VAL argument (in an ADD or similar action).
     For example:
        astat=sre_cache_data('SERIES1','DEFINE',,'CATS DOGS GORILLAS')
     and later ...
        astat=sre_cache_data('SERIES1','OPEN')
        astat=sre_cache_data('SERIES1','ADD','AF1','FELIX','Fido','MAGILLA')
        astat=sre_cache_data('SERIES1','ADD','Gn1','FRITZ','SCOOBY DOO','K Kong')
     will result in the creation of 2 records:
        Id=AF1 which will have the variables and values:
             CATS = FELIX
             DOGS = Fido
             GORILLAS = MAGILLA
        Id=Gn1 (note that ID variables can be case-sensitive)
             CATS = FRITZ
             DOGS = SCOOBY DOO
             GORILLAS = K Kong

   More Variable Name Notes:
     * after DEFINEing a data series, you have to OPEN it to work with it
       (say, to add records)
     * variable names can be up to 40 characters long; but shorter is
       better.
     * variable names are case insensitive, and can contain A-Z, 0-9, $,
       and _
       They can NOT contain spaces!
     * the default variable names are %VAR_1,..,%VAR_10 (note that % is
       NOT allowed in user defined variable names)
     * Extending the above example:
         astat=sre_cache_data('SERIES1','ADD','Z2BC2','Garfield','Snoopy','Joe Young','Mothra')
        Id=Z2BC2
             CATS = Garfield
             DOGS = Snoopy
             GORILLAS = Joe Young
             %VAR_4 = Mothra

OPEN: open a data series

   Open a previously defined data series

   SERIES     : Series name, or blank
   ACTION     : OPEN
                Options:
                  INDEX -- use a 'quick-load-index' if it exists, or save
                           some info to the quick-load-index (if changes
                           are detected)
   IDENTIFIER : if DEFINEd with a file name, you MUST include it here.
                Otherwise, identifier MUST NOT be used.
   VAL1       : name of 'quick-load-index'. If not specified, a default
                name is used. If INDEX is not an ACTION option, VAL1 is
                ignored.
   VAL2,..
              : not used
   aresult    : 0 for success, otherwise an error code and message

   Error values:
     The error values are:
       1  Problem opening data series file
       2  Bad header information in data series file (perhaps it's not a
          data series file)
       3  Too many records
       4  Not enough records
       5  Improper series name
       6  Data series file does not exist
       7  Data series has already been opened (and is still open)

   Notes:
     * when using OPEN and an explicit data series file, you do NOT need
       to use the same series name as the data series was DEFINEd by.
     * if you leave the SERIES name blank, and use an explicit data
       series file, the name it was defined by will be used.
     * The quick-load index is ONLY used for speeding up "opening" the
       data series. It has no effect on reading from or writing to the
       data series.

   Examples:
     aa=sre_cache_data('MYSERIES','OPEN')
     aa=sre_cache_data('MYSERIES2','OPEN INDEX')
     aa=sre_cache_data('HERSERIES','OPEN','F:\HERDATA\COUNTS.CNT')
     aa=sre_cache_data(,'OPEN','F:\HISDATA\XCOUNTS.CNT')
     aa=sre_cache_data('HERSERIES','OPEN INDEX','F:\HERDATA\COUNTS.CNT','F:\HERDATA\COUNTS.CNI')

ADD: Add new record to an open data series

   SERIES     : series name
   ACTION     : ADD
                Optional action modifiers: COMMENT, WRITE and ID7
                  COMMENT means "add this as a comment". IDENTIFIER
                  should contain the actual comment, which can be
                  arbitrarily long. VAL1..VAL10 are ignored. The comment
                  is written at the end of the data file.
   IDENTIFIER : either a string, or a special identifier (if ID7
                specified)
   VAL1,..,VAL10: Values. These can be text strings of any length.
                Note that TAB, CR, and LF characters are converted to
                spaces, and trailing & leading spaces are stripped.
   aresult    : A 7 character identifier, or a 1 character error code.
                If you explicitly defined a 7 character ID (using ID7),
                this will be the same value (with upper case
                translation). Otherwise, a 7 character identifier will
                automatically be generated.
   A note on defining values:
     We strongly recommend allocating as much space to these values as
     may be needed during future modifications. This may mean using dummy
     values, such as 'abcde' to set aside 5 bytes for a variable.

   A note on identifiers:
     If you are using SRE_CACHE_DATA to store many unique data records,
     you SHOULD use unique identifiers for each record. See section 3 for
     further details.

   Error values:
     The error codes are:
       1 = Data series was not opened
       2 = Too many of this "family" of identifier. See Section 3 for
           details.
       3 = Explicitly defined identifier does not have 7 characters
       4 = Explicitly defined identifier has an unallowed character
       6 = Record with this id already exists
       7 = Problem adding to data series file
       8 = Bad data series name
       9 = ADD specified with VAR or ID7

   Examples:
     aa=sre_cache_data('MYSERIES','ADD','/HOME/Index.1','1st house ','2nd dog')
     aa=sre_cache_data('MYSERIES','ADD ID7','CXAQQE2','peanuts','cracker jacks')

MODIFY: Modify a record, or modify a portion of a record

   Change some or all of the variables in a record.

   SERIES     : series name
   ACTION     : MODIFY
                Optional action modifiers: VAR, INCREMENT, ADD, WRITE, *,
                and ID7
                  If the action modifiers contain VAR (VAR-mode):
                     Then change the value of the selected variable
                  If the action modifiers contain INCREMENT (INC-mode):
                     Then increment the value of the selected variable
                  Otherwise (ALL-mode):
                     Change all the variables
                  The ADD option means "if this record does not exist,
                  create it".
                  The * option means "if no exact match, try looking for
                  a wildcard match". If there is no wildcard match, and
                  the ADD option is specified, then create a new record.
                  Conversely, if there is a wildcard match the ADD option
                  is ignored.
                  The ADD and * options can NOT be used with ID7.
   IDENTIFIER : either a string, or a special identifier
   VAL1,..,VAL10: Values. Depends on "mode".
                If VAR-mode:
                   VAL1 = variable name
                   VAL2 = new value
                If INC-mode:
                   VAL1 = variable name
                   VAL2 = value to add (negative values allowed).
                          If VAL2 is not specified, a value of 1 is used.
                If ALL-mode:
                   VAL1 .. VAL10 can be strings of any length.
   aresult    : If successful, and ALL-mode or VAR-mode:
                   ID7
                If successful, and INC-mode:
                   ID7' 'prior_count' 'most_recent_access
                   where most_recent_access is in "JS" (nnnn.fffffff)
                   date/fraction-of-day format.
                Otherwise, a 1 character error code.

   The error codes are:
       1 = Data series was not defined
       2 = Both INCREMENT and VAR specified
       3 = Variable not found
       4 = Bad ID7 name
       5 = Record not found
       6 = Attempting to increment with non-numeric value
       7 = Attempting to increment non-numeric variable
       8 = Bad data series name
       9 = ADD specified with ID7

   A note on defining values:
     We recommend not modifying a variable with a value longer than what
     it was defined with. It will work, but it often requires creating a
     new record (and marking the old one for removal) -- a set of actions
     that is somewhat time consuming.

   Examples:
     aa=sre_cache_data('MYSERIES','MODIFY WRITE','/HOME/Index.1','1st cat ','2nd mouse')
     aa=sre_cache_data('MYSERIES','MODIFY ID7 VAR','CXAQQE2','%VAR_1','coated popcorn')

EXTRACT: Extract one or many records -- extract either all the values for
         the desired records, or the identifiers

   SERIES     : series name
   ACTION     : EXTRACT
                Optional action modifiers: ID ID7 ABSTRACT=xxx
                [or ABSTRACT=xxx&yyy]
                  If ID, ID7, or ABSTRACT=xxx are used, then just return
                  the ID7, the ID, and/or the contents of the xxx
                  variable
   IDENTIFIER : a record number, or a range
   VAL1..VAL10: not used
   aresult    : If success:
                   0 #RECORDS RECORD1 ...
                   RECORD_#RECORDS
                If failure:
                   1_digit_errorcode

   The error codes are:
       1 = Data series is not open
       2 = Bad range (misordered)
       3 = Bad range (smallest<1, or largest>#records in series)
       5 = Record could not be read
       8 = Bad data series name

   Notes:
     * If success, the structure of the returned records depends on the
       action modifiers:
        > If no modifiers are specified:
             ID7, id '09'x value1 '09'x ... value
          Note that:
            ... the comma immediately follows the ID7
            ... all values will be on a single line (since records can
                NOT contain crlfs)
        > If Modifier contains "ID", each line contains the identifying
          record
        > If Modifier contains "ID7", each line contains the "ID7"
          identifier for the record
        > If Modifier contains both "ID" and "ID7", each line contains
          both the ID7 and ID (separated by a space, with ID7 first)
        > If Modifier contains ABSTRACT=xxx, then the value of the xxx
          variable (for each record) will be returned after the ID7
          and/or ID (preceded by a '09'x)
             Example: URSECBB JonesJ '09'x 62525
          Alternatively, you can return the values of two variables by
          using xxx&yyy, where xxx and yyy are variable names.
          For example: ABSTRACT=title&entrydate could return
             Example: XTESDAA id124 '09'x The first month '09'x 12 Jan 2003
     * To specify a range, use N1 - N2. Thus, to read records 10 to 25,
       use "10 - 25 ". The order of the records returned is the same as
       the specified range.

FIND: Find a record; return one, or several, variables

   SERIES     : series name
   ACTION     : FIND
                Optional action modifiers: ID7, VAR, ALL and *
   IDENTIFIER : either a string, or a special identifier
   VAL1       : A space delimited list of variables to return, where
                  > %ID can be used to mean the "identifier" variable.
                  > * means %ID and all 10 variables (some of which may
                    not be set).
                  > '' to return basic information.
   VAL2       : A variable name (to be searched). Only used if VAR is
                used as an action modifier
   aresult    : If success (assuming J variables were specified):
                   0 ID7 VarName1 ... VarNameJ Var1_value ...
                   VarJ_value
                If success (assuming 0 variables were specified):
                   0 ID7
                If ID7 was used as an ACTION modifier, this ID7 is the
                same as the IDENTIFIER (though upper case).
                If failure:
                   1_digit_errorcode

   The error codes are:
       1 = Data series is not open
       3 = Variable not found
       4 = Bad ID7 name
       5 = Record not found
       8 = Bad data series name

   For example,
      astat=sre_cache_data('SERIES1','FIND','RECORD_12151','CATS %VAR_4')
   might return:
      0 ABCDEF1 CATS %VAR_4 Garfield Mothra
   whereas:
      astat=sre_cache_data('SERIES1','FIND','Z2BC2','HORSES')
   might return:
      3

   Special actions:
     VAR : Instead of looking at identifiers, examine the values of the
           variable named in VAL2 (if no such variable exists, a '3'
           error code is returned). Either the ID7 of the first match, or
           the ID7s of all the matches (see below), will be returned.
     ALL : Only used if VAR is a special action. Causes a space delimited
           list of the ID7s of all the matches to be returned, not just
           the ID7 of the best or first match.
     *   : If there is no exact match (to the identifier or selected
           variable), a "wild card" match will be attempted. This entails
           comparing the current identifier to all entries with an * in
           their identifier (or in the selected variable), and returning
           the "best match" (or all matches).

SEARCH: Search, on a variable, in all records for a match. Returns one,
        or several, ID7s of matching records.

   This is MUCH slower than FIND, especially in large datasets, but it is
   more flexible.

   SERIES     : series name
   ACTION     : SEARCH
                option_list: ALL or *
                  ALL : return ID7s of all matching records. Otherwise,
                        return the first match (or best match, if * is
                        specified)
                  *   : do case insensitive wild card searches.
                        Otherwise, do case-sensitive exact matches.
                        Wildcard search means that * in a record's
                        variable value means "wildcard". Note that * in
                        the identifier is NOT treated as a wildcard
                        (though it will match a * in a record's
                        variable).
   IDENTIFIER : a string to search for
   VAL1       : Variable name, or %ID.
                The values of this variable, for all records in the
                series, will be compared to the value specified in
                IDENTIFIER. If %ID, then search record identifiers.
   aresult    : If matches found:
                   0 ID7_list
                   ID7_LIST will only have 1 element if ALL is not
                   specified in the option_list.
                If no matches found:
                   0
                If failure:
                   1_digit_errorcode

   The error codes are:
       1 = Data series is not open
       3 = Variable not found
       8 = Bad data series name

   Example:
     all_matches=sre_cache_data(aseries,'SEARCH ALL','cats','SPECIES_VAR')

REMOVE: Remove a record

   SERIES     : series name
   ACTION     : REMOVE
                Optional action modifiers: ID7, WRITE
   IDENTIFIER : either a string, or a special identifier
   VAL1,..,VAL10: Not used
   aresult    : A 0 for success, or a 1 digit error code

   The error codes are:
       1 = Data series was not defined
       3 = No such record
       4 = Bad ID7
       5 = Record not found
       8 = Bad data series name

   Note that removal of a record does not remove space from the series
   data file. To actually remove the bytes, use the REWRITE action.

   Examples:
     aa=sre_cache_data('MYSERIES','REMOVE','/HOME/Index.1')

CLOSE: close one, or several, data series

   Before closing, all cached information will be written to the data
   series file. After closing, you will NOT be able to find, add, etc.
   items in the data series. That is, you'll have to OPEN it again if you
   want to do anything with it.

   SERIES     : a series name, a space delimited list of series names,
                or *
                * means "all currently open series"
   ACTION     : CLOSE
   IDENTIFIER : not used
   VAL1,..,VAL10: Not used
   aresult    : A 0 for success, or a 1 digit error code

   The error codes are:
       1 = Data series was not defined
       8 = Bad series name

SAVE: save cached information in all open data series

   SERIES     : not used
   ACTION     : SAVE
   IDENTIFIER : not used
   VAL1,..,VAL10: Not used
   aresult    : 0 for success, or an integer # of charout errors

   Normally, cached information will be saved every few minutes. You can
   use SAVE to force all information to be saved immediately.
   Note that CLOSE implies "SAVE first, and then CLOSE a specified data
   series".
   In contrast, SAVE keeps all the series open.

   The error codes are:
       2 = Problem saving a data series

INFO: info about an open data series

   SERIES     : series name, or series names, or *
                * means "all currently open series"
                '' can be used if non-series specific parameters are
                requested
   ACTION     : INFO
   IDENTIFIER : Not used
   VAL1       : Space delimited list of items to return
   aresult    : If successful:
                   0 series .. seriesn tab ITEM1 ... ITEMK
                   seriesname1 tab item1_value tab item2_value tab ..
                   seriesname2 ...
                If not successful, a 1 digit error code

   The error codes are:
       1 = Data series is not open
       2 = Invalid item
       8 = Bad data series name

   The valid series specific "items" to ask for (case insensitive):
     SERIES      : the series name used when the series was defined
     CREATION    : the date & time the series was defined
     MODIFIED    : the date & time the series was last modified
                   (information written to the file)
     VARS        : List of defined variables
     ALLVARS     : List of all variables (defined, and default names)
     RECORDS     : Current number of records
     REMOVED     : Current number of removed records not yet deleted
     DESCRIPTION : The short description
     FILE        : The name of the data series file

   There are also a few non-series specific parameters that can be
   obtained with INFO:
     CACHEHITS   : A pair of comma delimited values: number of cache
                   hits, number of non-hits (info gotten from data series
                   file), since last CLEANUP.
     DROPPED     : Total number of "dropped" entries (that a CLEANUP will
                   free up)
     INCACHE     : Number of entries in cache.
     SERIES      : Space delimited list of currently defined series
     VERSION     : The version of the SRE2003 cache-data daemon.
                   Applications can check this to make sure that required
                   features are supported.
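   The CACHEHITS pair described above can be turned into a hit rate with
   a few lines of REXX. The following is only a sketch: the sample value
   '150,50' is hypothetical, and it assumes the pair has already been
   pulled out of the INFO result.

```rexx
/* Hypothetical CACHEHITS value: number of hits, number of non-hits */
cachehits = '150,50'

/* Split the comma delimited pair */
parse var cachehits hits ',' misses
total = hits + misses

/* Guard against division by zero when nothing has been looked up yet */
if total = 0 then
   hitrate = 0
else
   hitrate = format(100 * hits / total, , 1)

say 'Cache hit rate:' hitrate'%'
```

   A low hit rate right after a CLEANUP is to be expected, since the
   counts restart at that point.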
CLEANUP: Cleanup internal data

   SERIES     : not used
   ACTION     : CLEANUP
                Optional action modifiers: CACHE
   IDENTIFIER : not used
   VAL1,..,VAL10: Not used
   aresult    : A 0 for success, or a 1 digit error code

   The error codes are:
       1 = Data series was not defined
       2 = Bad header information in data series file
       3 = Too many records (when reopening)
       4 = Not enough records (when reopening)
       8 = Bad series name

   This will
     1) close all open data series (first saving data to files).
     2) clear all data from memory.
     3) reopen these data series
   The net effect is to free up memory that may be "orphaned", due to
   REMOVEing entries or CLOSEing data series. However, since the cache is
   cleared, this will slow down response time for a while.

   If the CACHE action modifier is used, then ONLY the cache will be
   shrunk, using a least-recently-used algorithm. This does not free up
   as much memory, but it does retain the most useful cache entries.

   Note that the cache-data daemon will periodically do a cleanup of both
   the cache, and the data (see the user configurable parameters in
   SRE_DCSH.RXX for details). Thus, in most cases it is not necessary to
   use this command.

REWRITE: rewrite the data file (fix odd records, remove old junk)

   SERIES     : series name
   ACTION     : REWRITE
                Action modifier: FIX
   IDENTIFIER : if DEFINEd with a file name, you MUST include it here.
                Otherwise, identifier MUST NOT be used.
   VAL1,..,VAL10: Not used
   aresult    : A 0 for success, or a 1 digit error code

   The error codes are:
       1 = Data series was not opened
       2 = Problem reopening file
       3 = Problem rewriting data series
       5 = Bad series name
       6 = Data series does not exist
       7 = Unable to remove old file
       8 = Bad data series name

   Notes:
     * If the FIX action modifier is specified, then the file will be
       "fixed" -- improperly sized entries will be fixed, and the number
       of records will be fixed. This is more time consuming than
       rewriting without FIX.
     * If the selected series is currently open, it will first be SAVEd.
     * 'x ' and '.
' orphan lines are removed
     * If a data line does not end properly (say, the file ends too soon,
       or an expected continuation line is not found), the data line
       (and its known continuations) will be ';' (commented) out

DELETE: delete a data series

   SERIES     : series name
   ACTION     : DELETE
                Modifier: INDEX (if INDEX is present, then attempt to
                delete a quick-open index)
   IDENTIFIER : if DEFINEd with a file name, you MUST include it here.
                Otherwise, identifier MUST NOT be used.
   VAL1       : Name of quick-open index file (only used if the INDEX
                modifier is present)
   VAL2,..,VAL10: Not used
   aresult    : A 0 for success, or a 1 digit error code

   This simply deletes a file. It is included as a convenience -- one
   could achieve the same effect by deleting the file from a command
   prompt. Actually, if the selected series is open, it also closes it
   (removing data from internal cache).

   The error codes are:
       1 = Problem finding data series file
       2 = Problem deleting data series

ERROR: return an error message (a string), given an error code

   SERIES     : the action the error code came from (say, OPEN or FIND)
   ACTION     : ERROR
   IDENTIFIER : error code
   VAL1       : optional extra message
   VAL2,..,VAL10: Not used
   aresult    : The error message associated with the "error code" for
                this "action". The error code is typically the first
                digit returned by SRE_CACHE_DATA.

   Example:
     aa=sre_cache_data(series_name,'OPEN INDEX')
     if aa<>0 then do
        amess=sre_cache_data('OPEN','ERROR',aa)
        say "error: "amess
     end
     stuff=sre_cache_data('SERIES1','FIND','RECORD_12151','VAR_INCOME')
     if abbrev(stuff,'0')=0 then do
        amess=sre_cache_data('FIND','ERROR',left(stuff,1),' (on lookup of VAR_INCOME)')
        say " Error: "||amess
     end

                ---------------------

3) The 7 character "special identifiers"; and a note on unique
   identifiers:

Each record is identified using a seven character identifier. The
identifier is either explicitly specified by you (by using ID7 as an
action modifier), or it is automatically generated.
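If you supply your own identifiers with ID7, it may be worth checking
them before calling SRE_CACHE_DATA. The following is a minimal sketch,
not the daemon's own check: it uses the 7 character length and the
allowed characters (A-Z, 0-9, $ and _) given in this document, and the
candidate value is hypothetical.

```rexx
/* Hypothetical identifier that we would like to use with ID7 */
candidate = 'cxaqqe2'
allowed   = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$_'

/* Identifiers are case insensitive, so compare in upper case */
id7 = translate(candidate)

select
   when length(id7) <> 7 then
      verdict = 'wrong length'
   when verify(id7, allowed) <> 0 then
      verdict = 'unallowed character'
   otherwise
      verdict = 'ok'
end

say id7':' verdict
```

The two failure cases correspond to error codes 3 and 4 of the ADD
action.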
The 7 character identifier has two components:
   The first 6 characters identify the "family"
   The 7th character identifies the individual within a family.

For automatically generated identifiers, the family is a hash of the
IDENTIFIER argument. The "individual" is used to differentiate between
multiple records with the same IDENTIFIER.

The identifiers can be one of 38 different (case insensitive) characters:
A-Z, 0-9, $, and _. Thus, you can only have 38 entries within the same
"family".

Therefore, when automatically creating an identifier (from a string),
there is a chance that you will "run out of room" -- that the identifier
will be assigned to a family for which there are already 38 members.

In some uses (such as maintaining a running total of hits for uniquely
named files or selectors) this should be an extremely rare occurrence --
there is a slight chance that two different filenames will yield the same
6 character hash (hence the use of the 7th character).

However.... For other uses this may quickly become a problem. For
example, if you are recording the time and client address of all requests
for a specific file, using the file as the identifier would NOT be
advisable.

HENCE: if you are using SRE_CACHE_DATA for storing data (rather than for
maintaining information on a small set of pre-defined items), you SHOULD
use unique IDENTIFIERS (for example, the time a record was created).

                ---------------------

4) Hint on saving non-text strings, or strings with tabs and CRLFS.

It takes more time and space, but you can url-encode (or otherwise
convert) the strings before saving, and then unconvert after retrieving
the value (at some later date).

For example, you can use SRE_MAKE_PACK64 to "pack the string" into text
characters, and SRE_PACK64 to "unpack" it. Since the characters in a
PACK64 string are all "legal" SRE_CACHE_DATA characters, this allows you
to save any kind of string (not just strings with tabs and crlfs).
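As an illustration of the "convert before saving" idea, here is a small
sketch that uses the standard REXX C2X and X2C functions in place of the
PACK64 procedures (whose arguments are not described here). Hex encoding
doubles the length of the string, so it is a simple stand-in rather than
a recommendation; the sample string is hypothetical.

```rexx
/* A value containing a CRLF and a tab, which would otherwise be */
/* converted to spaces when stored                                */
original = 'first line' || '0d0a'x || 'name' || '09'x || 'value'

/* Convert to hex digits (0-9 and A-F only) before saving */
stored = c2x(original)

/* ... store and later retrieve the value ... */

/* Convert back after retrieving */
restored = x2c(stored)

say 'Stored form:' stored
if restored == original then say 'Round trip OK'
```

Remember to set aside space for the encoded length, not the original
length, when first adding the record.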
                ---------------------

5) File structure:

Note that SRE2003 data series files are NOT meant to be modified by the
user -- all changes to the data file should be done using the data-cache
daemon. However, these files can be read with a normal (80 character wide
line) text viewer.

 * Each entry in a data series file consists of an 82 character line
   (last 2 characters are always CRLF).

   The structure of the first line of a data record is:
       7-char-id
       1 space
       bytes (length of info), 6 chars wide (hence truncated at 999999)
       space
       up to 65 characters of info
       crlf

   Continuation lines are:
       .
       space
       up to 78 characters of info
       crlf

   Other lines are
     Meta data:
       #
       space
       meta data (78 char wide)
       crlf
     Comments:
       ;
       space
       comment (up to 78 characters)
       crlf
     Deleted lines (to be removed):
       x
       space
       up to 78 characters of arbitrary stuff
       crlf

 * Records consist of at least one entry, but can contain an unlimited
   number.

 * Values (stored in the info field of records) can be arbitrarily long
   -- they will be stored in as many entries as needed. That is, values
   can span multiple entries.

 * The top of the file contains meta information in a name: value format.
   For example (ignoring the leading "#" code):
       SERIES: MYSERIES
       CREATED: 20 June 2001 10:45:01
       MODIFIED: 16 Feb 2002 15:31:22
       RECORDS: 203
       REMOVED: 0
       VAR1: DOGS
       VAR2: CATS

 * Values may contain spaces, but they may NOT contain tabs (or other
   non "text" characters). By default, a tab is used as a "value
   separator" character. This allows for text viewers to read the file,
   but may cause some wrapping problems.

 * Comment lines start with a ;
   Comment lines can be freely interspersed (even in the middle of a set
   of continuation lines).

 * Removed records start with an "x" -- including removed continuation
   lines.

 * A special "identifier", *******, is used to store locations of all
   entries that contain * characters in their identifiers. When a data
   series is created, an empty "*******" entry is generated and stored to
   the data series file.
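
For read-only inspection (say, in a reporting script), the layout above
can be decoded with a few REXX string functions. This is a sketch under
stated assumptions: the sample lines are made up, and the length field is
assumed to be right justified in its 6 columns (STRIP is used so that
either justification works). It only reads lines; changes should still go
through the daemon.

```rexx
/* Hypothetical sample lines (CRLF already removed by the reader) */
line.1 = '# SERIES: MYSERIES'
line.2 = '; a comment'
line.3 = left('URSECBB', 7) right(6, 6) 'JonesJ'
line.4 = '. continued info'
line.5 = 'x old stuff'
line.0 = 5

do i = 1 to line.0
   code = left(line.i, 2)            /* type character plus a space */
   select
      when code == '# ' then kind.i = 'meta'
      when code == '; ' then kind.i = 'comment'
      when code == '. ' then kind.i = 'continuation'
      when code == 'x ' then kind.i = 'deleted'
      otherwise do
         /* First line of a record: id (1-7), length (9-14), info (16-80) */
         kind.i = 'record'
         id7    = left(line.i, 7)
         bytes  = strip(substr(line.i, 9, 6))
         info   = strip(substr(line.i, 16, 65), 'T')
         say 'record' id7 'length' bytes 'info:' info
      end
   end
   say i':' kind.i
end
```

Because identifiers cannot contain spaces, a type character followed by
a space cannot be confused with the start of a 7 character identifier.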