
Automatic Site Indexing & Information Retrieval


Lambda-IR Overview Alpha

Lambda-IR is a set of tools to facilitate the creation of information retrieval applications for use with the web server. The tools are designed to support information retrieval by integrating arbitrarily complicated lexical features with hybrid classification and learning algorithms. This release contains two example applications:

When loaded, LambdaVista™ modifies http:export-url so that all URLs containing static HTML or text are indexed at export time. A standard form allows users to search the Web site and retrieve URLs using a boolean syntax. Please look at the source code in http:lambda-ir; and the LambdaVista™ example in http:lambda-ir;examples; to understand how to use the application.

The Email HyperArchive supports full-text search provided that you load the file http:examples;mail-archive-index.lisp. After this, all archives exported with keywords setting the archive type to "indexing-mail-archive" will automatically provide full-text search facilities.
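The index-then-search flow described for LambdaVista™ above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Lambda-IR's Lisp code or API: the names tokenize, build_index, and search, the nested-tuple query form, and the example URLs are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: each token maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, all_urls, query):
    """Evaluate a boolean query.

    A query is either a term (a string) or a tuple such as
    ("and", q1, q2, ...), ("or", q1, q2, ...), or ("not", q).
    This tuple form is a hypothetical stand-in for a boolean query syntax.
    """
    if isinstance(query, str):
        return set(index.get(query.lower(), ()))
    op, *args = query
    if not args:
        raise ValueError("operator needs at least one argument")
    results = [search(index, all_urls, arg) for arg in args]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    if op == "not":
        return set(all_urls) - results[0]
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical exported documents, keyed by URL.
documents = {
    "/docs/overview.html": "Lambda-IR indexes static HTML at export time",
    "/docs/mail.html": "The mail archive supports full-text search",
    "/docs/standards.txt": "HTML standards and export notes",
}
index = build_index(documents)

# URLs that mention "html" but not "static".
print(sorted(search(index, documents, ("and", "html", ("not", "static")))))
```

In this sketch the index is built once, which corresponds to doing the expensive work at export time; each query is then only a handful of set operations over precomputed entries.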

Building indices is quite rapid: the process requires about 8 minutes on a MacIvory (XL1200 equivalent) to index all the CL-HTTP documents (including the standards) and about 7 minutes to index the entire mailing list archive for "WWW-CL" (approximately 300 messages). Once an index is built, full-text search is very fast; the major limitation is network latency.

At present, the Lambda-IR tools have been tested on the Lisp Machine and under MCL. The code is portable and should run on other platforms. However, some platform-specific efficiency tuning, particularly in the bit-vector code (in the file http:lambda-ir;bit-vectors.lisp), may prove helpful.
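To show why bit-vector operations are a natural tuning target in a boolean retrieval system, here is a small sketch of document sets stored as bit vectors. It uses Python integers as the bit vectors and does not reflect the contents of bit-vectors.lisp; the function names and the choice of one bit per document number are assumptions.

```python
def make_bit_vector(doc_ids):
    """Return an integer whose bit i is set when document i is in the set."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def bit_not(bits, n_docs):
    """Complement a bit vector, keeping only the low n_docs bits."""
    return ~bits & ((1 << n_docs) - 1)


def members(bits):
    """List the document numbers whose bits are set, in increasing order."""
    ids = []
    position = 0
    while bits:
        if bits & 1:
            ids.append(position)
        bits >>= 1
        position += 1
    return ids


N_DOCS = 8  # hypothetical collection size

# Hypothetical term occurrences: which document numbers contain each term.
html = make_bit_vector([0, 2, 5])
export = make_bit_vector([0, 3, 5, 7])

both = html & export                       # boolean AND
either = html | export                     # boolean OR
html_only = html & bit_not(export, N_DOCS)  # AND NOT

print(members(both), members(either), members(html_only))
```

Each boolean operator becomes a single bitwise operation over the whole collection, so the speed of queries depends heavily on how efficiently the host platform implements those operations on long bit vectors.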


Andrew J. Blumberg (blumberg@nospam.ai.mit.edu)