Papers and Presentations
In the course of our work, we have published a number of papers and presented at academic conferences. Where possible, we make them available here.
Contextual Network Search (CNS) is a graph-based alternative to Latent Semantic Indexing that we began developing in 2003. In a 1981 dissertation at the University of Illinois, Scott Preece describes an almost identical technique under the name "spreading activation search", but we have not found any further work in this direction. If you know of such work, please let us know. Our own implementation of CNS is described in our preliminary paper [PDF] on the topic.
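To give a rough sense of what spreading activation over a term-document graph means, here is a minimal Python sketch. It is not our CNS implementation: the bipartite graph, the raw-count edge weights, the decay factor of 0.5, the fixed number of steps, and the helper names build_graph, spread and rank_documents are all hypothetical choices made for illustration.

```python
from collections import defaultdict


# Hypothetical helper: builds an undirected bipartite graph linking each
# term to the documents it occurs in. Edge weight = raw term count (an
# illustrative assumption, not a recommended weighting scheme).
def build_graph(docs):
    graph = defaultdict(dict)
    for doc_id, text in docs.items():
        d = ("doc", doc_id)
        for word in text.lower().split():
            t = ("term", word)
            graph[t][d] = graph[t].get(d, 0) + 1
            graph[d][t] = graph[d].get(t, 0) + 1
    return graph


# Hypothetical helper: spreads activation outward from the query terms.
# At each step every active node passes `decay` times its activation to
# its neighbours, split in proportion to edge weight.
def spread(graph, query_terms, steps=3, decay=0.5):
    activation = defaultdict(float)
    frontier = {("term", w.lower()): 1.0
                for w in query_terms if ("term", w.lower()) in graph}
    for node, a in frontier.items():
        activation[node] += a
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, a in frontier.items():
            total = sum(graph[node].values())
            for nbr, w in graph[node].items():
                nxt[nbr] += decay * a * w / total
        for node, a in nxt.items():
            activation[node] += a
        frontier = nxt
    return activation


# Returns (doc_id, score) pairs, highest activation first; ties by doc_id.
def rank_documents(activation):
    scored = [(node[1], a) for node, a in activation.items() if node[0] == "doc"]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "d1": "cats chase mice",
        "d2": "mice eat cheese",
        "d3": "dogs chase cars",
    }
    graph = build_graph(docs)
    for doc_id, score in rank_documents(spread(graph, ["cats"])):
        print(doc_id, round(score, 4))
```

With the query "cats", d1 scores highest. d2 and d3 receive equal, smaller scores even though neither contains "cats", because each shares a term ("mice" or "chase") with d1; this is the kind of associative retrieval that spreading activation is meant to provide.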
Latent Semantic Indexing (LSI, also called LSA, for latent semantic analysis) was originally described in a 1990 paper by Deerwester, Dumais, Furnas, Landauer, and Harshman, and remains an active area of research. You can find links to journal articles and other LSI websites on our references page. We have also published a layman's introduction to LSI, which explains the process in some detail and describes where it can be useful. You can find an LSI Demo Machine at the Telcordia site. Telcordia has a patent claim on latent semantic indexing for text collections.
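For readers new to LSI, the core computation can be sketched in a few lines of NumPy: a truncated singular value decomposition of a term-by-document matrix, followed by the usual fold-in of a query into the reduced space. This is an illustrative sketch, not our implementation; the toy matrix, the choice k = 2, and the helper names lsi_fit, lsi_query and cosine are hypothetical, and real collections are normally term-weighted (for example with tf-idf) before the decomposition.

```python
import numpy as np


# Truncated SVD of a term-by-document matrix A (rows = terms, cols = docs).
# Keeps the k largest singular values; document j is represented by
# row j of V_k.
def lsi_fit(term_doc, k):
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


# Fold-in: map a term-space query q to q_hat = S_k^-1 U_k^T q, which puts
# it in the same coordinates as the rows of V_k.
def lsi_query(q, U_k, s_k):
    return (U_k.T @ q) / s_k


# Cosine similarity, returning 0.0 if either vector is all zeros.
def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))


if __name__ == "__main__":
    # Toy data (made up for illustration): 5 terms x 3 documents.
    A = np.array([
        [1, 0, 0],  # car
        [0, 1, 0],  # automobile
        [1, 1, 0],  # engine
        [0, 0, 1],  # flower
        [0, 0, 1],  # petal
    ], dtype=float)
    U_k, s_k, doc_vecs = lsi_fit(A, k=2)
    q = np.array([1, 0, 0, 0, 0], dtype=float)  # query: "car"
    q_hat = lsi_query(q, U_k, s_k)
    for j in range(A.shape[1]):
        raw = cosine(q, A[:, j])
        latent = cosine(q_hat, doc_vecs[j])
        print(f"d{j}: raw={raw:.3f} latent={latent:.3f}")
```

Document d1 never contains "car", so its raw cosine with the query is 0. After truncation to k = 2, however, d0 and d1 map to the same latent vector (both use "engine"), so both score about 1.0 against the query, while d2 stays near 0.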
Tools for Thinking, Thinking Tools: we presented a paper at the MLA Annual Convention in December, 2005, describing our continued work applying graph theory to literary analysis. The presentation slides and paper [PDF] are available for download.
Text Modeling and Visualization with Network Graphs: we presented a paper on our work applying graph theory to literature at the Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities in June, 2005.
Statistical Natural Language Processing and Literary Analysis: we gave a talk about our literary analysis tool at the Getty Research Institute in January, 2005.
Getting Smart with Search Technologies: we participated in a panel discussing emerging topics in the field of information retrieval at the O'Reilly Open Source Convention.
Ad Hoc Authorship Attribution Contest: we participated in an authorship attribution contest using the tools developed in this project. The work was presented at the Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities, June, 2004. Our technique is described in this abstract [PDF].
An Automated Management Tool for Unstructured Data: we presented this paper [PDF] at the IEEE/WIC International Conference on Web Intelligence in October, 2003.
Part of Speech Tagging with Perl: we presented our work on part-of-speech tagging and other indexing algorithms at the O'Reilly Open Source Convention in July, 2003.
Building a Smarter Search Engine: we presented on Latent Semantic Indexing and Graph-Theoretic Search algorithms at the O'Reilly Open Source Convention in July, 2003. Our slides from the talk are available here.
Peer-to-Peer Semantic Search Engines: Building a Memex: we gave a talk on peer-to-peer semantic search engines at the O'Reilly Emerging Technology Conference in April, 2003. Our slides for the talk are available here.
Using Latent Semantic Analysis in Bioinformatics: we presented our work on searching protein data with LSI at the O'Reilly Biotechnology Conference in early February, 2003.