Semantic::API - Perl extension for graph-based search
The Semantic Engine grew out of an attempt to improve on standard keyword searches. By analyzing the statistical patterns in natural language, it establishes concept-based relationships between distinct texts that may not share any particular keywords. Because the text collection is represented as a graph-theoretic network, similarities and relationships can be found in otherwise unstructured data.
use Semantic::API;
my $semantic = Semantic::API::Index->new(
collection => 'my_collection',
storage => 'mysql',
database => 'my_database',
host => 'localhost',
username => 'my_user',
password => '***',
min_term_frequency => 2,
max_document_frequency => '0.2' );
-- OR --
my $semantic = Semantic::API::Index->new(
collection => 'my_collection',
storage => 'sqlite',
database => 'my_database' );
$semantic->add_word_filters(
too_many_numbers => 10,
minimum_length => 3,
maximum_word_length => 15,
maximum_phrase_length => 3,
blacklist => \@blacklist,
whitelist => \@whitelist );
$semantic->index_file( $filename ); # read this file and index it!
...
$semantic->set_default_encoding( "utf8" ); # use this encoding for any incoming text
$semantic->index( $id, $text );
...
$semantic->index( $id, $text, $weight ); # give this item a different weight
$semantic->finish(); # commit everything to the database
use Semantic::API;
my $semantic = Semantic::API::Search->new( collection => 'my_collection',
storage => 'mysql',
database => 'my_database',
username => 'my_user',
password => '***',
host => 'localhost' );
-- OR --
my $semantic = Semantic::API::Search->new( collection => 'my_collection',
storage => 'sqlite',
database => 'my_database' );
my ( $results, $terms ) = $semantic->semantic_search( 'query' );
-- OR --
my ( $results, $terms ) = $semantic->keyword_search( 'query' );
my @term_list = sort { $terms->{$b} <=> $terms->{$a} } keys %$terms;
my @result_list = sort { $results->{$b} <=> $results->{$a} } keys %$results;
foreach( @result_list ){
...
}
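The sorting idiom above can be wrapped in a small helper when only the best few hits are needed. This is a minimal sketch: top_results is a hypothetical helper, not part of Semantic::API, and the scores are made-up stand-ins for the $results hash reference (document id => score).

```perl
use strict;
use warnings;

# Hypothetical scores standing in for $results (document id => score);
# real values would come from semantic_search() or keyword_search().
my $results = { doc_a => 0.42, doc_b => 0.91, doc_c => 0.17, doc_d => 0.65 };

# Return the ids of the $limit highest-scoring entries, best first.
# Ties are broken alphabetically so the order is stable.
sub top_results {
    my ( $scores, $limit ) = @_;
    my @ranked = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
        keys %$scores;
    $limit = @ranked if $limit > @ranked;
    return @ranked[ 0 .. $limit - 1 ];
}

foreach my $id ( top_results( $results, 3 ) ) {
    printf "%-8s %.2f\n", $id, $results->{$id};
}
```

The same helper works for the $terms hash, since both are sorted by value in the synopsis.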
storage => 'mysql' or 'sqlite'
collection => 'collection name'
database => 'database name'
username => 'mysql username' (optional)
password => 'mysql password' (optional)
host => 'mysql host' (optional)
min_term_frequency => 'minimum number of occurrences of a term' (optional)
max_document_frequency => 'maximum share of the collection, as a fraction such as 0.2, in which a term occurs' (optional)
Additional parameters are listed below.
Additional optional parameters:
lexicon => 'path/to/lexicon.gz'
default_encoding => 'iso-8859-1'
parsing_method => 'nouns'
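To make the min_term_frequency and max_document_frequency thresholds concrete, here is a stand-alone sketch of that kind of pruning. It does not use Semantic::API: select_terms is a hypothetical function, and the assumptions that term frequency is counted across the whole collection and that the document frequency is a fraction between 0 and 1 are inferred from the parameter descriptions, not taken from the module's code.

```perl
use strict;
use warnings;

# Keep terms that occur at least $min_tf times overall and appear in at
# most $max_df (a fraction between 0 and 1) of the documents.
# $docs is a hash reference: document id => array reference of terms.
sub select_terms {
    my ( $docs, $min_tf, $max_df ) = @_;
    my ( %term_count, %doc_count );
    for my $terms ( values %$docs ) {
        my %seen;
        for my $term (@$terms) {
            $term_count{$term}++;                         # total occurrences
            $doc_count{$term}++ unless $seen{$term}++;    # once per document
        }
    }
    my $total = keys %$docs;
    my @kept  = grep {
        $term_count{$_} >= $min_tf && $doc_count{$_} / $total <= $max_df
    } keys %term_count;
    return sort @kept;
}

my %docs = (
    d1 => [qw(graph search graph index)],
    d2 => [qw(graph query)],
    d3 => [qw(graph index)],
);
my @kept = select_terms( \%docs, 2, 0.7 );
print "@kept\n";    # prints "index"
```

Here "graph" is dropped for appearing in every document, while "search" and "query" are dropped for occurring only once.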
minimum_length => $num # omit words with fewer characters than $num
maximum_word_length => $num # omit words with more characters than $num
maximum_phrase_length => $num # omit phrases with more words than $num
too_many_numbers => $num # omit words containing more numbers than $num
blacklist => \@array # omit words in this array
whitelist => \@array # keep only words in this array
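The filter options above can be read as a series of independent tests on each candidate term. The following sketch shows one plausible interpretation in plain Perl. passes_filters is a hypothetical function, not the module's implementation, and it assumes that "numbers" means digit characters and that the black and white lists match whole terms exactly.

```perl
use strict;
use warnings;

# Return 1 if $term passes every filter present in %$f, else 0.
# Option names mirror the list above; the logic itself is an assumption.
sub passes_filters {
    my ( $term, $f ) = @_;
    my @words  = split ' ', $term;
    my $digits = () = $term =~ /\d/g;    # count digit characters

    return 0
        if defined( $f->{maximum_phrase_length} )
        && @words > $f->{maximum_phrase_length};
    return 0
        if defined( $f->{too_many_numbers} )
        && $digits > $f->{too_many_numbers};

    for my $word (@words) {
        return 0
            if defined( $f->{minimum_length} )
            && length($word) < $f->{minimum_length};
        return 0
            if defined( $f->{maximum_word_length} )
            && length($word) > $f->{maximum_word_length};
    }

    if ( $f->{blacklist} ) {
        my %deny = map { $_ => 1 } @{ $f->{blacklist} };
        return 0 if $deny{$term};
    }
    if ( $f->{whitelist} ) {
        my %allow = map { $_ => 1 } @{ $f->{whitelist} };
        return 0 unless $allow{$term};
    }
    return 1;
}

my %filters = (
    minimum_length        => 3,
    maximum_word_length   => 15,
    maximum_phrase_length => 3,
    too_many_numbers      => 10,
    blacklist             => [qw(the and)],
);
my @terms = grep { passes_filters( $_, \%filters ) }
    ( 'graph', 'of', 'the', 'latent semantic index', 'a b c d' );
print join( ', ', @terms ), "\n";    # prints "graph, latent semantic index"
```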
finish()
Additional optional parameters (with default values):
depth => 4 # depth of graph traversal
trials => 100 # number of trials for random walk
keep_top_edges => 0.3 # fraction of edges kept before traversal
# set this to 1 to do no pruning
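As an illustration of how depth, trials and keep_top_edges might interact, the sketch below prunes a toy graph and then runs weighted random walks over it. It is not the engine's algorithm: prune_edges and random_walk_counts are hypothetical functions, pruning per node (rather than across the whole graph) and always keeping at least one edge are assumptions, and the graph data is invented.

```perl
use strict;
use warnings;

# Keep the strongest $fraction of a node's outgoing edges (at least one).
# $edges is a hash reference: neighbor => weight.
sub prune_edges {
    my ( $edges, $fraction ) = @_;
    my @ranked = sort { $edges->{$b} <=> $edges->{$a} || $a cmp $b }
        keys %$edges;
    return {} unless @ranked;
    my $keep = int( @ranked * $fraction + 0.5 );
    $keep = 1       if $keep < 1;
    $keep = @ranked if $keep > @ranked;
    my %kept = map { $_ => $edges->{$_} } @ranked[ 0 .. $keep - 1 ];
    return \%kept;
}

# Run $trials random walks of at most $depth steps from $start and count
# how often each node is reached. $graph: node => { neighbor => weight }.
sub random_walk_counts {
    my ( $graph, $start, $depth, $trials ) = @_;
    my %visits;
    for ( 1 .. $trials ) {
        my $node = $start;
        for ( 1 .. $depth ) {
            my $edges = $graph->{$node} || {};
            my @next  = sort keys %$edges;
            last unless @next;    # dead end: stop this walk early
            my $total = 0;
            $total += $edges->{$_} for @next;
            my $pick   = rand($total);
            my $chosen = $next[-1];    # fallback guards against rounding
            for my $candidate (@next) {
                $pick -= $edges->{$candidate};
                if ( $pick < 0 ) { $chosen = $candidate; last; }
            }
            $node = $chosen;
            $visits{$node}++;
        }
    }
    return \%visits;
}

# Invented toy graph: a query node linked to documents and terms.
my %graph = (
    query  => { doc1 => 0.9, doc2 => 0.4, doc3 => 0.1 },
    doc1   => { term_x => 0.8, term_y => 0.2 },
    term_x => { doc4 => 0.7 },
);
my %pruned = map { $_ => prune_edges( $graph{$_}, 0.3 ) } keys %graph;
my $visits = random_walk_counts( \%pruned, 'query', 4, 100 );
printf "%-7s %d\n", $_, $visits->{$_} for sort keys %$visits;
```

Nodes reached more often across the trials would rank higher. A larger depth lets walks reach nodes further from the query, and a smaller keep_top_edges restricts them to the strongest links.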
node(s) rather than a term node.
These are exported by Semantic::API by request only
For more information, please visit http://www.knowledgesearch.org
Aaron Coburn, <acoburn@middlebury.edu>
Gabe Schine, <gschine@middlebury.edu>
Copyright (C) 2006 by Aaron Coburn and Gabe Schine
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.