ORACLE TEXT: implementing scoring text search engine with example

Oracle Text is a tool that enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text.
This post is related to text query functionality. Here I will explain how to use oracle scoring function with an example.

Using oracle text and score function we can implement  search functionality on several database columns, and results will be ordered by their relevance.We can implement searches on text columns using "LIKE" but it will not give you how relevant is each result.

Oracle text is already installed in oracle 11g and oracle 12c versions. If you are using oracle 10g Please refer this url to know how to install the oracle text extension http://download.oracle.com/docs/cd/B19306_01/install.102/e10319/initmedia.htm
Once installed, in order to run query on your text columns, you must add a index. Oracle Text provides three type of index for your text/documents, Below is the explanation copied from the oracle documentation.

CONTEXT: 
Use this index to build a text retrieval application when your text consists of large coherent documents. You can index documents of different formats such as Microsoft Word, HTML, XML, or plain text.
You can customize your index in a variety of ways.
This index uses CONTAINS clause
CTXCAT:
Use this index type to improve mixed query performance. Suitable for querying small text fragments with structured criteria like dates, item names, and prices that are stored across columns.
This index uses CATSEARCH clause
CTXRULE:
Use to build a document classification application. You create this index on a table of queries, where each query has a classification.
Single documents (plain text, HTML, or XML) can be classified by using the MATCHES operator.
This index use MATCHES clause

In order to use SCORING function, that returns results’ relevance, we have to use CONTEXT index.
Please follow the below example step by step to understand how to implement scoring function to a text column.
SQL> create table friends (id number, name varchar2(23), about varchar2(100));
SQL> CREATE INDEX scoreindx ON friends(about) INDEXTYPE IS CTXSYS.CONTEXT;
 Insert some rows...
SQL> select * from friends;
        ID NAME                    ABOUT
---------- ----------------------- -----------------------------------------
         1 fahad                   colq rmte frnd
         2 arun                    frnd frnd rmte
         3 javed                   colq colq colq
         4 rajeev                  colq colq frnd

Now let us run the following queries to check how score() function is working.
SQL> select name, score(1) as rating from friends where contains(about, 'frnd', 1)>0 order by rating desc;
 NAME                        RATING
----------------------- ----------
arun                             7
rajeev                           3
fahad                            3
 SQL> select name, score(1) as rating from friends where contains(about, 'colq', 1) >0 order by rating desc;
 NAME                        RATING
----------------------- ----------
javed                           10
rajeev                           7
fahad                            3
You can see that the output is ordered with respect to the relevance of the search term. The column rating in the output gives the relevance. 
If you want to search for more than one text the score() function can be used as below.
SQL> select name, score(1)+score(2) as rating from friends where contains(about, 'rmte', 1) >0 or contains(about, 'frnd', 2)>0 order by rating desc;
 NAME                        RATING
----------------------- ----------
arun                            11
fahad                            7
rajeev                           3
Normally we will not be aware of the number of words in the search string. So we may have to create the query dynamically in the application and run the same. 
Here below I am giving you an example to create the query with score() function dynamically with respect to the query term you provide. 
Here in this example I am searching for 'I am going to America' you can change this string to anything, and obviously you will be using a variable to give the search string in our application.

declare
   search_string varchar2(1000) :='i am going to america' ;
   words varchar2(25);
   select_string varchar2(1000) :='select name, ';
   cond_string varchar2(1000) := 'where ';
   wcount number ;
   statement varchar2(1000);
begin
   select length(search_string)-length(replace(search_string , ' ' ))+1 into wcount from dual;
   select_string:=select_string || 'score(1) as rating1';
   cond_string:= cond_string||' contains (about, '''|| search_string ||''', 1)>0';
   for i in 2..wcount+1 loop
       select regexp_substr(search_string, '[^ ]+', 1, i-1) into words  from dual;
       if i=2 then
            select_string:=select_string||', score('||i||')';
            cond_string:=cond_string||' or contains(about,''' || words||''','||i||')>0 ';
      else
            select_string:=select_string||'+score('||i||')';
            cond_string:=cond_string||' or contains(about,''' || words||''','||i||')>0 ';
      end if;
   end loop;
   statement:= select_string || ' as rating2 from friends '||chr(10) || cond_string ||' order by rating1 desc, rating2 desc';
   dbms_output.put_line(statement);
end;
/
Output will be as given below.

select name score(1) as rating1, score(2)+score(3)+score(4)+score(5)+score(6) as rating2 from friends
where  contains (about, 'i am going to america', 1)>0 contains(about,'am',2)>0 or contains(about,'going',3)>0 or contains(about,'to',4)>0 or contains(about,'america',5)>0 or
contains(about,'',6)>0  order by rating1 desc, rating2 desc

You can do modification in the script if the provided one does not satisfy your need. I hope this one will help you to create a new one. 
Ref:
http://docs.oracle.com/cd/B10501_01/text.920/a96517/cdefault.htm

No comments:

Post a Comment