1

I am creating a database that needs to search in text files (.doc, .txt, .pdf, ...). Start creating a preference:

ctx_ddl.create_preference('DOCSPIDER_DIR','FILE_DATASTORE');

I do not assign the 'path' value as there may be subdirectories. Then I create a table:

create table document (id number, path varchar2(2000));
ALTER TABLE document ADD (CONSTRAINT document_pk PRIMARY KEY (ID));

Create the index:

create index document_index on document(path)
indextype is ctxsys.context
parameters ('datastore DOCSPIDER_DIR filter ctxsys.auto_filter');

And the command to sync:

ctx_ddl.sync_index('document_index', '2M');

After the structure is created, I insert a record, pointing to an existing document:

INSERT INTO document VALUES (1, '\\server\oracle_text_files\file_name.txt');

However, when you run a query searching the contents of this document, it does not return data:

SELECT * from document WHERE CONTAINS(path, 'test', 1) > 0;

Something's missing?

Cesar
  • 33
  • 1
  • 6

1 Answers1

0

I bet you will get results if you go:

SELECT * from document WHERE CONTAINS(path, 'txt', 1) > 0;

Indeed, that INSERT you do isn't going to load the file into the database! You must look into something to load your file as CLOB or something.

There are interesting posts on SO to do the same, say

J. Chomel
  • 8,193
  • 15
  • 41
  • 69
  • Hello Chomel, thank you for your response. However, I can not import the contents of the documents into the database. I would need to use the oracle feature that stream the document, as reported in the documentation: https://docs.oracle.com/cd/B28359_01/text.111/b28303/quicktour.htm#i1008390 – Cesar Sep 04 '17 at 18:11
  • From the documentation I do not get this. In the example, the string you want to find in the text are in the database, but you give the full text to the match function. When you go to next chapter, you see the table has a club... – J. Chomel Sep 04 '17 at 19:00
  • In this documentation topic it is said that the Datastore object reads from one of the sources (database, url or files): https://docs.oracle.com/cd/B28359_01/text.111/b28303/ind.htm#CIHIHJGG So I do not think it would be necessary to load the contents of the files into the database. _(...)For example, if you have defined your datastore as FILE_DATASTORE, then the stream starts by reading the files from the operating system. You can also store your documents on the Internet or in Oracle Database._ – Cesar Sep 04 '17 at 20:25