
I have to store scanned TIFF (Tagged Image File Format) or PDF files in MongoDB in a way that makes them text-searchable: if we search for some text, the files containing that text should be found.

I am going to use .NET MVC or Java with MongoDB.

How can I store such a PDF file and then retrieve it from the database?

Any suggestions will be appreciated.

Thanks

Waqas Rana

2 Answers


You can store the files using MongoDB GridFS, as described in this question, and extract the text from a PDF file using the techniques described in this question.

HTH

Community
shA.t

I think you should save the files on the server's file system and store each file's path, together with the text extracted from the file, in MongoDB. Reading a file from the server's file system is more efficient than loading it from MongoDB.

The other option is to save the file as binary data in MongoDB, but then you won't be able to search inside the file.
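
To make the first option concrete, here is a minimal sketch of the record shape: one entry per scanned file, holding the path on disk and the extracted text, with a keyword search that returns matching paths. The class `ScannedFileCatalog` and its methods are hypothetical names, and the in-memory list is only a stand-in for a MongoDB collection. In MongoDB itself the same idea would be a document with a path field and a text field, with a text index on the text field queried through `$text`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory stand-in for a MongoDB collection of scanned files.
class ScannedFileCatalog {

    // One record per scanned file: where it lives on disk plus its extracted text.
    static final class ScannedFile {
        final String path;
        final String text;

        ScannedFile(String path, String text) {
            this.path = path;
            this.text = text;
        }
    }

    private final List<ScannedFile> files = new ArrayList<>();

    // Registers a file that is already saved on disk, together with its extracted text.
    void add(String path, String extractedText) {
        files.add(new ScannedFile(path, extractedText));
    }

    // Returns the paths of all files whose extracted text contains the term (case-insensitive).
    List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (ScannedFile file : files) {
            if (file.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(file.path);
            }
        }
        return hits;
    }
}
```

The search only ever touches the extracted text, so the file itself is read from disk only after a match is found.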

Pini Cheyni
  • All right. But if I follow the first approach you mentioned above, will I be able to search inside the file? Searching inside the file is the main purpose. – Waqas Rana Dec 11 '16 at 07:56
  • If it is a PDF that contains text, you should extract all the text and save it separately. For TIFF files and other images, you will have to run OCR and process them separately to extract the text, and then run your search queries against that extracted text. – Pini Cheyni Dec 12 '16 at 08:18
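
The decision described in that last comment could be sketched roughly as follows. `ExtractionRouter`, `Route`, and the `pdfHasTextLayer` flag are hypothetical names used only for illustration. The sketch assumes the caller already knows whether a PDF carries a text layer, and it only picks a route; the actual text extraction and OCR would be done by whatever libraries you choose.

```java
import java.util.Locale;

// Hypothetical helper that decides how the searchable text should be obtained.
class ExtractionRouter {

    enum Route { PDF_TEXT_EXTRACTION, OCR, UNSUPPORTED }

    // Picks a route from the file extension; a PDF without a text layer is treated like an image.
    static Route routeFor(String fileName, boolean pdfHasTextLayer) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            return pdfHasTextLayer ? Route.PDF_TEXT_EXTRACTION : Route.OCR;
        }
        if (name.endsWith(".tif") || name.endsWith(".tiff")
                || name.endsWith(".png") || name.endsWith(".jpg") || name.endsWith(".jpeg")) {
            return Route.OCR;
        }
        return Route.UNSUPPORTED;
    }
}
```

Whichever route is taken, the resulting text is what gets stored next to the file's path and searched.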