I am trying to index documents (PDFs, for example) into Elasticsearch.
My goal is to search the documents by matching a string against their content.
To extract the document content, I am using Apache Tika.
I am not sure how I should index the document content along with the document metadata.
Below are the options I can think of:
Should I just add a single field, "content", of type String and store the document content there as a string? (But I'm not sure this will work for large documents.)
Or should I make that field binary and store the encoded document content there? (But then it will not be searchable.)
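To make the two options concrete, here is a minimal Python sketch of the index mapping and document body each one implies. It assumes Elasticsearch 7.x or later, where mappings are typeless and the old `string` type has been split into `text` (analyzed, full-text searchable) and `keyword`. The metadata field names (`title`, `author`, `content_type`) and the sample values are hypothetical stand-ins for what Tika would return. The code only builds the JSON bodies and does not talk to Elasticsearch.

```python
import base64
import json


def build_mapping(content_field_type: str) -> dict:
    """Index mapping with (hypothetical) metadata fields plus a 'content' field of the given type."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "author": {"type": "keyword"},
                "content_type": {"type": "keyword"},
                "content": {"type": content_field_type},
            }
        }
    }


def build_text_doc(metadata: dict, text: str) -> dict:
    # Option 1: the extracted text as-is; a "text" field is analyzed, so full-text queries can match it.
    return {**metadata, "content": text}


def build_binary_doc(metadata: dict, text: str) -> dict:
    # Option 2: Base64-encoded bytes; a "binary" field holds the data but is not searchable.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {**metadata, "content": encoded}


if __name__ == "__main__":
    # Hypothetical values standing in for Tika's metadata and extracted text for one PDF.
    metadata = {"title": "Quarterly report", "author": "Jane Doe", "content_type": "application/pdf"}
    text = "Revenue grew in the third quarter."

    print(json.dumps(build_mapping("text"), indent=2))
    print(json.dumps(build_text_doc(metadata, text), indent=2))
    print(json.dumps(build_mapping("binary"), indent=2))
    print(json.dumps(build_binary_doc(metadata, text), indent=2))
```

Both options keep the metadata as ordinary fields next to "content"; the only difference is the type of "content", which decides whether a query against it can match anything.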
Please advise.