I'm trying to index documents (.doc, .ppt, .pdf, etc.) as attachments, storing the content field as Base64, and then run a search query that highlights the content field in the matching files. Why does the size of the files increase when I index them?
For example, the folder the documents are indexed from is 30 MB in total, but the head plugin shows 127 MB for the same number of files (indexed from that same folder).
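As a rough sanity check on the Base64 part alone: Base64 encodes every 3 input bytes as 4 characters, so 30 MB of raw files becomes about 40 MB of Base64 text before any extracted text or term vectors are counted. The snippet below only does that arithmetic; the class and method names are made up for illustration and are not part of NEST or Elasticsearch.

```csharp
using System;

// Hypothetical helper, for illustration only: compares raw byte counts with Base64 output length.
static class Base64SizeCheck
{
    // Base64 emits 4 characters for every 3 input bytes, rounding up (padding with '=').
    public static long EncodedLength(long rawBytes) => 4 * ((rawBytes + 2) / 3);

    static void Main()
    {
        long rawBytes = 30L * 1024 * 1024; // the 30 MB folder mentioned above
        long encoded = EncodedLength(rawBytes);
        Console.WriteLine($"raw: {rawBytes} bytes, Base64: {encoded} chars, ratio {(double)encoded / rawBytes:F2}");

        // Cross-check the formula against the framework encoder on a small buffer.
        byte[] sample = new byte[10];
        Console.WriteLine(Convert.ToBase64String(sample).Length == EncodedLength(sample.Length));
    }
}
```

That accounts for only part of the gap between 30 MB and 127 MB.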
Here is my mapping:
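var response = client.CreateIndex(defaultIndex, c => c
    .Mappings(m => m
        .Map<Document>(mp => mp
            .Properties(ps => ps
                // Title is mapped as a plain string field
                .String(s => s.Name(e => e.Title))
                // File holds the Base64 content and is mapped with the attachment type
                .Attachment(s => s.Name(p => p.File)
                    // The extracted-content sub-field: term vectors with positions,
                    // offsets and payloads, the english analyzer, and stored
                    .FileField(ff => ff.Name(f => f.File)
                        .TermVector(TermVectorOption.WithPositionsOffsetsPayloads)
                        .Analyzer("english")
                        .Store(true)))))));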
Observation (I don't know if I'm right about this): when I index the documents with a manually assigned id, the size is around 36 MB. But when I remove the Id field and index again, indexing takes much longer, the size is larger, and the search function no longer works properly. Does the size depend on how the files are indexed?
Thanks in advance.