I'm new to Elasticsearch. I have successfully installed Elasticsearch with Kibana, X-Pack and ingest-attachment, and both Elasticsearch and Kibana are running. I have kept things simple for now by installing with the default options on a Windows Server 2012 machine. I have a directory on another drive, w:\mydocs, which currently contains just 3 plain text files, but I will want to add other file types such as PDF and DOC. I now want to get these files into an Elasticsearch index. I tried following "Attaching pdf docs in Elasticsearch" as a guide, but I cannot get it to work.

Here's how I have set up the index and pipeline:

PUT _ingest/pipeline/docs 
{
  "description": "documents",
  "processors" : [
    {
      "attachment" : {
        "field": "data",
        "indexed_chars" : -1
      }
    }]
}

PUT myindex
{
  "mappings" : {
    "documents" : {
      "properties" : {
        "attachment.data" : {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}

Then, to index the first document, I use the following:

PUT localhost:9200/documents/1?pipeline=docs -d @/w/mydocs/README.TXT

and the error that I receive is:

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "request body is required"
      }
    ],
    "type": "parse_exception",
    "reason": "request body is required"
  },
  "status": 400
}
bilpor

2 Answers

1

You still have to send valid JSON to Elasticsearch, even when indexing binary data. This means that you have to encode the content of your document as base64 and then put it into a JSON document like this:

{
  "data" : "base64encodedcontentofyourfile"
}
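
As a rough illustration of the client-side step, here is a minimal Python sketch that reads a file, base64-encodes its bytes, and wraps the result in the `data` field the pipeline above reads from. The helper names `build_attachment_body` and `index_file` are hypothetical, and the URL in the commented call (index `myindex`, type `documents`, id `1`, no authentication) is an assumption based on the question, not something confirmed to match your setup. With X-Pack security enabled you would also need to send credentials.

```python
import base64
import json
import urllib.request


def build_attachment_body(raw: bytes) -> bytes:
    """Wrap raw file bytes in a JSON document with a base64 'data' field."""
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"data": encoded}).encode("utf-8")


def index_file(path: str, url: str) -> bytes:
    """Read the file at 'path' and PUT it to 'url' as a JSON body."""
    with open(path, "rb") as fh:
        body = build_attachment_body(fh.read())
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Assumed usage (path and URL are illustrative, adjust to your setup):
# index_file(r"w:\mydocs\README.TXT",
#            "http://localhost:9200/myindex/documents/1?pipeline=docs")
```

The point is that the file content itself, not its path, is what gets encoded and sent. The same approach should work for PDF or DOC files, since the bytes are encoded as-is.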
alr
  • So now, if I send the following: `PUT localhost:9200/documents/1?pipeline=docs { "data": "base64_encode('w:\\myDocs\\README.TXT')" }` I receive an "Illegal base64 character 5f" argument exception. – bilpor Aug 03 '17 at 10:03
  • You need to convert the content of the file into base64 on the client side and then send that string as the data field. Just specifying the path does not work. – alr Aug 03 '17 at 10:45
1

I was advised not to use ingest-attachment, but to use FSCrawler instead. I managed to get FSCrawler working without having to convert anything to base64.

bilpor