
I am able to parse JSON files in Elasticsearch. Is there any way to parse/index Microsoft Outlook PST files into Elasticsearch indexes?

Thank you very much.

Fardin Behboudi


You can use the Elasticsearch "Ingest Attachment" plugin, which uses Apache Tika to extract text from native file formats (PDF, XLS, PST, etc.):

https://www.elastic.co/guide/en/elasticsearch/plugins/master/ingest-attachment.html
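As a minimal sketch of the first step in that documentation, the Java class below builds (but does not send) the request that registers an ingest pipeline using the attachment processor. It assumes Java 11+ (java.net.http), a node at http://localhost:9200, and the pipeline name "attachment" and source field "data" taken from the plugin documentation's example; all of these are placeholders to adjust.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class AttachmentPipeline {
    // Assumed node address; change it for your cluster.
    static final String ES_URL = "http://localhost:9200";

    // Pipeline body: the attachment processor reads Base64 text from the given field.
    static String pipelineJson(String field) {
        return "{\"description\":\"Extract attachment information\","
             + "\"processors\":[{\"attachment\":{\"field\":\"" + field + "\"}}]}";
    }

    // PUT /_ingest/pipeline/<name> registers (or replaces) the pipeline.
    static HttpRequest createPipelineRequest(String name, String field) {
        return HttpRequest.newBuilder(URI.create(ES_URL + "/_ingest/pipeline/" + name))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(pipelineJson(field)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = createPipelineRequest("attachment", "data");
        System.out.println(req.method() + " " + req.uri());
        // To actually send it:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```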

The "Ingest Attachment" plugin replaces the older "Mapper Attachments" plugin, so searching with the old name may also turn up help:

https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-attachments.html

These plugins let you pass the Base64-encoded PST directly to Elasticsearch, which parses and indexes the content automatically behind the scenes.
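A minimal sketch of such an indexing request in Java, assuming Java 11+, a node at http://localhost:9200, and an attachment pipeline named "attachment" that reads the "data" field (as in the plugin documentation). The index name "my_index" and ID "my_id" are hypothetical, and the typeless _doc path matches Elasticsearch 7.x and later; older versions such as 5.x expect a mapping type in its place.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class IndexAttachment {
    static final String ES_URL = "http://localhost:9200"; // assumed node address

    // Document body: the pipeline decodes "data" and adds an "attachment" object.
    static String documentJson(byte[] fileBytes) {
        // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so no JSON escaping is needed.
        return "{\"data\":\"" + Base64.getEncoder().encodeToString(fileBytes) + "\"}";
    }

    // PUT /<index>/_doc/<id>?pipeline=<name> runs the document through the pipeline before indexing.
    static HttpRequest indexRequest(String index, String id, String pipeline, byte[] fileBytes) {
        URI uri = URI.create(ES_URL + "/" + index + "/_doc/" + id + "?pipeline=" + pipeline);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(documentJson(fileBytes)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Path.of(args[0])); // path to the .pst file
        HttpRequest req = indexRequest("my_index", "my_id", "attachment", bytes);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

If the request succeeds, fetching the document should show an attachment object with the extracted content next to the original data field.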

If you want something custom, I suggest using one of the many GitHub projects that read PST files, and then sending the data to Elasticsearch in whatever document mapping you want. Pick a popular one for whichever language you're most comfortable with (Java, C#, etc.). Suggested GitHub search terms: libpst, pst reader
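Whichever PST reader you pick, the Elasticsearch side can be a single _bulk request. The sketch below assumes a hypothetical EmailMessage class holding the fields your reader extracts (subject, sender and body are illustrative, not any library's API) and builds the newline-delimited _bulk body: one action line and one source line per message, each terminated by a newline. The index name "emails" is also hypothetical. Sending the body would be a POST to /_bulk with Content-Type: application/x-ndjson.

```java
import java.util.List;

class PstToBulk {
    // Hypothetical shape of one message as extracted by whichever PST reader you choose.
    static final class EmailMessage {
        final String subject, sender, body;
        EmailMessage(String subject, String sender, String body) {
            this.subject = subject; this.sender = sender; this.body = body;
        }
    }

    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    static String esc(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // _bulk body: an "index" action line followed by the document source, one pair per message.
    static String bulkBody(String index, List<EmailMessage> messages) {
        StringBuilder sb = new StringBuilder();
        for (EmailMessage m : messages) {
            sb.append("{\"index\":{\"_index\":").append(esc(index)).append("}}\n");
            sb.append("{\"subject\":").append(esc(m.subject))
              .append(",\"sender\":").append(esc(m.sender))
              .append(",\"body\":").append(esc(m.body)).append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<EmailMessage> msgs = List.of(new EmailMessage("Hello", "a@example.com", "Line 1\nLine 2"));
        System.out.print(bulkBody("emails", msgs));
    }
}
```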

You could also write a custom parser for Apache Tika and use it instead of a PST reader library. Documentation on writing one can be found here:

https://tika.apache.org/1.6/parser.html

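Java example to Base64-encode a file into a string (Java 8+, using java.util.Base64):

import java.nio.file.Files;
import java.util.Base64;

// Read the whole file; a single InputStream.read() call is not guaranteed to fill the array
byte[] bytes = Files.readAllBytes(file.toPath());
// encodeToString returns the Base64 text; calling toString() on a byte[] only gives "[B@..."
String encodedfile = Base64.getEncoder().encodeToString(bytes);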

Pass the resulting encodedfile string in a PUT request, as this article shows:

https://www.elastic.co/guide/en/elasticsearch/plugins/current/using-ingest-attachment.html
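The encoding snippet above holds the whole PST and its Base64 text in memory, which can hurt because PST files can grow to several gigabytes. A hedged alternative, assuming Java 11+, streams the encoding to a file with java.util.Base64.Encoder.wrap. Elasticsearch still receives the full encoded document in one request, though, and requests larger than http.max_content_length (100mb by default) are rejected, so very large PSTs may be better handled by a PST reader that indexes messages individually.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class StreamingBase64 {
    // Writes 'source' to 'target' as Base64 text without loading the whole file into memory.
    static void encodeFile(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Base64.getEncoder().wrap(Files.newOutputStream(target))) {
            in.transferTo(out); // closing 'out' flushes the final group and its '=' padding
        }
    }

    public static void main(String[] args) throws IOException {
        encodeFile(Path.of(args[0]), Path.of(args[1])); // e.g. mail.pst mail.pst.b64
    }
}
```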

Erik
  • Thank you, @Erik. I have some questions: how can I tell whether my file is Base64? How do I use this plugin? I have installed it, but I don't know how to use it. I am very new to Elastic. I could load a JSON file and query it in Elastic, but I don't know how to do the same for a PST file. – Fardin Behboudi Feb 13 '17 at 15:51
  • Base64 encoding of files in Java is covered here: http://stackoverflow.com/questions/13109588/base64-encoding-in-java Here are several examples of loading data with PUT calls to the Ingest plugin: https://www.elastic.co/guide/en/elasticsearch/plugins/master/using-ingest-attachment.html – Erik Feb 14 '17 at 01:21
  • Erik, I think there is a misunderstanding. I do not want to encode my file; I just want to identify whether it is Base64 or not. Also, I had already read the second link, but it does not cover loading data from a file into an index. – Fardin Behboudi Feb 14 '17 at 09:47
  • Your original PST file is almost certainly not already Base64 encoded. I've edited my answer above to include a Java Base64 encoding sample. Use that to get your PST file encoded as a Base64 string, which you'll pass to the PUT call in the examples in the "using ingest attachment" article. – Erik Feb 14 '17 at 15:20
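On the question of identifying the file, its leading bytes and its text content give a rough answer. The sketch below (Java 11+) checks for the 4-byte "!BDN" signature that the [MS-PST] specification requires at the start of a PST file, and separately tests whether a string decodes as standard Base64. The Base64 check is only a heuristic: plain words such as "abcd" also decode, and line-wrapped (MIME-style) Base64 must have its line breaks removed first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

class FileKind {
    // A raw PST file starts with the magic bytes "!BDN" (0x21 0x42 0x44 0x4E).
    private static final byte[] PST_MAGIC = {'!', 'B', 'D', 'N'};

    static boolean looksLikePst(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(4); // fewer than 4 bytes simply fails the comparison
            return Arrays.equals(head, PST_MAGIC);
        }
    }

    // True if the trimmed text decodes with the basic (non-MIME) Base64 decoder.
    static boolean isBase64Text(String text) {
        String s = text.strip();
        if (s.isEmpty()) return false;
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A file that passes looksLikePst is raw PST data and still needs encoding before it goes into the "data" field.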