
Can Logstash read a PDF file from a location, extract its content, and send that content to a destination (Kafka)?

As far as I know, Logstash can read .txt, .log, or .csv files, but I am not sure whether it can read content from a PDF.

Any suggestions along these lines would be helpful.

If not, does Kafka have this capability? Is it possible to read PDF content with Apache Kafka?

Dovydas Šopa
sparkingmyself

1 Answer


Logstash does not have a PDF input plugin. Your best bet is to find a program that can extract the text from a PDF file. This question might help: How to extract text from a PDF?

You could then set up something that generates text versions of the PDFs and index those into Elasticsearch using Logstash.
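As a rough illustration of the "generate text versions" step, here is a minimal Python sketch. It assumes the `pdftotext` command-line tool from Poppler is installed and on the PATH; that choice is mine, not something Logstash requires, and any other extractor can be swapped in through the `extract` parameter. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable


def extract_with_pdftotext(pdf_path: Path) -> str:
    """Return the text of a PDF by calling the `pdftotext` CLI.

    Assumption: Poppler's `pdftotext` is installed and on PATH.
    Passing "-" as the output file sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def convert_pdfs(
    src_dir: Path,
    out_dir: Path,
    extract: Callable[[Path], str] = extract_with_pdftotext,
) -> list[Path]:
    """Write one .txt file per PDF in src_dir and return the new paths.

    PDFs that already have a matching .txt in out_dir are skipped,
    so the function can be run repeatedly (for example from cron).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(src_dir.glob("*.pdf")):
        txt_path = out_dir / (pdf_path.stem + ".txt")
        if txt_path.exists():
            continue  # already converted on an earlier run
        txt_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling `convert_pdfs(Path("/data/pdfs"), Path("/data/pdf-text"))` (both paths are placeholders) would leave plain-text files that Logstash's file input can pick up like any other text file. From there the events can go to Elasticsearch as described above, or to Kafka through Logstash's kafka output, which is what the question asks for.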

Community
Alcanzar