For everything related to the FSCrawler project.
Questions tagged [fscrawler]
40 questions
2
votes
0 answers
How to ingest .doc / .docx files in Elasticsearch?
I'm trying to index Word documents in my Elasticsearch environment. I tried using the Elasticsearch ingest-attachment plugin, but it seems like it's only possible to ingest base64-encoded data.
My goal is to index whole directories of Word files.…

xTheProgrammer
- 74
- 10
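A minimal fscrawler job settings sketch for the directory-crawl question above — fscrawler reads the files from disk, runs the text extraction itself, and sends plain JSON to Elasticsearch, so no base64 encoding or ingest-attachment plugin is involved. Key names follow the fscrawler 2.x settings format; the job name, path, and node URL are placeholders:

    {
      "name" : "word_docs",
      "fs" : {
        "url" : "/path/to/word/files",
        "update_rate" : "15m"
      },
      "elasticsearch" : {
        "nodes" : [
          { "url" : "http://127.0.0.1:9200" }
        ]
      }
    }

Saved as the job's _settings file (JSON or YAML depending on the release) and started with something like fscrawler --config_dir /path/to/conf word_docs, it would walk the directory and index every .doc/.docx it finds.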
2
votes
1 answer
Index pdf files to AWS Elasticsearch service using Elasticsearch File System Crawler
I can index PDF files to a local Elasticsearch using Elasticsearch File System Crawler. The default fscrawler settings have port, host, and scheme parameters as shown below.
{
  "name" : "job_name2",
  "fs" : {
    "url" : "/tmp/es",
    "update_rate" :…

Fisseha Berhane
- 2,533
- 4
- 30
- 48
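For the AWS question above, the relevant part is usually the elasticsearch section of that same settings file: point the node at the remote endpoint instead of the local defaults. A sketch with a placeholder endpoint; note that older fscrawler releases describe a node with separate host/port/scheme fields rather than a single url:

    "elasticsearch" : {
      "nodes" : [
        { "url" : "https://my-domain.us-east-1.es.amazonaws.com:443" }
      ]
    }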
1
vote
1 answer
Dockerized elasticsearch and fscrawler: failed to create elasticsearch client, disabling crawler… Connection refused
I received the following error when attempting to connect Dockerized fscrawler to Dockerized Elasticsearch:
[f.p.e.c.f.c.ElasticsearchClientManager] failed to create
elasticsearch client, disabling crawler… [f.p.e.c.f.FsCrawler] Fatal
error…

user2514157
- 545
- 6
- 24
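A frequent cause of the "Connection refused" above is that the fscrawler container is still pointed at localhost:9200, which from inside its own container is not the Elasticsearch container. A hedged sketch, assuming both containers share a Docker network and the Elasticsearch service is called elasticsearch:

    "elasticsearch" : {
      "nodes" : [
        { "url" : "http://elasticsearch:9200" }
      ]
    }

With docker-compose the service name resolves automatically; with plain docker run, both containers have to be attached to the same user-defined network for the name to resolve.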
1
vote
1 answer
Is there a way to check which PDF strategy FSCrawler will use?
I am using FSCrawler's REST feature to scan PDFs as they are uploaded. I'm currently using the ocr_and_text PDF strategy; however, OCR takes too long for the user to wait for a response. I would like to send the PDF to fscrawler synchronously to use…

koopmac
- 936
- 10
- 27
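For reference, the REST workflow mentioned above looks roughly like the following (host, port, and endpoint are the defaults from the fscrawler docs; the job name and file path are placeholders):

    # start the job with the embedded REST service enabled
    bin/fscrawler job_name --rest

    # upload a document; fscrawler extracts it and indexes the result
    curl -F "file=@/path/to/document.pdf" "http://127.0.0.1:8080/fscrawler/_upload"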
1
vote
0 answers
FScrawler: perform OCR selectively only on PDF files that do not have text
I'm using FScrawler (2.7) to load text from PDFs into Elasticsearch (7.6.X).
Most of the PDF files have text, but some contain images of scanned text and need to be OCRed.
Is there a way to configure FScrawler such that it performs OCR…

Paul
- 11
- 3
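The OCR behaviour asked about in the two questions above is driven by the fs.ocr block of the job settings. A sketch using the key names from the fscrawler OCR docs; the "auto" strategy (run OCR only when little or no text can be extracted) appears, as far as I can tell, only in releases newer than 2.7, so it may require an upgrade:

    "fs" : {
      "ocr" : {
        "enabled" : true,
        "language" : "eng",
        "pdf_strategy" : "auto"
      }
    }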
1
vote
1 answer
Indexing 7 TB of data with Elasticsearch. FScrawler stops after some time
I am using fscrawler to create an index of more than 7 TB of data. The indexing starts fine but then stops when the index size reaches 2.6 GB. I believe this is a memory issue; how do I configure the memory?
My machine has 40 GB of memory and I have assigned 12 GB…

Denn
- 447
- 1
- 6
- 27
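For the memory question above, the fscrawler heap is normally set through the FS_JAVA_OPTS environment variable described in the fscrawler JVM settings docs; the 12 GB figure below simply mirrors the numbers in the question, and the config path is a placeholder:

    # give the fscrawler JVM a 12 GB heap before starting the job
    FS_JAVA_OPTS="-Xms12g -Xmx12g" fscrawler --config_dir /path/to/conf job_name --loop 1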
1
vote
1 answer
fscrawler 2.3 with Elasticsearch 5.5: getting error "String index out of range"
I have Elasticsearch 5.5 with X-Pack working without any issue.
But while trying to use fscrawler 2.3 on a folder I get this error:
WARN [f.p.e.c.f.FsCrawlerImpl]
Error while crawling c:/tmp/es: String index out of range: -1
What am I doing wrong?

Batrevenge
- 11
- 2
0
votes
1 answer
The Elasticsearch client version [7] is not compatible with the Elasticsearch cluster version [8.8.2]
I have upgraded Elasticsearch from 7.17.11 to 8.8.2.
# curl localhost:9200
{
  "name" : "test.example.com",
  "cluster_name" : "es_master01",
  "cluster_uuid" : "U4n0aCHtTdinDZSH5jEcdg",
  "version" : {
    "number" : "8.8.2",
    "build_flavor" :…

Manoj Agarwal
- 365
- 2
- 17
0
votes
0 answers
Fscrawler logs in Kubernetes and logstash
I have a Kubernetes Fscrawler deployment with several instances. The logs are mapped to a Persistent Volume.
I also have Elastic Stack 8 with Logstash.
What I would like to do is send the logs from the different Fscrawler instances to Logstash to have a…

Ralle Mc Black
- 1,065
- 1
- 8
- 16
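One hedged way to wire up the log shipping described above, assuming the fscrawler log files from the Persistent Volume are mounted into the Logstash pod under /fscrawler-logs (the path and index name are placeholders):

    input {
      file {
        path => "/fscrawler-logs/*.log"
        start_position => "beginning"
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "fscrawler-logs-%{+YYYY.MM.dd}"
      }
    }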
0
votes
1 answer
Push custom fields to metadata of PDF using fscrawler
I am using fscrawler to index PDF documents using the following command:
/usr/bin/fscrawler --config_dir /home/user1/conf test_index --restart --loop 1
The PDF metadata is indexed. I want to add custom fields to the PDF metadata and…

Manoj Agarwal
- 365
- 2
- 17
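One hedged way to get custom fields in, assuming an Elasticsearch ingest pipeline is acceptable: define a pipeline with a set processor and reference it from the job settings (fscrawler documents an elasticsearch.pipeline option). The field name and value here are only examples:

    # define a pipeline that stamps a custom field on every document
    curl -X PUT "localhost:9200/_ingest/pipeline/add-custom-fields" \
      -H 'Content-Type: application/json' -d'
    {
      "processors" : [
        { "set" : { "field" : "department", "value" : "finance" } }
      ]
    }'

and in the fscrawler job settings:

    "elasticsearch" : {
      "pipeline" : "add-custom-fields"
    }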
0
votes
1 answer
Using fallback font 'LiberationSans' for 'CourierNew,Italic' warning with fscrawler v2.9
I am running fscrawler on two different CentOS 7.8 machines. On one machine, I get the following warning when running fscrawler:
13:03:28,449 WARN [o.a.p.p.f.PDTrueTypeFont] Using fallback font 'LiberationSans' for 'CourierNew,Italic'
Whereas on…

Manoj Agarwal
- 365
- 2
- 17
0
votes
1 answer
No SLF4J providers were found warning with fscrawler 2.10
I have upgraded fscrawler from 2.9 to 2.10. I tried the same indexing command that I used with the older version:
/usr/bin/fscrawler --config_dir /home/user1/conf test_index --restart --loop 1
I see the following warning about SLF4J:
SLF4J:…

Manoj Agarwal
- 365
- 2
- 17
0
votes
0 answers
Fscrawler configuration
Hi, I am launching Fscrawler with Elasticsearch and Kibana inside Docker containers, and I am getting the following error:
fscrawler | Exception in thread "main" java.util.NoSuchElementException
fscrawler | at…

Lenchesterx
- 3
- 2
0
votes
1 answer
fscrawler: get extracted text in REST API response
I implemented fscrawler with Elasticsearch.
REST is enabled.
I can post a file to fscrawler and the text is correctly extracted and put into the Elasticsearch index.
I can verify that with Kibana.
I'm not able to get the extracted text in the…

Ralle Mc Black
- 1,065
- 1
- 8
- 16
0
votes
1 answer
FSCrawler docker-compose NoSuchElementException
I am trying to run FSCrawler via docker-compose, following the steps described in https://fscrawler.readthedocs.io/en/fscrawler-2.9/installation.html#using-docker-compose.
ELASTIC_VERSION = "7.17.8"
FSCRAWLER_VERSION = "2.9"
PWD = ""
I verified that…

Fried
- 41
- 2