I created a Spring Batch Integration project to process multiple files, and it is working like a charm.

At the moment I have four Pods running, but the behaviour isn't what I expect: I expect 20 files to be processed at the same time (five per Pod).

My poller setup uses the following parameters:

    poller-delay: 10000
    max-message-per-poll: 5

I'm also using Redis as the metadata store for the file list filter:

    private CompositeFileListFilter<S3ObjectSummary> s3FileListFilter() {
        return new CompositeFileListFilter<S3ObjectSummary>().addFilter(
                new S3PersistentAcceptOnceFileListFilter(new RedisMetadataStore(redisConnectionFactory), "prefix-"))
                .addFilter(new S3RegexPatternFileListFilter(".*\\.csv$"));
    }

It seems that each Pod is processing only one file. Another strange behaviour is that one of the Pods registers all the files in Redis, so the other Pods only get new files.

What is the best practice here, and how can I get multiple files processed at the same time?

Guilherme Bernardi

1 Answer

See this option on the S3InboundFileSynchronizingMessageSource:

    /**
     * Set the maximum number of objects the source should fetch if it is necessary to
     * fetch objects. Setting the
     * maxFetchSize to 0 disables remote fetching, a negative value indicates no limit.
     * @param maxFetchSize the max fetch size; a negative value means unlimited.
     */
    @ManagedAttribute(description = "Maximum objects to fetch")
    void setMaxFetchSize(int maxFetchSize);

And here is the doc: https://docs.spring.io/spring-integration/docs/current/reference/html/ftp.html#ftp-max-fetch
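
To see why the fetch limit matters when several Pods share one Redis-backed filter, here is a rough sketch in plain Java. It is a simplified model and not Spring Integration code: `SharedStoreModel`, `fetch` and `distribute` are hypothetical names, a `ConcurrentHashMap` stands in for the Redis metadata store, and the Pods are assumed to synchronize one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model, NOT Spring Integration code: every name here is hypothetical.
class SharedStoreModel {

    // Stands in for the Redis-backed metadata store shared by all Pods.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // One synchronization pass for one Pod: claim unseen keys until the fetch limit is hit.
    // A negative maxFetchSize means "no limit", 0 means "do not fetch".
    List<String> fetch(String pod, List<String> remoteKeys, int maxFetchSize) {
        List<String> claimed = new ArrayList<>();
        for (String key : remoteKeys) {
            if (maxFetchSize >= 0 && claimed.size() >= maxFetchSize) {
                break;
            }
            // putIfAbsent returns null only for the first Pod to register the key.
            if (store.putIfAbsent(key, pod) == null) {
                claimed.add(key);
            }
        }
        return claimed;
    }

    // Runs one pass per Pod, in order, and returns how many files each Pod claimed.
    static int[] distribute(int files, int pods, int maxFetchSize) {
        List<String> remoteKeys = new ArrayList<>();
        for (int i = 0; i < files; i++) {
            remoteKeys.add("file-" + i + ".csv");
        }
        SharedStoreModel model = new SharedStoreModel();
        int[] counts = new int[pods];
        for (int p = 0; p < pods; p++) {
            counts[p] = model.fetch("pod-" + p, remoteKeys, maxFetchSize).size();
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(distribute(20, 4, -1))); // no fetch limit
        System.out.println(Arrays.toString(distribute(20, 4, 5)));  // limit of 5 per pass
    }
}
```

In this model, with no limit the first Pod registers all 20 keys and the output is `[20, 0, 0, 0]`, which matches the behaviour described in the question. With a limit of 5 the output is `[5, 5, 5, 5]`, because each Pod leaves the remaining entries unclaimed for the others.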

Artem Bilan
  • Thank you for your attention. I read it, and that's what I need. Just to understand: in my case the poller runs every 10 seconds and max-messages is 5. So if I set maxFetchSize to, for example, 2, will the poller run but only get 2 new files if there are fewer than 5 messages in the pool? – Guilherme Bernardi Jul 28 '21 at 14:01
  • No, that's not how it works. It will definitely poll 5 messages. Fetch is about how many entries to take from the source system at once and cache in memory for subsequent polls. But since you want processing to happen in parallel on many instances, it is better not to fetch all the existing entries. – Artem Bilan Jul 28 '21 at 14:48
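
The difference between the fetch size and the messages per poll in the last comment can be sketched with a small plain-Java model. This is only an illustration under assumptions: `FetchVsPollModel` and its methods are hypothetical names, not Spring Integration classes, and the model assumes a fetch happens only when the local cache is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model, NOT Spring Integration code: every name here is hypothetical.
class FetchVsPollModel {

    final Deque<String> remote = new ArrayDeque<>(); // files still on the source system
    final Deque<String> local = new ArrayDeque<>();  // files fetched and cached for later receives
    final int maxFetchSize;
    int fetches = 0;

    FetchVsPollModel(int remoteFiles, int maxFetchSize) {
        for (int i = 0; i < remoteFiles; i++) {
            remote.add("file-" + i + ".csv");
        }
        this.maxFetchSize = maxFetchSize;
    }

    // One receive: serve from the local cache, fetching from the remote only when it is empty.
    String receive() {
        if (local.isEmpty() && maxFetchSize != 0 && !remote.isEmpty()) {
            fetches++;
            int limit = maxFetchSize < 0 ? remote.size() : Math.min(maxFetchSize, remote.size());
            for (int i = 0; i < limit; i++) {
                local.add(remote.poll());
            }
        }
        return local.poll(); // null when nothing is available
    }

    // One poll cycle: call receive() up to maxMessagesPerPoll times, stopping on null.
    List<String> poll(int maxMessagesPerPoll) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < maxMessagesPerPoll; i++) {
            String file = receive();
            if (file == null) {
                break;
            }
            messages.add(file);
        }
        return messages;
    }

    public static void main(String[] args) {
        FetchVsPollModel source = new FetchVsPollModel(12, 2);
        System.out.println("messages in one poll: " + source.poll(5).size());
        System.out.println("fetches needed: " + source.fetches);
        System.out.println("left on remote: " + source.remote.size());
    }
}
```

In this model, 12 remote files with a fetch size of 2 and 5 messages per poll still yield 5 messages in the first poll. It takes three fetches of 2, leaves one file cached locally for the next poll, and leaves 6 files on the remote for other instances.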