
I am using SwitchYard, which is a wrapper over Apache Camel. My file consumer reads from a directory into which a large number of files (sometimes 2,000,000) are written. The ideal consumption speed for my consumer is 1000+ files per second, but once more than 50,000 files have been written the consumer slows down and the consumption speed drops roughly five-fold.

I have disabled the sortBy option and even enabled the shuffle option, but with no luck. Here are my file binding details.

    <file:binding.file name="XXXXXXXXXXXX">
      <file:additionalUriParameters>
        <file:parameter name="antInclude" value="*.xml"/>
        <file:parameter name="consumer.bridgeErrorHandler" value="true"/>
        <file:parameter name="shuffle" value="true"/>
      </file:additionalUriParameters>
      <file:directory>directory path</file:directory>
      <file:autoCreate>false</file:autoCreate>
      <file:consume>
        <file:delay>100</file:delay>
        <file:maxMessagesPerPoll>20</file:maxMessagesPerPoll>
        <file:delete>true</file:delete>
        <file:moveFailed>directory path</file:moveFailed>
        <file:readLock>markerFile</file:readLock>
      </file:consume>
    </file:binding.file>

How can I make my consumer maintain the same consumption speed of 1000 files/second even when there is a large number of files in the inbound directory?


2 Answers


Your configuration is telling Camel to:

  • poll 10 times per second (delay=100 ms)
  • read a maximum of 20 files each time

So, I expect that you are getting about 200 files per second?

Set maxMessagesPerPoll to 200.
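Applied to your binding, that is a one-line change in the consume block. A sketch based on your own configuration (at 10 polls per second, 200 messages per poll gives a ceiling of roughly 2000 files/second, comfortably above your 1000/second target):

    <file:consume>
      <file:delay>100</file:delay>
      <!-- raised from 20: 10 polls/second x 200 files/poll allows up to ~2000 files/second -->
      <file:maxMessagesPerPoll>200</file:maxMessagesPerPoll>
      <file:delete>true</file:delete>
      <file:moveFailed>directory path</file:moveFailed>
      <file:readLock>markerFile</file:readLock>
    </file:consume>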

Of course, the assumption is that all your downstream processing can handle that extra load.

As @Conffusion commented above, you are shuffling the list of files. That likely means Camel builds a list of all the files, shuffles it, and only then hands you the number you asked for. Do you really need that as part of your requirement?
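If you can drop it, the additionalUriParameters block shrinks to something like this (a sketch that keeps your other parameters as they are):

    <file:additionalUriParameters>
      <file:parameter name="antInclude" value="*.xml"/>
      <file:parameter name="consumer.bridgeErrorHandler" value="true"/>
      <!-- shuffle removed, so Camel does not have to build and shuffle the full listing on every poll -->
    </file:additionalUriParameters>

It may also be worth checking the eagerMaxMessagesPerPoll option in the Camel file component documentation for your version: when it is left at its default of true, the per-poll limit is applied while the directory is being scanned rather than after the whole listing has been built.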

Essentially: play with each of the file parameters and see what impact each one makes.

– Darius X.
  • Yes, downstream can handle the additional load, if any. But my problem is that the consumption speed decreases as the number of files increases. I assume this is because the Camel file component sorts the files first and then consumes them. Is there any option to tell Camel to pick files randomly? – shakti May 09 '19 at 14:37

I would suggest that filesystem performance is the root cause here, with that many files in a single folder.

You should be able to verify this hypothesis with standard tools from your OS, like ls on Linux or dir on Windows: just compare how long the command takes with only a few files in the directory versus with all of those files present.

As for the solution, I would suggest splitting those files into subdirectories, as described in this answer to the question NTFS performance and large volumes of files and directories.
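If you do reorganise the inbound area into subdirectories, the Camel file component can still pick the files up via its recursive option. A sketch against the binding from the question (assuming the writer now drops files into per-bucket subdirectories; the ant include pattern likely needs to become **/*.xml so files below the top level still match — check both against the docs for your Camel version):

    <file:additionalUriParameters>
      <file:parameter name="antInclude" value="**/*.xml"/>
      <file:parameter name="consumer.bridgeErrorHandler" value="true"/>
      <!-- descend into the per-bucket subdirectories created by the writer -->
      <file:parameter name="recursive" value="true"/>
    </file:additionalUriParameters>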

– Illya Kysil