I'm working on a scraper that visits websites and parses specific parts of them in Sidekiq workers. Suppose the scraper visits a page containing 10 elements I'm interested in, and each of them is queued in Sidekiq. At the moment I pass the element's source code as an argument, which is later loaded into Nokogiri. My question: is it a good idea to pass a huge string as an argument to a Sidekiq worker? The string is always between 77,000 and 80,000 characters long. Or should I store it in a temporary table and look up the specific record before loading it into Nokogiri?

- Why don't you try testing the performance of each approach? – Tom Lord Apr 14 '18 at 12:25
- [Best Practices](https://github.com/mperham/sidekiq/wiki/Best-Practices) says _"Make your job parameters small and simple"_. – Stefan Apr 15 '18 at 09:58
- Why do you need to store the large string? Can't the worker pull in that data when it's running? i.e. make the API call, parse what's needed and complete its purpose? – lacostenycoder Apr 15 '18 at 13:15
- How did you approach this issue? I want to log an API's response in Sidekiq, but sometimes the response is a huge string. What should I do? – Jin Lim Apr 13 '22 at 00:31
- I eventually stored the response in the db and then just passed the record ID to a Sidekiq worker. – user3014317 May 08 '22 at 22:14
2 Answers
I would recommend storing the string on S3 (or any other object store) and using the returned URL to fetch the string when the job runs.
This way even a small Redis server can support many concurrent Sidekiq jobs without running out of RAM.
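A minimal sketch of that idea, assuming a hypothetical `PayloadStore` that writes to a temporary directory as a stand-in for S3 (the real thing would use an S3 client instead). `ParseElementWorker` is also hypothetical and is written as plain Ruby rather than a real Sidekiq worker so the example runs on its own. The point is only that the job argument becomes a short key instead of the 80k-character string.

```ruby
require 'securerandom'
require 'tmpdir'

# Hypothetical stand-in for an object store such as S3: one file per payload.
class PayloadStore
  def initialize(dir)
    @dir = dir
  end

  # Saves the payload and returns a short key that is cheap to enqueue.
  def put(payload)
    key = SecureRandom.uuid
    File.write(path_for(key), payload)
    key
  end

  # Loads the payload back by key.
  def fetch(key)
    File.read(path_for(key))
  end

  # Removes the payload once the job no longer needs it.
  def delete(key)
    File.delete(path_for(key))
  end

  private

  def path_for(key)
    File.join(@dir, "#{key}.html")
  end
end

# Shaped like a Sidekiq worker (a perform method taking simple arguments),
# but plain Ruby here; a real worker would look the store up itself.
class ParseElementWorker
  def initialize(store)
    @store = store
  end

  def perform(key)
    html = @store.fetch(key)
    result = html.length # real code would parse here, e.g. with Nokogiri
    @store.delete(key)
    result
  end
end

store = PayloadStore.new(Dir.mktmpdir)
element_html = '<div>' + ('x' * 80_000) + '</div>'

key = store.put(element_html) # only this 36-character key would be enqueued
parsed_length = ParseElementWorker.new(store).perform(key)
```

With real Sidekiq the enqueue step would pass `key` to the worker, and Redis would hold only that key rather than the full markup.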

As others have commented, it's best to keep your worker params as small as possible. You should pass the minimum data your worker needs to accomplish its task. If you're using Sidekiq you may need to consider memory size. See sidekiq memory usage reset
Storing large string objects may become a memory problem depending on concurrency. You can get a rough idea of your string's memory size in Ruby:
```ruby
require 'securerandom'
require 'objspace'

str = SecureRandom.hex(40000) # generate a random 80k-character string
ObjectSpace.memsize_of(str)   #=> 80041 (about 80 KB, well under 1 MB for your example)
```
UPDATE:
If you want to check memory size of non-string data like a hash, you could use something like:
```ruby
hash = { key: str }
# Measures the hash's string representation, not the hash object itself
ObjectSpace.memsize_of(hash.to_s)
#=> 131112
```
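Since Sidekiq serializes job arguments to JSON before pushing them to Redis, the size of the serialized arguments is arguably the more relevant number. `ObjectSpace.memsize_of` on a Hash or Array reports only the container itself, not the objects it references. The following is a rough sketch of measuring the serialized size instead; `serialized_bytes`, `oversized?` and the `MAX_ARG_BYTES` limit are hypothetical names and values chosen for illustration, not part of Sidekiq.

```ruby
require 'json'

# Hypothetical limit; pick a value that suits your Redis memory budget.
MAX_ARG_BYTES = 10_000

# Size in bytes of the job arguments once serialized to JSON.
def serialized_bytes(args)
  JSON.generate(args).bytesize
end

# True when the serialized arguments exceed the limit.
def oversized?(args, limit = MAX_ARG_BYTES)
  serialized_bytes(args) > limit
end

small_args = [42]                         # e.g. a record ID
large_args = [{ 'html' => 'x' * 80_000 }] # e.g. raw element markup

small_size = serialized_bytes(small_args)
large_size = serialized_bytes(large_args)
```

A check like `oversized?` could be called before enqueuing (or from a client middleware) to log or reject jobs whose arguments are too large, and it handles nested hashes and arrays the same way as plain strings.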

- I started to implement a middleware that would report memory violations in Sidekiq worker arguments. But then I ran into the issue of what happens when the arguments to perform() are Hash, Array, and other complex data types: it always gives me a fixed size. – Jack Chi Dec 03 '20 at 06:58