I'm using a Sidekiq worker to scrape a few websites and part of the process for parsing the data involves using Nokogiri to go through tables, etc.
However, I've been having a lot of trouble with Sidekiq's memory usage: it keeps growing and is never released. The jobs only run once every few minutes, yet the memory continues to climb. I'd expect it to be freed once a job is finished.
Is it not recommended to use Sidekiq for tasks such as this? I'm wondering what alternatives I should look at, if any.
Here's an extremely simplified version of my worker:
    require "sidekiq"
    require "nokogiri"
    require "net/http/persistent"

    class SampleWorker
      include Sidekiq::Worker

      def perform
        response = get_request("https://website.com")
        @parsed_response = Nokogiri::HTML(response.body).xpath("//tbody/tr")
      end

      # Fetches the URL over a persistent connection, then closes the connection.
      def get_request(url, headers = "")
        uri = URI.parse(url)
        http = Net::HTTP::Persistent.new
        response = http.request uri
        http.shutdown
        response
      end
    end
Obviously a lot more code is involved, but for testing purposes I reduced it to the version above. After running this one job 10 times, the ruby process goes from using 3.7% of memory to 12.2%.
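One way to narrow this down is to check whether Ruby objects are actually being retained between runs, or whether the growth is elsewhere. The following is only a minimal sketch: `live_slot_growth` is a hypothetical helper (not part of Sidekiq or Nokogiri), and the strings stand in for whatever a real job would build. It counts Ruby heap slots via `GC.stat`, so it says nothing about the process's resident memory, which can stay high even after objects are collected.

```ruby
# Hypothetical helper, not part of Sidekiq or Nokogiri.
# Runs the block and reports how many Ruby heap slots remain live afterwards.
def live_slot_growth
  GC.start                              # settle the heap before measuring
  before = GC.stat(:heap_live_slots)
  yield
  GC.start                              # collect whatever the block left unreferenced
  GC.stat(:heap_live_slots) - before
end

retained = [] # stand-in for state that outlives a job
growth_retained = live_slot_growth do
  10_000.times { |i| retained << "row #{i}" }
end

growth_discarded = live_slot_growth do
  10_000.times { |i| "row #{i}" } # nothing keeps a reference to these
end

puts "retained: #{growth_retained}, discarded: #{growth_discarded}"
```

If a number like this stays roughly flat across jobs while the process's memory percentage still climbs, the growth is probably not coming from Ruby objects that are still referenced.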
I'm not quite sure why it's not freeing up the memory between jobs. Perhaps I should schedule a Linux cron job to run this script separately and not let Sidekiq manage it? My guess is that if it ran outside of Sidekiq, the process would exit when done and use no memory while it isn't running.
EDIT
So I just ran across the rails runner command, and this seems to do exactly what I feel Sidekiq should do: it runs the job, completes, and frees up the memory. It just runs the worker and exits. I'm not sure why Sidekiq can't do this without keeping the memory tied up. Perhaps I'm just not understanding something.
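To illustrate the difference being described, here is a minimal sketch of running work in a short-lived child process, so that everything it allocated goes back to the operating system when the child exits. `run_in_child` is a hypothetical helper, not a Sidekiq feature. It assumes a platform where `fork` is available, and forking from inside a multi-threaded process has pitfalls of its own, so treat it as a demonstration of the idea rather than a recommendation.

```ruby
# Hypothetical helper: runs the block in a forked child process and waits for it.
# Returns true if the block finished without raising, false otherwise.
def run_in_child
  pid = fork do
    begin
      yield
      exit!(0) # leave immediately, skipping at_exit handlers inherited from the parent
    rescue StandardError
      exit!(1)
    end
  end
  _, status = Process.wait2(pid)
  status.success?
end

# Stand-in for a memory-hungry job; the allocation lives only in the child.
ok = run_in_child { Array.new(100_000) { |i| "row #{i}" } }
puts "child finished cleanly: #{ok}"
```

A long-running process such as a Sidekiq server has no such exit between jobs, so any memory it gives back depends on the garbage collector and the allocator rather than on process teardown.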