I have a class that receives a large (4 MB) XML file through an exposed web service and saves it to disk. Another method then loads the file into memory, processes it, and saves both the file and the result of the processing to the database.

I understand that loading the complete file into memory uses a considerable amount of memory. The problem is that after the file has been processed (and the file is closed and deleted), the app's memory is not released.

Initially this task ran as a Sidekiq job, and the memory used by Sidekiq was not released. Fearing a memory leak in Sidekiq, we ran the process outside Sidekiq, and the memory of the whole app suffers the same 'leak'.

We use Nokogiri to load the file and query it with XPath.

To try and profile the leak we are running this script:

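# Count live objects per class and print them, largest count first.
# When last_stat is given, also print the change since that snapshot.
def ostats(last_stat = nil)
  stats = Hash.new(0)
  ObjectSpace.each_object { |o| stats[o.class] += 1 }

  stats.sort { |(k1, v1), (k2, v2)| v2 <=> v1 }.each do |k, v|
    printf "%-30s  %10d", k, v
    printf " | delta %10d", (v - last_stat[k]) if last_stat
    puts
  end

  stats
end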

stats = nil
c = Class.new
stats = ostats(stats)
puts 'Press to continue'
option = gets
stats = ostats(stats)
puts 'Press to continue'
option = gets
GC.start
puts 'Executing GC...'
stats = ostats(stats)

The script counts the objects per class loaded in memory at three points: before the method runs, after the method runs, and again after forcing the garbage collector. The output shows no big difference:

String                               10093
Array                                  880
RubyVM::InstructionSequence            674
Class                                  278
Encoding                               100
Regexp                                  67
Hash                                    59
Module                                  22
MatchData                               20
Gem::Requirement                        19
File                                    18
RubyVM::Env                             11
Gem::Version                             9
Proc                                     9
Binding                                  7
Gem::Specification                       6
Time                                     5
Float                                    4
Mutex                                    3
IO                                       3
Object                                   3
Bignum                                   2
Thread::Backtrace                        2
LoadError                                2
Rational                                 2
Complex                                  1
ThreadGroup                              1
IOError                                  1
Thread                                   1
RubyVM                                   1
NoMemoryError                            1
SystemStackError                         1
Random                                   1
ARGF.class                               1
Data                                     1
fatal                                    1
Range                                    1
#<Class:0x007efce0bbed68>                1
Gem::Platform                            1
Monitor                                  1

------------------
String                               10339 | delta        246
Array                                  923 | delta         43
RubyVM::InstructionSequence            674 | delta          0
Class                                  278 | delta          0
Encoding                               100 | delta          0
Regexp                                  67 | delta          0
Hash                                    60 | delta          1
Module                                  22 | delta          0
MatchData                               20 | delta          0
Gem::Requirement                        19 | delta          0
File                                    18 | delta          0
RubyVM::Env                             11 | delta          0
Gem::Version                             9 | delta          0
Proc                                     9 | delta          0
Binding                                  7 | delta          0
Gem::Specification                       6 | delta          0
Time                                     5 | delta          0
Mutex                                    4 | delta          1
Float                                    4 | delta          0
IO                                       3 | delta          0
Object                                   3 | delta          0
Bignum                                   2 | delta          0
Thread::Backtrace                        2 | delta          0
LoadError                                2 | delta          0
Rational                                 2 | delta          0
Complex                                  1 | delta          0
ThreadGroup                              1 | delta          0
IOError                                  1 | delta          0
Thread                                   1 | delta          0
RubyVM                                   1 | delta          0
NoMemoryError                            1 | delta          0
SystemStackError                         1 | delta          0
Random                                   1 | delta          0
ARGF.class                               1 | delta          0
Data                                     1 | delta          0
fatal                                    1 | delta          0
Range                                    1 | delta          0
#<Class:0x007efce0bbed68>                1 | delta          0
Gem::Platform                            1 | delta          0
Monitor                                  1 | delta          0

Executing GC...

String                                5018 | delta      -5321
RubyVM::InstructionSequence            596 | delta        -78
Class                                  278 | delta          0
Array                                  251 | delta       -672
Encoding                               100 | delta          0
Regexp                                  67 | delta          0
Hash                                    29 | delta        -31
Module                                  22 | delta          0
Gem::Requirement                        14 | delta         -5
Gem::Version                             9 | delta          0
Proc                                     9 | delta          0
Gem::Specification                       6 | delta          0
RubyVM::Env                              5 | delta         -6
Mutex                                    4 | delta          0
Float                                    4 | delta          0
Time                                     4 | delta         -1
IO                                       3 | delta          0
Object                                   3 | delta          0
Bignum                                   2 | delta          0
Complex                                  1 | delta          0
ThreadGroup                              1 | delta          0
IOError                                  1 | delta          0
Binding                                  1 | delta         -6
Thread                                   1 | delta          0
RubyVM                                   1 | delta          0
NoMemoryError                            1 | delta          0
SystemStackError                         1 | delta          0
Random                                   1 | delta          0
ARGF.class                               1 | delta          0
Data                                     1 | delta          0
fatal                                    1 | delta          0
Range                                    1 | delta          0
#<Class:0x007efce0bbed68>                1 | delta          0
Gem::Platform                            1 | delta          0
Monitor                                  1 | delta          0
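
The counts above come from ObjectSpace, which only sees Ruby objects. Memory allocated by native code (Nokogiri wraps libxml2 on MRI) and heap pages the interpreter keeps after a GC do not show up there. A minimal sketch that records the process's resident set size next to the object counts might make the gap visible. The helper names `rss_kb`, `memory_snapshot`, and `report` are hypothetical, the array of strings is only a stand-in for the real XML processing, and the code assumes Ruby 2.2 or newer (for `GC.stat[:heap_allocated_pages]`) on Linux, with `ps` as a fallback elsewhere.

```ruby
# Sketch: report OS-level memory next to Ruby-level object counts.
# Assumes Linux (/proc/self/status); falls back to `ps` elsewhere.

# Resident set size of the current process in kilobytes, or nil if unknown.
def rss_kb
  if File.readable?('/proc/self/status')
    line = File.foreach('/proc/self/status').find { |l| l.start_with?('VmRSS:') }
    return line.split[1].to_i if line
  end
  out = `ps -o rss= -p #{Process.pid}`.strip
  out.empty? ? nil : out.to_i
rescue StandardError
  nil
end

# Snapshot combining OS memory with the GC's view of the Ruby heap.
def memory_snapshot
  counts = ObjectSpace.count_objects
  {
    rss_kb:       rss_kb,
    live_objects: counts[:TOTAL] - counts[:FREE],
    heap_pages:   GC.stat[:heap_allocated_pages],
    gc_runs:      GC.stat[:count]
  }
end

# Print each metric of `after` with its change relative to `before`.
def report(label, before, after)
  puts label
  after.each do |key, value|
    delta = before && value && before[key] ? value - before[key] : nil
    printf "  %-14s %12s", key, value.inspect
    printf " | delta %+d", delta if delta
    puts
  end
end

baseline = memory_snapshot
report('Baseline', nil, baseline)

# Stand-in for the XML processing: allocate many short-lived strings.
junk = Array.new(200_000) { |i| "row #{i}" }
junk = nil

after_work = memory_snapshot
report('After work', baseline, after_work)

GC.start
after_gc = memory_snapshot
report('After GC', after_work, after_gc)
```

If `live_objects` drops after the GC while `rss_kb` stays high, the Ruby objects were collected and the remaining footprint is memory the process has not handed back to the OS, which is a different situation from objects being kept alive by a stray reference.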
Adrian Matteo
  • Usually, the memory footprint of a program is not reduced by freeing data structures. So, if you allocate a chunk of memory, and then free it, it is not returned to the OS. See also http://stackoverflow.com/a/12051394/100754 – Sinan Ünür Dec 01 '15 at 15:01
  • Consider using Nokogiri's SAX parser so that you are not loading the entire file into memory in the first place. – infused Dec 01 '15 at 20:42
  • There isn't a question being asked. Do you want help with something? If so, we need the minimal code necessary to demonstrate the problem, also input data that reproduces it. Are you asking how you should debug/isolate the problem? 4MB is not a large file, especially these days. The DOM parser should have no problem processing it, and normal Nokogiri coding should result in the space eventually being reclaimed, but, without code demonstrating the problem we can't really say. – the Tin Man Dec 08 '15 at 00:51

0 Answers