
Happy new year to all of you! I trained a deep learning model with Theano (newest development version). After one or two epochs it consumes 24% of the machine's memory; I ran four of these jobs at the same time, so together they consume all of it. When I run three of them, each ends up consuming 33% of memory. The training is just a loop, and I do not accumulate anything in a list or dict or anything like that. I paste the main loop below in case it helps.

import sys
import time

for i in range(epoch):
    print 'epoch:', i, '/', epoch, 'begin'
    sys.stdout.flush()
    t0 = time.clock()
    for j, (name, doc) in enumerate(docs.iteritems()):
        print 'begin doc:', j, '/', n, 'contents:', len(doc.contents), \
            'time now:', time.strftime("%Y-%m-%d %A %X %Z", time.localtime())
        t1 = time.clock()
        # train_samples is rebound to a fresh list for every doc, so the
        # previous doc's samples should become unreachable here
        train_samples = []
        get_large_margin_all_together_train_datas(doc, mod, parse_result,
                                                  golds[name], 0,
                                                  len(doc.contents),
                                                  train_samples)
        for sample in train_samples:
            cost = mod.train_relation_with_struct(sample)
        print 'doc:', j, '/', n, 'contents:', len(doc.contents), \
            'cost:', cost, 'time:', time.clock() - t1
        #print 'gc:', gc.collect()
        sys.stdout.flush()
    print 'epoch:', i, '/', epoch, 'consume:', time.clock() - t0
    mod.save_params()
    sys.stdout.flush()
    test_results = pa.parse_docs(mod, test_docs, saved=False,
                                 n_threshold=120, verbose=False)
    results = pa.f_value(test_golds, test_results, coarse=True, strict=False)
    # the strict score is the one used for model selection
    results = pa.f_value(test_golds, test_results, coarse=True, strict=True)
    if results[3] > best_result:
        best_result = results[3]  # remember the best score so far
        mod.save_params(best=True)
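
To localize the growth, a minimal per-epoch memory probe could be added; here is a sketch using the standard resource module (note this is an assumption about the environment: on Linux ru_maxrss is reported in kilobytes, on OS X in bytes):

import resource

def peak_rss_mb():
    # Peak resident set size of the current process, in megabytes.
    # On Linux, ru_maxrss is in kilobytes, so divide by 1024.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

# e.g. at the end of each epoch in the loop above:
# print 'epoch:', i, 'peak RSS (MB):', peak_rss_mb()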

I tried gc.collect(), but it didn't help. When I ran the same code on our old server, which has an old version of Theano (0.6.0) installed, this problem did not occur.
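
For completeness, this is a minimal sketch of the kind of gc check I mean (in Python 2, gc.garbage lists objects the collector found unreachable but could not free, e.g. cycles involving __del__):

import gc

gc.set_debug(gc.DEBUG_UNCOLLECTABLE)
n_freed = gc.collect()
# If the leak were pure-Python reference cycles, n_freed should be large
# and gc.garbage should stay empty; a growing gc.garbage would point at
# uncollectable cycles instead.
print 'collected:', n_freed, 'uncollectable:', len(gc.garbage)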

Does anyone have a clue? I would really appreciate it.

I asked a similar question here: https://groups.google.com/forum/#!topic/theano-users/vX6sPnGhRCg

  • It's too hard to say from this code - we can't see what `get_large_margin_all_together_train_datas` is doing for example. – Martin Konecny Jan 01 '16 at 00:13
  • Yeah, I know. But since the result only comes from train_samples, i.e. the function appends items to the list train_samples, and train_samples is rebound to a new empty list for every doc, the old samples should be freed once their reference count drops to zero. If not, gc.collect() should release that memory. So I don't know why a memory leak could happen. – Jason Jan 01 '16 at 00:56

0 Answers