I have written a rather complex Torch application and it works quite well, that is, as long as it doesn't run out of memory. I have tried to find out what sort of inputs or situations cause it to seemingly randomly run out of memory, but so far I have had little to no success. So now I'm looking for a way to check how much (V)RAM each variable takes.
With a simple statement I can switch my code between caffe:cuda and caffe:cl, which changes whether my program runs in RAM or on the GPU; I imagine such a switch will make validating my memory usage a lot easier.
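For what it's worth, in plain Torch terms that switch boils down to something like the following (a minimal sketch; `useOpenCL` is a placeholder for my actual configuration flag, and as far as I can tell cutorch at least exposes device-level numbers via `cutorch.getMemoryUsage`):

    local t = torch.FloatTensor(1000, 1000)
    if useOpenCL then
      require 'cltorch'
      t = t:cl()     -- move the tensor to the GPU via OpenCL
    else
      require 'cutorch'
      t = t:cuda()   -- move the tensor to the GPU via CUDA
      -- device-level usage only, not per-variable:
      local free, total = cutorch.getMemoryUsage()
      print(string.format("GPU: %.0f MB free of %.0f MB",
                          free / 2^20, total / 2^20))
    end

That only tells me the total for the whole device, though, not which of my variables is responsible.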
I have already tried using

    print(collectgarbage("count") * 1024)

to check how much memory is in use at a given point in time, however this does not clearly show me where the memory is being used, perhaps because the program is relatively complex (although there are a few variables which I suspect are hogging a lot of memory: neural networks, large matrices and such).
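The closest I've come is bracketing a suspect allocation with full collections and taking the delta, roughly like this (a sketch; the `FloatTensor` allocation stands in for one of my real variables):

    collectgarbage(); collectgarbage()      -- settle the Lua heap first
    local before = collectgarbage("count")  -- Lua heap size in KB

    local suspect = torch.FloatTensor(1000, 1000)  -- placeholder allocation

    collectgarbage()
    local after = collectgarbage("count")
    print(string.format("delta: %.1f KB", after - before))

If I understand correctly, the Lua-side count may not even include a tensor's underlying storage (which is allocated on the C heap), which could be part of why this approach tells me so little; I've seen mention of `torch.setheaptracking(true)` in newer Torch versions to make the GC aware of tensor allocations, but I'm not sure about that.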
I already know that, once I have identified what is hogging my memory, I can assign nil to it and call the garbage collector to free it.
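That is, something like this, where `bigTensor` is a placeholder for the offending variable:

    bigTensor = nil   -- drop the last reference
    collectgarbage()  -- first pass runs the userdata finalizers
    collectgarbage()  -- second pass actually reclaims the memory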
So, in short: is there a program or tool that allows me to run a Torch program and then list each variable and its memory usage?
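Failing a ready-made tool, even a reliable way to do something like the following would help (just a sketch of what I have in mind; `reportTensorSizes` and the variable names are made up):

    -- Hypothetical helper: print the approximate storage footprint of
    -- every torch.*Tensor in a table of suspect variables.
    local function reportTensorSizes(vars)
      for name, v in pairs(vars) do
        if torch.isTensor(v) and v:storage() then
          -- size of the underlying storage in bytes; note that two
          -- views sharing one storage would be counted twice here
          local bytes = v:storage():size() * v:storage():elementSize()
          print(string.format("%-12s %10.2f MB", name, bytes / 2^20))
        end
      end
    end

    reportTensorSizes{ input = input, output = output, weights = weights }

(For a neural network, I suppose one would walk over `net:parameters()` instead, since the module itself is not a tensor.)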