
I'm working on a Monte Carlo pricer and I need to improve the efficiency of the engine.

  • Monte Carlo paths are created by a third-party library (in C++)
  • Pricing is done in IronPython (a script created by the end user)
  • Everything else is driven by a C# application

The pricing process is as follows:

  • The C# application requests the paths and collects them
  • The C# application pushes the paths to the script, which prices them and returns the values
  • The C# application displays the results to the end user

The number and size of the paths collected are known in advance.
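Roughly, the hand-off from the C# application to the script looks like the sketch below (only a sketch: the script name, the price_paths function, and the RequestPathsFromCppLibrary stub are placeholders for the real code):

    using System;
    using IronPython.Hosting;
    using Microsoft.Scripting.Hosting;

    // Host the end user's IronPython script and call its pricing function.
    ScriptEngine engine = Python.CreateEngine();
    ScriptScope scope = engine.CreateScope();
    engine.ExecuteFile("pricing_script.py", scope);        // script written by the end user

    dynamic pricePaths = scope.GetVariable("price_paths"); // pricing function defined in the script
    double[][] paths = RequestPathsFromCppLibrary();       // paths from the third-party C++ generator
    double price = (double)pricePaths(paths);              // one call with the whole set of paths
    Console.WriteLine(price);

    // Stub standing in for the call into the C++ path generator.
    static double[][] RequestPathsFromCppLibrary() =>
        new[] { new[] { 100.0, 101.2, 99.8 } };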

I have two solutions, each with advantages and drawbacks:

  1. Request path generation; for each path, ask the script to return the result, and finally aggregate the results once all paths are processed
  2. Request path generation, collect all the paths, ask the script to process all of them at once, and return the final price

The first solution works fine in all scenarios, but as the number of paths requested increases, performance decreases (I think it's due to the multiple calls to IronPython).

The second solution is faster but can hit an "out of memory" exception (I think there isn't enough virtual address space) if the number of paths requested is too large.

I chose the middle ground: process a bunch of paths, then aggregate the prices. What I want now is to increase performance further by knowing in advance how many paths I can process without hitting the "out of memory" exception.
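To make that middle ground concrete, the batching loop below is roughly what I have in mind (a sketch only; BatchSize is the value I would like to derive up front, and RequestPaths / PriceBatch are stubs standing in for the C++ generator and the single IronPython call, with PriceBatch assumed to return the average price of its batch):

    using System;

    const int BatchSize = 10000;                 // the value I want to determine in advance
    int totalPaths = 100000;
    double sum = 0.0;
    int priced = 0;

    while (priced < totalPaths)
    {
        int count = Math.Min(BatchSize, totalPaths - priced);
        double[][] batch = RequestPaths(count);  // next chunk of paths from the C++ library
        sum += PriceBatch(batch) * count;        // one IronPython call; assumed to return the batch average
        priced += count;
    }

    Console.WriteLine(sum / totalPaths);         // aggregate price across all batches

    // Stubs so the sketch is self-contained.
    static double[][] RequestPaths(int count) => new double[count][];
    static double PriceBatch(double[][] batch) => 0.0;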

I did the math, so I know in advance the size (in memory) of a path for a given request. However, I'm quite sure it's not a memory problem but rather a virtual address space issue.


So all this text is summarized by the following two questions:

  1. Is it possible to know in advance how much virtual address space my process will need to store n instances of a class (size in memory and structure are known)?
  2. Is it possible to know how much virtual address space is still available to my process?

By the way, I'm working on a 32-bit computer.

Thanks in advance for the help.

Guillaume
  • Producer/Consumer... Threading? – Guillaume Nov 24 '11 at 13:29
  • Any reason not to just get a 64-bit computer and put in more than the 4 GB RAM maximum? Solved problem, and it gives you better scalability anyway. – TomTom Nov 24 '11 at 13:31
  • @TomTom: I can't do that, I don't have control of the target environment. It could be 64-bit or 32-bit. – Guillaume Nov 24 '11 at 13:48
  • Then you can also set the LAA flag on it. That gives you 3 GB instead of 2 GB. – TomTom Nov 24 '11 at 14:03
  • @TomTom: Nice try, but frankly, if I go to see the head of IT infrastructure and ask him to change the standard build he provides to the users, I think he will laugh at me. – Guillaume Nov 24 '11 at 14:11
  • Then it is time to look for another job. Really. We had the same fight in a project; now we have moved the servers to 64-bit and got an "unapproved" database for our project. A 32-bit standard build for large data manipulation says "well, I'd rather work for a company not stuck in 1900". – TomTom Nov 24 '11 at 14:21

2 Answers


Regarding question 1: How much memory does a C#/.NET object use?
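If a rough number is enough, one imperfect approach (only an approximation, with a hypothetical PricePath type standing in for your path class) is to allocate a sample and compare GC.GetTotalMemory before and after:

    using System;

    const int Sample = 1000;

    long before = GC.GetTotalMemory(forceFullCollection: true);

    var sample = new PricePath[Sample];
    for (int i = 0; i < Sample; i++)
        sample[i] = new PricePath(steps: 252);            // build representative instances

    long after = GC.GetTotalMemory(forceFullCollection: true);
    Console.WriteLine("~{0} bytes per path", (after - before) / Sample);
    GC.KeepAlive(sample);                                 // keep the sample alive during measurement

    // Hypothetical path type, only here to make the sketch self-contained.
    class PricePath
    {
        private readonly double[] values;
        public PricePath(int steps) { values = new double[steps]; }
    }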

Regarding question 2: you could use a memory performance counter.
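For example, a minimal sketch assuming the Windows "Process \ Virtual Bytes" counter and the default 2 GB user-mode limit of a 32-bit process (fragmentation means a large allocation can still fail even when the total looks fine):

    using System;
    using System.Diagnostics;

    // Committed virtual size of the current process versus the 32-bit address space limit.
    string instance = Process.GetCurrentProcess().ProcessName;
    using (var counter = new PerformanceCounter("Process", "Virtual Bytes", instance, readOnly: true))
    {
        long virtualBytes = (long)counter.NextValue();
        long limit = 2L * 1024 * 1024 * 1024;             // 2 GB default; roughly 3 GB with the LAA flag
        long roughlyFree = limit - virtualBytes;
        Console.WriteLine("~{0} MB of address space left", roughlyFree / (1024 * 1024));
    }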

Emond

Finding out how much memory an object takes in .NET is a pretty difficult task. I've hit the same problem several times. There are some imperfect methods, but none are very precise.

My suggestion is to get some estimate of how much a path will take, and then pass a bunch of them leaving a good margin of safety. Even if you're processing them just 10 at a time, you've reduced the overhead 10 times already.

You can even make the margin configurable and then tweak it until you strike a good balance. An even more elegant solution would be to run the whole thing in another process and, if it hits an OutOfMemoryException, restart the calculation with fewer items (and adjust the margin accordingly). However, if you have so much data that it runs out of memory, then it might be a bit slow to pass it across two processes (which will also duplicate the data).
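For illustration, a much simplified in-process sketch of that retry idea (catching OutOfMemoryException inside the same process is not always reliable, and EstimateBatchSize, RequestPaths and PriceBatch are placeholders for your own code):

    using System;
    using System.Collections.Generic;

    int totalPaths = 100000;
    int batchSize = EstimateBatchSize();            // per-path estimate plus a safety margin
    int processed = 0;
    var batchPrices = new List<double>();

    while (processed < totalPaths)
    {
        int count = Math.Min(batchSize, totalPaths - processed);
        try
        {
            batchPrices.Add(PriceBatch(RequestPaths(count)));  // one script call per batch
            processed += count;
        }
        catch (OutOfMemoryException)
        {
            batchSize = Math.Max(1, batchSize / 2); // the batch was too ambitious: shrink and retry
            GC.Collect();                           // reclaim the failed batch before trying again
        }
    }

    // Stubs so the sketch is self-contained.
    static int EstimateBatchSize() => 10000;
    static double[][] RequestPaths(int count) => new double[count][];
    static double PriceBatch(double[][] batch) => 0.0;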

Could it be that the memory overflow is because of some imperfections in the path processor? Memory leaks maybe? Those are possible both in C++ and .NET.

Vilx-
  • The other-process solution is not an option; I'm talking about 100,000 paths. Doing it in another process to find the balance is too slow. – Guillaume Nov 24 '11 at 13:51
  • Finding the right balance by extrapolating the number of paths is what I did, but the problem is that the complexity of a path object depends on the user request (not known in advance, but known at runtime). – Guillaume Nov 24 '11 at 13:54
  • What exactly is the slow part? The overhead of calling the script? Or is the script actually capable of processing multiple paths in parallel? If it's just calling overhead, maybe you can modify the script container so that it processes the paths one by one and thus basically reduces to the first option (in your question) without the overhead of multiple calls? – Vilx- Nov 24 '11 at 14:02
  • The overhead is caused by C# calling the script (C# calling IronPython); the script processes the paths sequentially, but the problem is the number of times I need to call it. If I group the paths by 10, then for 100,000 paths I will call the script 10,000 times; if I group the paths by 100,000 I will call the "script" (IronPython) once, reducing the overhead by a factor of 10,000. But never mind, you helped me find a solution. – Guillaume Nov 24 '11 at 15:07
  • I did my tests in extreme conditions which are not realistic. With realistic values the overhead is not so important. I will find the optimal grouping value for a standard path and adjust it on the fly according to the paths requested by the user. – Guillaume Nov 24 '11 at 15:12
  • Well, if the script processes them sequentially, then there must be a memory leak (or, in .NET's case, a reference leak) somewhere in there, otherwise it wouldn't run out of memory. Maybe investigate that avenue? – Vilx- Nov 24 '11 at 15:29