Memory access is often overlooked. The way object-oriented (o.o.) code tends to lay out data in memory is not conducive to efficient memory access in loops. Consider the following pseudocode:
adult_clients = 0
for client in list_of_all_clients:
    if client.age >= AGE_OF_MAJORITY:
        adult_clients++
It so happens that this access pattern is quite inefficient on modern architectures, because they like reading large contiguous stretches of memory, yet we only care about client.age; and across all the clients we have, those age fields will not be laid out in contiguous memory.
Focusing on objects that have fields results in data being laid out in memory such that fields holding the same kind of information do not sit in consecutive memory. Performance-heavy code tends to involve loops that repeatedly look at data with the same conceptual meaning, and it is conducive to performance that such data be laid out in contiguous memory. (This trade-off is commonly described as "array of structs" versus "struct of arrays".)
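To make this concrete, here is a minimal sketch (the Client type and its fields are hypothetical, invented to match the pseudocode above) that prints the distance in memory between consecutive age fields under both layouts:

// hypothetical Client type, made up for illustration
#[allow(dead_code)]
struct Client {
    name: [u8; 24], // stand-in for all the fields the loop does not care about
    age: u32,
}

fn main() {
    let clients: Vec<Client> = (0..4u32)
        .map(|i| Client { name: [0; 24], age: 20 + i })
        .collect();
    let ages: Vec<u32> = clients.iter().map(|c| c.age).collect();

    // distance in memory between two consecutive age fields, in each layout
    let stride_in_structs =
        &clients[1].age as *const u32 as usize - &clients[0].age as *const u32 as usize;
    let stride_in_ages =
        &ages[1] as *const u32 as usize - &ages[0] as *const u32 as usize;
    println!("stride in Vec<Client>: {} bytes", stride_in_structs);
    println!("stride in Vec<u32>:    {} bytes", stride_in_ages);
}

On a typical 64-bit build this prints a stride of 28 bytes for the vector of structs versus 4 bytes for the dedicated vector of ages: a loop over the former skips most of every chunk of memory it fetches.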
Consider these two examples in Rust:
// struct that contains an id, and an optional value recording whether the id is divisible by three
struct Foo {
    id: u32,
    divbythree: Option<bool>,
}

fn main() {
    // create a pretty big vector of these structs with increasing ids, and divbythree initialized as None
    let mut vec_of_foos: Vec<Foo> =
        (0..100_000_000).map(|i| Foo { id: i, divbythree: None }).collect();
    // loop over all these structs, determine if each id is divisible by three,
    // and set divbythree accordingly
    let mut divbythrees = 0;
    for foo in vec_of_foos.iter_mut() {
        if foo.id % 3 == 0 {
            foo.divbythree = Some(true);
            divbythrees += 1;
        } else {
            foo.divbythree = Some(false);
        }
    }
    // print the number of times it was divisible by three
    println!("{}", divbythrees);
}
On my system, the real time with rustc -O is 0m0.436s; now let us consider this example:
fn main() {
    // this time we create two vectors rather than a vector of structs
    let vec_of_ids: Vec<u32> = (0..100_000_000).collect();
    let mut vec_of_divbythrees: Vec<Option<bool>> = vec![None; vec_of_ids.len()];
    // but we basically do the same thing
    let mut divbythrees = 0;
    for i in 0..vec_of_ids.len() {
        if vec_of_ids[i] % 3 == 0 {
            vec_of_divbythrees[i] = Some(true);
            divbythrees += 1;
        } else {
            vec_of_divbythrees[i] = Some(false);
        }
    }
    println!("{}", divbythrees);
}
This runs in 0m0.254s at the same optimization level, close to half the time needed.
Despite having to allocate two vectors instead of one, storing similar values in contiguous memory has almost halved the execution time. Though the o.o. approach obviously provides for much nicer and more maintainable code.
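For what it is worth, the two-vector version need not stay index-based; here is a sketch (my own restatement, not what was measured above) of the same loop written over zipped iterators, which keeps the contiguous layout while reading a little more naturally:

fn main() {
    let vec_of_ids: Vec<u32> = (0..100_000_000).collect();
    let mut vec_of_divbythrees: Vec<Option<bool>> = vec![None; vec_of_ids.len()];

    let mut divbythrees = 0;
    // walk both vectors in lockstep: no manual indexing, same memory layout
    for (id, slot) in vec_of_ids.iter().zip(vec_of_divbythrees.iter_mut()) {
        let div = id % 3 == 0;
        *slot = Some(div);
        if div {
            divbythrees += 1;
        }
    }
    println!("{}", divbythrees);
}

Zipping also spares the compiler the bounds checks that indexing implies, which can only help it vectorize the loop.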
P.s.: it occurs to me that I should probably explain why this matters so much, given that the code in both cases still accesses memory one field at a time, rather than, say, putting a large swath on the stack. The reason is c.p.u. caches: when the program asks for the memory at a certain address, it actually obtains, and caches, a significant chunk of memory around that address, and if memory next to it is asked for again soon after, it can be served from the cache rather than from actual physical working memory. Of course, the contiguous layout also lets compilers vectorize the bottom code more efficiently.
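To put rough numbers on that, here is a quick check (a sketch; these sizes are what typical 64-bit builds produce, though the language does not guarantee struct layout) of how many useful ids one 64-byte cache line holds under each layout:

use std::mem::size_of;

#[allow(dead_code)]
struct Foo {
    id: u32,
    divbythree: Option<bool>,
}

fn main() {
    // Option<bool> fits in one byte (it reuses invalid bool bit patterns),
    // so Foo is 4 + 1 = 5 bytes of data, padded to 8 for alignment on typical targets
    println!("size of Foo: {} bytes", size_of::<Foo>());
    println!("size of u32: {} bytes", size_of::<u32>());
    // with a common 64-byte cache line, one fetch brings in:
    println!("ids per line, vector of structs: {}", 64 / size_of::<Foo>());
    println!("ids per line, vector of ids:     {}", 64 / size_of::<u32>());
}

At those sizes the struct version streams through roughly 800 MB of memory for a hundred million elements, while the two-vector version reads about 400 MB of ids and writes a separate, denser 100 MB of flags.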