D programming without the garbage collector

Question

I've been looking at D today and on the surface it looks quite amazing. I like how it includes many higher-level constructs directly in the language so silly hacks or terse methods don't have to be used. One thing that really worries me is the GC. I know this is a big issue and have read many discussions about it.

My own simple tests, which sprouted from a question here, show that the GC is extremely slow: over 10 times slower than straight C++ doing the same thing. (Obviously the test does not translate directly to the real world, but the performance hit is extreme and would slow down real-world applications that behave similarly, i.e. that allocate many small objects quickly.)

I'm looking into writing a real-time, low-latency audio application, and it is possible that the GC will ruin the performance of the application to the point of making it nearly useless. If the GC causes any hiccups, it will ruin the real-time audio aspect, which is much more crucial since, unlike graphics, audio runs at a much higher frame rate (44000+ vs 30-60). (Because of its low latency, it is more demanding than a standard audio player, which can buffer significant amounts of data.)

Disabling the GC improved the results to within about 20% of the C++ code. This is significant. I'll give the code at the end for analysis.

My questions are:

1. How difficult is it to replace D's GC with a standard smart pointer implementation so that libraries that rely on the GC can still be used? If I remove the GC completely I'll lose a lot of grunt work, as D already has limited libraries compared to C++.
2. Does GC.disable only halt garbage collection temporarily (preventing the GC thread from running), with GC.enable picking back up where it left off? If so, I could disable the GC during moments of high CPU usage to prevent latency issues.
3. Is there any way to enforce a pattern of consistently not using the GC? (I'm new to programming in D, and when I start writing my classes that do not use the GC I would like to be sure I don't forget to implement their own cleanup.)
4. Is it possible to replace the GC in D easily? (Not that I want to, but it might be fun to play around with different methods of GC one day... this is similar to 1, I suppose.)

What I'd like to do is trade memory for speed. I do not need the GC to run every few seconds. In fact, if I can properly implement my own memory management for my data structures then chances are it will not need to run very often at all. I might need to run it only when memory becomes scarce. From what I've read, though, the longer you wait to call it the slower it will be. Since there generally will be times in my application where I can get away with calling it without issues this will help alleviate some of the pressure(but then again, there might be hours when I won't be able to call it).

I am not worried about memory constraints as much. I'd prefer to "waste" memory over speed(up to a point, of course). First and foremost is the latency issues.

From what I've read, I can, at the very least, go the route of C/C++ as long as I don't use any libraries or language constructs that rely on the GC. The problem is, I do not know which ones do. I've seen string, new, etc. mentioned, but does that mean I can't use the built-in strings if I don't enable the GC?

I've read in some bug reports that the GC might be really buggy and that could explain its performance problems?

Also, D uses a bit more memory, in fact, D runs out of memory before the C++ program. I guess it is about 15% more or so in this case. I suppose that is for the GC.

I realize the following code is not representative of your average program, but what it says is that when programs are instantiating a lot of objects (say, at startup) they will be much slower (10 times is a large factor). If the GC could be "paused" at startup then it wouldn't necessarily be an issue.

What would really be nice is if I could somehow have the compiler automatically GC a local object if I do not specifically deallocate it. This almost gives the best of both worlds.

e.g.,

{
    Foo f = new Foo();
    ....
    dispose f; // Causes f to be disposed of immediately and treats f outside the GC
               // If left out then f is passed to the GC.
               // I suppose this might actually end up creating two kinds of Foo 
               // behind the scenes. 

    Foo g = new manualGC!Foo();   // Maybe something like this will keep GC's hands off 
                                  // g and allow it to be manually disposed of.
}

In fact, it might be nice to actually be able to associate different types of GC's with different types of data with each GC being completely self contained. This way I could tailor the performance of the GC to my types.

Code:

module main;
import std.stdio, std.conv, core.memory;
import core.stdc.time;

class Foo{
    int x;
    this(int _x){x=_x;}
}

void main(string[] args)
{

    clock_t start, end;
    double cpu_time_used;


    //GC.disable();
    start = clock();

    //int n = to!int(args[1]);
    int n = 10000000;
    Foo[] m = new Foo[n];

    foreach(i; 0..n)
    //for(int i = 0; i<n; i++)
    {
        m[i] = new Foo(i);
    }

    end = clock();
    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / CLOCKS_PER_SEC; // seconds, regardless of the platform's tick rate
    writeln(cpu_time_used);
    getchar();
}

C++ code

#include <cstdlib>
#include <iostream>
#include <time.h>
#include <math.h>
#include <stdio.h>

using namespace std;
class Foo{
public:
    int x;
    Foo(int _x);

};

Foo::Foo(int _x){
    x = _x;
}

int main(int argc, char** argv) {

    int n = 120000000;
    clock_t start, end;
    double cpu_time_used;




    start = clock();

    Foo** gx = new Foo*[n];
    for(int i=0;i<n;i++){
        gx[i] = new Foo(i);
    }


    end = clock();
    cpu_time_used = (end - start);
    cpu_time_used = cpu_time_used / CLOCKS_PER_SEC; // seconds, regardless of the platform's tick rate
    cout << cpu_time_used;

    std::cin.get();
    return 0;
}

@AbstractDissonance Surely some things have changed in the meantime (e.g. the performance of the GC). Nevertheless, you're measuring different things. You do not free the allocated memory of the created objects in your C++ benchmark example. Thereby, the time it takes to free the memory is not measured at all in C++. However, the garbage collector in your D example should run at least once if not more often. It might not free all memory before you stop the time. You get more or less comparable times in my opinion if you free the memory in C++ and call the GC explicitly after the loop in D. — maiermic, Oct 29 '17 at 08:46
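To illustrate the comment above, here is a rough sketch of a D-side measurement that includes an explicit collection in the timed region. It is not the asker's benchmark: the helper name `timeAllocations` and the object count are assumptions made for this example, and the numbers it prints will vary by machine and compiler.

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

class Foo
{
    int x;
    this(int x) { this.x = x; }
}

// Hypothetical helper: times n small allocations, optionally followed
// by dropping the references and forcing a full collection.
Duration timeAllocations(int n, bool collectAfter)
{
    auto sw = StopWatch(AutoStart.yes);
    auto m = new Foo[n];
    foreach (i; 0 .. n)
        m[i] = new Foo(i);
    if (collectAfter)
    {
        m = null;     // drop the references so the objects become garbage
        GC.collect(); // pay the collection cost inside the timed region
    }
    return sw.peek;
}

void main()
{
    enum n = 1_000_000; // arbitrary count, smaller than in the question
    writefln("allocate only:        %s ms", timeAllocations(n, false).total!"msecs");
    writefln("allocate and collect: %s ms", timeAllocations(n, true).total!"msecs");
}
```

For a fair comparison the C++ side would have to delete every object inside its timed region as well.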

score 19 · Answer 1 · edited Feb 19 '15 at 03:33

D can use pretty much any C library, just define the functions needed. D can also use C++ libraries, but D does not understand certain C++ constructs. So... D can use almost as many libraries as C++. They just aren't native D libs.
From D's library reference, core.memory:
```
static nothrow void disable();
```
Disables automatic garbage collections performed to minimize the process footprint. Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition. This function is reentrant, but enable must be called once for each call to disable.
```
static pure nothrow void free(void* p);
```
Deallocates the memory referenced by p. If p is null, no action occurs. If p references memory not originally allocated by this garbage collector, or if it points to the interior of a memory block, no action will be taken. The block will not be finalized regardless of whether the FINALIZE attribute is set. If finalization is desired, use delete instead.
```
static pure nothrow void* malloc(size_t sz, uint ba = 0);
```
Requests an aligned block of managed memory from the garbage collector. This memory may be deleted at will with a call to free, or it may be discarded and cleaned up automatically during a collection run. If allocation fails, this function will call onOutOfMemory which is expected to throw an OutOfMemoryError.
So yes: GC.disable() suspends automatic collections until the matching GC.enable() call. Read more here: http://dlang.org/garbage.html

And here: http://dlang.org/memory.html
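A minimal sketch of pausing automatic collections around a latency-sensitive stretch, based on the `GC.disable`/`GC.enable` description quoted above. The wrapper name `withoutCollections` is made up for this example; only `GC.disable` and `GC.enable` come from core.memory.

```d
import core.memory : GC;

/// Hypothetical helper: runs `work` with automatic collections paused.
void withoutCollections(scope void delegate() work)
{
    GC.disable();             // calls nest: each disable needs a matching enable
    scope (exit) GC.enable(); // re-enable even if `work` throws
    work();
}

void main()
{
    int[] samples;
    withoutCollections({
        // Allocations still succeed here; per the documentation they only
        // trigger a collection if the runtime deems it necessary,
        // for example when memory runs out.
        foreach (i; 0 .. 1_000)
            samples ~= i;
    });
    assert(samples.length == 1_000);
}
```

Note that this only postpones the work: garbage created while collections are paused is still there when they resume.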

If you really need classes, look at this: http://dlang.org/memory.html#newdelete. delete has been deprecated, but I believe you can still free() the object.
Don't use classes, use structs. Structs are value types and live on the stack by default, while classes are reference types that new allocates on the GC heap. Unless you need polymorphism or other things classes support, they are overhead for what you are doing. You can use malloc and free if you want to.
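As a sketch of the struct-plus-malloc route mentioned above: the struct `Sample` and the helper `makeSample` are invented for illustration, and the example assumes a plain-data struct with no constructor or destructor logic.

```d
import core.stdc.stdlib : free, malloc;

struct Sample
{
    int x;
}

/// Hypothetical helper: allocates one Sample on the C heap, null on failure.
Sample* makeSample(int x) @nogc nothrow
{
    auto p = cast(Sample*) malloc(Sample.sizeof);
    if (p !is null)
        *p = Sample(x); // plain value assignment is enough for a POD struct
    return p;
}

void main() @nogc nothrow
{
    Sample onStack = Sample(1);     // lives in main's stack frame
    Sample* onHeap = makeSample(2); // lives on the C heap, unknown to the GC
    scope (exit) free(onHeap);      // we own it, so we release it ourselves

    assert(onStack.x == 1);
    assert(onHeap !is null && onHeap.x == 2);
}
```

Because nothing here touches the GC, both functions can carry `@nogc`, which makes the compiler reject accidental GC allocations.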
More or less... fill out the function definitions here: https://github.com/D-Programming-Language/druntime/blob/master/src/gcstub/gc.d . There's a GC proxy system set up to allow you to customize the GC. So it's not like it is something that the designers do not want you to do.

Little GC knowledge here: The garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreferenced objects is not specified. This means that when the garbage collector calls a destructor for an object of a class that has members that are references to garbage collected objects, those references may no longer be valid. This means that destructors cannot reference sub objects. This rule does not apply to auto objects or objects deleted with the DeleteExpression, as the destructor is not being run by the garbage collector, meaning all references are valid.

import core.stdc.stdlib; (formerly std.c.stdlib) has malloc and free.

import core.memory; has GC.malloc, GC.free, GC.addRoot and GC.addRange (to make external memory known to the GC)...

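Strings are dynamic arrays of immutable chars ( immutable(char)[] ). Anything that allocates or grows a dynamic array (new, appending, concatenation) goes through the GC; string literals, slices of existing memory and static arrays do not.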
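A small sketch of where the GC does and does not get involved with strings and arrays. The function `countSpaces` is a made-up example; the point is only that the compiler accepts the code under `@nogc`, which it would not if any of these operations allocated from the GC.

```d
// Reading a slice never allocates, so this compiles under @nogc.
size_t countSpaces(const(char)[] text) @nogc nothrow pure
{
    size_t n;
    foreach (char c; text)
        if (c == ' ')
            ++n;
    return n;
}

void main() @nogc nothrow
{
    char[16] buffer = ' ';                 // static array: stored in the stack frame
    string literal = "a b c";              // literal: stored in the program's data
    const(char)[] slice = literal[0 .. 3]; // slice: just a pointer and a length

    assert(countSpaces(buffer[]) == 16);
    assert(countSpaces(slice) == 1);

    // Each of these would allocate from the GC and is rejected under @nogc:
    // string joined = literal ~ "!";
    // auto grown = new char[](32);
}
```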

If you want manual management, go ahead.

import core.stdc.stdlib; // C malloc/free (std.c.stdlib is the old, removed name)
import core.memory;      // GC.malloc/GC.free

void main()
{
    char* one = cast(char*) GC.malloc(char.sizeof * 8);
    GC.free(one);//pardon me, I'm not used to manual memory management.
    //I am *asking* you to edit this to fix it, if it needs it.
}

Why create a wrapper class for an int? You are doing nothing more than slowing things down and wasting memory.

class Foo { int n; this(int _n){ n = _n; } }
writeln(Foo.sizeof);  //it's 8 bytes, btw (on 64-bit; this is the size of the reference, the instance itself is bigger)
writeln(int.sizeof);  //It's *half* the size of a Foo reference; 4 bytes.


Foo[] m;// = new Foo[n]; //8 sec
m.length=n; //7 sec minor optimization. at least on my machine.
foreach(i; 0..n)
    m[i] = new Foo(i);


int[] m;
m.length=n; //nice formatting. and default initialized to 0
//Ooops! forgot this...
foreach(i; 0..n)
    m[i] = i;//.145 sec

If you really need to, then write the time-sensitive function in C and call it from D. Heck, if time is really that big of a deal, use D's inline assembly to optimize everything.

edited Feb 19 '15 at 03:33

Colonel Thirty Two

answered Nov 27 '12 at 00:41

0b1100110

For the love of all that is holy someone correct me if I'm wrong. Memory is not my forte. – 0b1100110 Nov 27 '12 at 00:42
From personal experience - if people just thought about writing GC-friendly code, 90% of all performance complaints would go away... – DejanLekic Nov 27 '12 at 00:45
I don't see how the wrapper around the int is wasting memory. Can you explain ? – deadalnix Nov 27 '12 at 03:45
I was under the impression that an Object contains more than just the sum of what it's wrapping. dlang.org/abi.html – 0b1100110 Nov 27 '12 at 03:56
writeln(__traits(classInstanceSize, Foo)); //20bytes. A plain int is 4 bytes. Unless that doesn't measure what I think it does. – 0b1100110 Nov 27 '12 at 04:18
Every class instance takes at least 16 bytes on 64-bit architectures or 8 bytes on 32-bit architectures; in other words, it contains at least two pointers - a pointer to the vtable and a pointer to the monitor. Thus I assume 0b1100110 is on a 64-bit architecture. :) – DejanLekic Nov 27 '12 at 09:13
@0b1100110 The point of the test code is not simply to wrap an int but to check the performance of the GC. The same thing happens in the C++ code (wrapping an int). I think you're not understanding my questions too well. They are solely related to the GC and how it affects performance. Since certain constructs in D rely on the GC they cannot be used. My goal is to use as much of D as I can as it stands but make my application as performant as possible. Because D's GC performs rather poorly compared to most other modern GCs that I know of, it seems like a huge drawback for my specific application. – AbstractDissonance Nov 27 '12 at 10:24
But I do not want to start from scratch, I do not want to throw away strings, array slices, etc. as these are useful and are what make D more useful than C. I simply want to avoid the "Oh shit, D's GC prevents my application from being useful at all" situation (in my application, if there are "pops" and "clicks" due to audio dropout then the application is useless). Now, a factor of 1.34 or 2.1 might be ok BUT a factor of 10 is downright scary. You may say that it won't be 10 in the real world, and that might be true. But it proves that D's GC has performance problems. – AbstractDissonance Nov 27 '12 at 10:29
So, I want to avoid the potential for my application being useless in D (as it will be a ton of work just to get to the point of finding out, which means a ton of wasted work because it might not be rectifiable). Somewhere I think I saw some basic implementation of a GC for D that essentially uses smart pointers. In any case, I think the real issue of D's GC is that it blocks all threads. I think it should be possible for it to do its analysis on a non-blocking background thread (for multi-core) and then block only for a minimal time. – AbstractDissonance Nov 27 '12 at 10:33
@DejanLekic So, what about the other 10%? What if your application is real time? Then what? It's not about writing GC-friendly code if the GC itself is "broke". Not all GCs are created equal and I think D's GC has much room for improvement (by simply comparing its performance to others). My main thing is that I don't want to program myself into a corner with this application if I use D's GC and it simply ruins the application. The D site even mentions this and suggests bypassing the GC in these types of applications. I just don't want to have to start from scratch. – AbstractDissonance Nov 27 '12 at 10:36
@0b1100110 Also, what is the point of using D and C/C++/Asm for critical parts IF D's GC slows them down? It will block them, it might finish quicker due less work it has to do on the managed resources but it will still block. And, then, why not just use D completely and manage my own resources on my own code? It would not be much of a difference. – AbstractDissonance Nov 27 '12 at 10:41
@DejanLekic Yes, 64bit. I figured if we are talking about GC performance, size will always matter. Shrug, I honestly dunno how much difference it would make, if it would make any difference at all. – 0b1100110 Nov 27 '12 at 14:28
@AbstractDissonance Ok, I figured you had a reason for using classes there, I just wanted to make sure you understood the reasons why they wouldn't be optimal. The reason to use C/C++/Asm would be to GC.disable() and simply not allow the GC to collect. IIUC, the collection is the part that's terrible for performance. So my suggestion was to simply not use it. I don't know what your exact requirements are, but if there is no point in the lifetime of the app that a GC collection could be tolerated(such as when sound is paused), then D's GC would be a limiting factor. – 0b1100110 Nov 27 '12 at 14:38
@0b1100110: The sound may not be paused for potentially hours(unlikely but possible) and if GC is disabled then it would probably be equivalent to a memory leak(eventually run out of memory). Initiating the GC at some point would probably result in a large unacceptable delay(again, cause it blocks all threads). It seems, if you want real time behavior either you have to accept the GC or *completely* throw it out, at least in the current implementation. I'm still investigating if it will be a serious problem or not but at least people are working on the problem. – AbstractDissonance Nov 27 '12 at 15:23
@AbstractDissonance, My experience has been that with the GC disabled, memory usage reaches a peak and stays there. (Over a few hours, doing very little in the meantime, mainly waiting.) I probably just got lucky. http://dlang.org/phobos/std_typecons.html#RefCounted http://dlang.org/phobos/std_typecons.html#scoped, those might be useful. We've gone way past the boundaries of my knowledge on this subject... I hope you find the answers you're looking for. – 0b1100110 Nov 27 '12 at 15:36
@0b1100110 yeah, I've seen them and D does look like it can ultimately do the work. I just really want to make sure I don't waste a ton of time trying to write an application that simply can't be done with "standard D". I guess ultimately I'm just going to have to create some real test code and see how it turns out. Anyways, thanks for the help... – AbstractDissonance Nov 27 '12 at 15:53
@AbstractDissonance - check the presentation LMAX guys (authors of the Disruptor) had on InfoQ. They explain how they do millions of transactions per second at lowest possible latency, in JAVA, a GC language! I would not use D if my app belong to those 10%. It would be a pure masochism in my opinion. I would use C. – DejanLekic Nov 27 '12 at 16:11
@DejanLekic First, we are not talking about java, I suggest you implement the same test case in java and see if it is 10 times slower than the C++ case. Second, we are talking about audio which is very different from games or financials. Audio cannot pause. It is extremely noticeable. It creates a pop, usually somewhat loud. So, while, on average, you might be able to do the processing without issue, WHEN the GC kicks in, you are screwed, you will get a pop! (unless it is fast and most likely multi-threaded). Even if it only delayed but 1ms... Now, this will happen every time the GC kicks in. – AbstractDissonance Nov 27 '12 at 16:29
@DejanLekic I know you are a fanboy of GC's, and I like GC's too. I do not have a problem with them when they are appropriate. But you should not believe that just because they are good for general purpose applications that they are always good. Also, please don't think that all GC's are perfect. They are not. D's GC is far from it. Just follow the links given in some of the other answers. – AbstractDissonance Nov 27 '12 at 16:30
@AbstractDissonance I do not need anyone to tell me that D's GC is not ideal. I know about these things myself. I still stay behind what I said - use C if your project is among those 10% cases... PS. read my comment again - I did not say you use Java. I also did not advocate using the GC. – DejanLekic Nov 27 '12 at 16:43
@DejanLekic Um, so you are telling me that D is not capable of doing what I would like to do? Are you sure? I'm pretty sure that it is possible to achieve near the performance of C using D. So it's not a matter of not using D because it can't do it but **how much work will it take**. If I have to write a new standard library and GC to get it done then I'd just use C++ from the start. The only reason I would like to use D in the first place is because it looks like a very nice language, not because I have to use it, but because I want to (but I don't want to go down a dead end either). – AbstractDissonance Nov 27 '12 at 16:47
@DejanLekic So, telling me that GC are good or bad or 90% of this and 10% of that, Java's GC can do this and that is irrelevant. It does not in any way solve my specific problem with D's GC. – AbstractDissonance Nov 27 '12 at 16:50
@AbstractDissonance - D is capable of doing what you want to do, but **at what cost**? In my world time = money. You get the point? – DejanLekic Nov 27 '12 at 16:50

score 9 · Answer 2 · answered Nov 27 '12 at 00:38

I suggest you read this article: http://3d.benjamin-thaut.de/?p=20 There you will find a version of the standard library that does own memory management and completely avoids garbage collection.

answered Nov 27 '12 at 00:38

DejanLekic

Thanks, I actually ran across this once and this was my first idea of what D could do. It's nice to know someone has done some work in this area and what kind of results one would get(basically a factor 3 in his "real world" application, although mine is more time critical due to the latency issues). His library, from what I've read, is not complete. I'll look it over and see where it takes me. – AbstractDissonance Nov 27 '12 at 10:36

score 5 · Answer 3 · answered Nov 27 '12 at 11:23

D's GC simply isn't as sophisticated as others like Java's. It's open-source so anyone can try to improve it.

There is an experimental concurrent GC named CDGC and there is a current GSoC project to remove the global lock: http://www.google-melange.com/gsoc/project/google/gsoc2012/avtuunainen/17001

Make sure to use LDC or GDC for compilation to get better optimized code.

The XomB project also uses a custom runtime but it's D version 1 I think. http://wiki.xomb.org/index.php?title=Main_Page

score 4 · Answer 4 · answered Dec 10 '12 at 11:06

You can also just allocate all memory blocks you need then use a memory pool to get blocks without the GC.
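A minimal sketch of the pre-allocated pool idea, assuming plain-data structs and a fixed capacity chosen up front. `Pool`, `acquire`, `release` and `Chunk` are names invented for this example, not an existing library API, and the sketch is not thread-safe.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical fixed-capacity pool of T, backed by malloc'd blocks.
struct Pool(T)
{
    private T* items;      // storage for all slots
    private T** freeList;  // stack of currently unused slots
    private size_t freeCount;

    @disable this(this);   // the pool owns its memory; forbid copies

    this(size_t capacity) @nogc nothrow
    {
        items = cast(T*) malloc(T.sizeof * capacity);
        freeList = cast(T**) malloc((T*).sizeof * capacity);
        if (items is null || freeList is null)
            return; // pool stays empty; acquire() will return null
        foreach (i; 0 .. capacity)
            freeList[i] = items + i;
        freeCount = capacity;
    }

    ~this() @nogc nothrow
    {
        free(items);
        free(freeList);
    }

    /// Returns an uninitialized slot, or null when exhausted. Never allocates.
    T* acquire() @nogc nothrow
    {
        return freeCount ? freeList[--freeCount] : null;
    }

    /// Hands a slot obtained from acquire() back to the pool.
    void release(T* p) @nogc nothrow
    {
        freeList[freeCount++] = p;
    }
}

struct Chunk { float[64] samples; } // made-up payload type

void main() @nogc nothrow
{
    auto pool = Pool!Chunk(2);
    Chunk* a = pool.acquire();
    Chunk* b = pool.acquire();
    assert(a !is null && b !is null && a !is b);
    assert(pool.acquire() is null); // exhausted: no hidden allocation
    pool.release(a);
    assert(pool.acquire() is a);    // the released slot is reused
}
```

All allocation happens once in the constructor, so the hot path (`acquire`/`release`) costs a couple of pointer operations and never involves the GC.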

And by the way, it's not as slow as you mentioned. And GC.disable() doesn't really disable it: collections can still run when the runtime deems them necessary, for example when memory runs out.

answered Dec 10 '12 at 11:06

phaazon

Object pool is a pattern usually seen in videogames which can greatly help in audio applications too. I'd create a series of pools, one per type of info you need (sample chunks, compressed data section, etc.), and always request from these and return control to them when no longer needed. Thus, you limit memory fragmentation (not relevant for you), but also ensure that none of your small instances need collection, since they are always contained inside a struct. – Elideb Dec 11 '12 at 00:41
Another good point is the `reuse()` pattern, which means reusing an object by calling a ctor on an already created object, preventing any GC free or alloc (it's not actually a ctor call, but it's similar). – phaazon Dec 11 '12 at 10:31

score 3 · Answer 5 · answered Dec 05 '12 at 21:44

We might look at the problem from a bit different view. Suboptimal performance of allocating many little objects, which you mention as a rationale for the question, has little to do with GC alone. Rather, it's a matter of balance between general-purpose (but suboptimal) and highly-performant (but task-specialised) memory management tools. The idea is: presence of GC doesn't prevent you from writing a real-time app, you just have to use more specific tools (say, object pools) for special cases.

score 2 · Answer 6 · answered Mar 07 '13 at 16:19

Since this hasn't been closed yet, recent versions of D have the std.container library which contains an Array data structure that is significantly more efficient with respect to memory than the built-in arrays. I can't confirm that the other data structures in the library are also efficient, but it may be worth looking into if you need to be more memory conscious without having to resort to manually creating data structures that don't require garbage collection.
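A short usage sketch of `Array` from std.container.array. As far as I know it keeps its elements in malloc'd storage rather than on the GC heap and is reference counted, but treat that as my understanding of the implementation, not a guarantee.

```d
import std.container.array : Array;

void main()
{
    Array!int values;
    values.reserve(1_000);   // one up-front allocation for all elements
    foreach (i; 0 .. 1_000)
        values.insertBack(i);

    assert(values.length == 1_000);
    assert(values[10] == 10);
    // The storage is released deterministically when the last copy of
    // `values` goes out of scope, without waiting for a collection.
}
```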

score 0 · Answer 7 · answered Sep 17 '22 at 14:52

D is constantly evolving. Most of the answers here are 9+ years old, so I figured I'd answer these questions again for anyone curious what the current situation is.

(...) replace D's GC with a standard smart pointers implementation so that libraries that rely on the GC can still be used. (...)

Replacing the GC itself with smart pointers is not something I've looked into (i.e. where new creates a smart pointer). There are several D libraries that add smart pointers. You can interface with any C library. Interfacing with C++ and even Objective-C is also supported to some degree, so that should cover you pretty well.

Does GC.disable only halt the garbage collection temporarily (preventing the GC thread from running) and GC.enable pick back up where it left off. (...)

"Collections may continue to occur in instances where the implementation deems necessary for correct program behaviour, such as during an out of memory condition." [source]

So mostly, yes. You can also manually invoke collection during down-time.
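A sketch of collecting during down-time, assuming the application has some point where a pause is acceptable. The hook name `onIdle` is invented for this example; `GC.collect` and `GC.minimize` are the core.memory calls.

```d
import core.memory : GC;

/// Hypothetical hook: call it from a point where a pause is acceptable.
void onIdle()
{
    GC.collect();  // run a full collection now, at a moment we chose
    GC.minimize(); // optionally hand unused memory back to the OS
}

void main()
{
    GC.disable();             // no automatic collections while "playing"
    scope (exit) GC.enable();

    foreach (block; 0 .. 100)
    {
        auto scratch = new ubyte[](4096); // garbage accumulates meanwhile
        scratch[0] = cast(ubyte) block;
    }

    onIdle(); // to my understanding, an explicit collect still runs while disabled
}
```

The longer the gap between idle points, the more garbage each explicit collection has to deal with, so this suits applications with regular quiet moments.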

Is there any way to enforce a pattern to not use GC consistently. (...) when I start writing my classes that do not use the GC I would like to (...)

Classes are reference types, and instances created with new are allocated on the GC heap. Structs should be used instead. However, keep in mind that structs are value types, so by default they're copied when assigned or passed around. You can @disable the copy constructor if you don't like this behaviour, but then your struct won't be POD.

What you're probably looking for is @nogc, which is a function attribute that stops a function from using the GC. You can't mark a struct type as @nogc, but you can mark each of its methods as @nogc. Just keep in mind that @nogc code can't call GC code. There's also nothrow.
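A small sketch of `@nogc` on struct methods and free functions. `Gain` and `render` are made-up names; the commented-out line shows the kind of code the compiler rejects.

```d
struct Gain
{
    float factor = 1.0f;

    // Each method opts in individually; there is no type-level @nogc.
    void apply(float[] buffer) const @nogc nothrow
    {
        foreach (ref s; buffer)
            s *= factor;
    }
}

void render(float[] buffer, const Gain gain) @nogc nothrow
{
    gain.apply(buffer); // fine: apply is @nogc as well
    // Uncommenting the next line is a compile error, because `new`
    // allocates from the GC:
    // auto tmp = new float[](buffer.length);
}

void main() @nogc nothrow
{
    float[4] block = 1.0f;       // static array, no allocation
    render(block[], Gain(0.5f));
    assert(block[0] == 0.5f && block[3] == 0.5f);
}
```

Marking `main` itself as `@nogc` is the strictest form of enforcement: the whole call tree then has to be GC-free to compile.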

If you intend to never use the GC, you ought to look into Better C. It's a D compiler mode (the -betterC switch) that removes all of D's runtime, standard library (Phobos), GC and all GC-reliant features (namely associative arrays and exceptions) in favour of using C's runtime and the C Standard Library.
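A minimal Better C sketch, assuming it is built with the `-betterC` switch (for example `dmd -betterC app.d`). The struct `Voice` is invented for illustration; memory comes from the C allocator because there is no GC to ask.

```d
import core.stdc.stdio : printf;
import core.stdc.stdlib : free, malloc;

struct Voice // made-up example type
{
    float frequency;
    float phase;
}

// Better C programs use a C-style entry point.
extern (C) int main()
{
    auto v = cast(Voice*) malloc(Voice.sizeof);
    if (v is null)
        return 1;
    scope (exit) free(v); // scope guards still work in this mode

    *v = Voice(440.0f, 0.0f);
    printf("frequency: %.1f Hz\n", cast(double) v.frequency);
    return 0;
}
```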

Is it possible to replace the GC in D (...)

Yes it is: https://dlang.org/spec/garbage.html#gc_registry

And you can configure the pre-existing GC to better suit your needs if you don't want to make your own GC.
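As a sketch of configuring the existing GC: the runtime reads options from an `rt_options` array embedded in the program (the same settings can be passed on the command line as `--DRT-gcopt=...`). The particular values below are arbitrary and only meant to show the shape; check the GC configuration section of the spec for the options your compiler version supports.

```d
// Embedded default runtime options, read by druntime at startup.
// initReserve and minPoolSize are given in MB; the values are illustrative.
extern (C) __gshared string[] rt_options = [
    "gcopt=initReserve:64 minPoolSize:16"
];

void main()
{
    auto buffer = new float[](1024); // ordinary GC allocation
    buffer[] = 0.0f;
    assert(buffer.length == 1024);
}
```

Reserving memory up front trades a larger footprint for fewer collections early in the program's life, which matches the "memory for speed" trade described in the question.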
