D opEquals() and opCast() give me a Segmentation Fault

Question

I am trying to write a library of types representing a Global Trade Identification Number (GTIN) in the D Programming Language. There are four kinds of GTIN's, each named after the length of the number itself: GTIN8, GTIN12, GTIN13, and GTIN14. Each one of these is modeled as a class inheriting from an abstract GTIN class.

In the abstract GTIN class, I am overriding the opEquals operator to compare two GTINs of any kind. Since all of the smaller GTINs can be mapped to a broader GTIN14, each GTIN in the comparison gets cast to a GTIN14 first, then compared.

This all works fine until I compare one or more GTIN13s (or presumably, a smaller GTIN type as well), in which case, I get a segmentation fault.

It all begins with this unit test:

GTIN13 gtin1 = new GTIN13("123456789012");
GTIN13 gtin2 = new GTIN13("123456789012");
assert(gtin1 == gtin2);

The operator opEquals() has a signature of Object.opEquals(Object o), and changing the signature to anything else does not override ==. In my case, this becomes a call to gtin1.opEquals(cast(Object) gtin2). My opEquals override looks like this (subtracting the accumulated commented-out code and debug statements):

public override @trusted
bool opEquals(Object other)
{
    GTIN14 a = cast(GTIN14) this;
    GTIN14 b = cast(GTIN14) other;

    for (int i; i < a.digits.length; i++)
    {
        if (a.digits[i] != b.digits[i]) return false;
    }
    return true;
}

As you can see, each GTIN is cast to a GTIN14 at the start of the function; however, other is first cast to an Object.

public
GTIN14 opCast(GTIN14)()
{
    string currentGTINString = this.toString()[0 .. $-1];
    while (currentGTINString.length < 13)
        currentGTINString = ('0' ~ currentGTINString); 
    return new GTIN14(currentGTINString);
}

Further, my friend writeln() tells me that after the line GTIN14 b = cast(GTIN14) other; is executed, b is null. (It is not null before that line.)

So, in summary, the problem seems to be that casting a GTIN of any kind other than GTIN14 to an Object, then back to GTIN14 somehow deletes the object altogether. Is this a bug? Is this a problem with my code? Is there a workaround that does not compromise code quality?

I will appreciate any help I can get.

You don't have to use `override` to overload operators, `opEquals` included. Just make your `opEquals` `bool opEquals(GTIN other)`. — Michail, Jul 25 '17 at 07:59
I suggest you do not use `object`. Instead use a superclass (base class, or an interface) GTIN, and all your GTIN subtypes inherit from there. If you do use object, then I suggest you use the `is` to check what type of object it is. With your approach you can even pass an object of type RacingCar and that does not seem like something you want to do... — DejanLekic, Jul 26 '17 at 08:05
I hate to say it, but your code is wrong on several levels.... can you post the whole thing? I'll write up some details and fix suggestions from the whole thing if you can post it. — Adam D. Ruppe, Jul 26 '17 at 17:17
@Michail, I'll give that a try after hearing what Adam has to say. I could have sworn that I tried this with no success, but I'll keep it in mind. Thank you. — Jonathan Wilbur, Jul 27 '17 at 00:51
@DejanLekic, I am using a GTIN superclass, and most of everything relating to GTINs is implemented in it. — Jonathan Wilbur, Jul 27 '17 at 00:53
@AdamD.Ruppe, I don't want to spam SO with 560ish LOC, so I'll post a sample that will still give you the "gist." — Jonathan Wilbur, Jul 27 '17 at 00:54
Alright, so any way I tried paring down my code still exceeded SO's character limit for comments, so here's a [Pastebin](https://pastebin.com/78W5XSea) instead, @AdamD.Ruppe. I look forward to improving. — Jonathan Wilbur, Jul 27 '17 at 01:03

score 2 · Accepted Answer · answered Jul 27 '17 at 03:14

All right, I have several comments from your pastebin.

public abstract
class GlobalTradeItemNumber
{
    // this could prolly just return _digits.length too.
    public abstract @property
    size_t length();

My general comment here is your code could be a bit simplified by making the length be borrowed from the array.... or you could also do something like a templated class with the length as a parameter.

I'll come back to that later though, first, let's fix your code. Moving on in the class:

public
/*
    Note #1:

    OK, this line is wrong:
    //GTIN14 opCast(GTIN14)()

    It should probably be:
*/
GTIN14 opCast(T : GTIN14)()
{
    string currentGTINString = this.toString()[0 .. $-1];
    while (currentGTINString.length < 13)
        currentGTINString = ('0' ~ currentGTINString);
    return new GTIN14(currentGTINString);
}

// we also need one for converting to the other sizes
// this could also be done with metaprogramming
GTIN13 opCast(T : GTIN13)()
{
    string currentGTINString = this.toString()[0 .. $-1];
    while (currentGTINString.length < 12)
        currentGTINString = ('0' ~ currentGTINString);
    return new GTIN13(currentGTINString);
}

The reason that line is wrong is that it actually defines a local template argument called GTIN14 which could be anything, and shadows the outer name!

So, my change here makes a new name T, which is then specialized to only match on the GTIN14 name from outside. See: http://dlang.org/spec/template.html#parameters_specialization

But then, it only applies to GTIN14, so I also added a second function that goes GTIN13 too.

If you were using a templated class, the specialization could also extract the length and you'd have one function that does them all. Or, if length as a compile-time constant (an enum) in each child class, you could pull it from there too.

Regardless, fixing your current code can be done with just the specialization syntax (T : class_name) and adding functions for the other subclasses too.

public
ulong toNumber()
{
    ulong result;
    // a note here: I would probably just have
    // int exponent = 1;
    // then exponent *= 10; in each loop instead of pow each time.
    for (size_t i = this.length-1; i > 0; i--)
    {
        result += (this._digits[i] * (10^^(this._digits.length-i)));
    }
    return result;
}

Well, not much to say there, your code works, I would just write it a bit differently. The accumulator variable pattern would optimize it a wee bit. Of course, not really important here, just something to keep in mind.

/*
    Note #2: this is where things get broken
*/
public override @trusted
bool opEquals(Object other)
{
    /*
    these are actually two different kinds of casts
    GTIN14 b = cast(GTIN14) other; // does a generic dynamic cast!
    GTIN14 a = cast(GTIN14) this; // actually calls the opCast
    */

    GTIN obj = cast(GTIN) other;
    if(obj is null) // which might return null because Object is not necessarily an instance of your class
        return false; // definitely not a match

    GTIN14 b = cast(GTIN14) obj;
    GTIN14 a = cast(GTIN14) this;

    for (int i; i < a.digits.length; i++)
    {
        if (a.digits[i] != b.digits[i]) return false;
    }
    return true;
}

public override @trusted
int opCmp(Object other)
{
    // GTIN14 that = cast(GTIN14) other; // a generic dynamic cast!
    GTIN obj = cast(GTIN) other;
    if(obj is null) // which might return null because Object is not necessarily an instance of your class
        return -1; // so return something indicating not a match
    GTIN14 that = cast(GTIN14) obj; // now you can use your custom cast
    const ulong thisNumber = this.toNumber();
    const ulong thatNumber = that.toNumber();
    if (thisNumber == thatNumber) return 0;
    return ((thisNumber / 10u) > (thatNumber / 10u) ? 1 : -1);
}

The other opCmps in the child classes make the same mistake and can be fixed the same way.

But this is the main thing causing your trouble - the same cast syntax actually does two different things!

See, the static type of other is the generic base class Object, so it isn't aware of your custom cast function yet. (opCast, being a template, cannot be virtual and thus is not override like other functions that modify the behavior of generic function)

Instead, it does a generic dynamic_cast (that's the C++ name, D just calls them all cast, but it is the same concept so you can read more about it by searching for the C++ term if you like). It tries to convert the base class/interface back to the subclass using a runtime type tag. If it is indeed an instance of that class (or one of its own subclasses), the cast succeeds and you get the reference. Otherwise, it returns null. This is the cause of your segfault.

On the other hand, the cast(xxx) this already knows it is an instance of GTIN (or one of its subclasses), so it is able to use your custom opCast conversion. So, your a variable called the right function and got populated correctly, but your b variable would be null... unless you happened to actually be comparing two GTIN14 instances. Then the dynamic cast would succeed, but not for the other classes.

So, the fix is to first cast that generic Object other back to your GTIN base class, check for null (this would happen if a user wrote like GTIN14 a = new GTIN14("xxx"); Object b = new Object(); assert(a == b); /* uh oh, b is an Object, so it should just return null */.

BTW, when comparing against null, you should usually use a is null instead of a == b because if a itself is null, it will crash when trying to access the virtual opEquals function!

Anyway, after you cast it back to GTIN, then you can cast again and invoke your conversion function.

Alternatively, you might also use a different named function like perhaps toGTIN14 in the base class that generically does the conversion, and you just call that from each instance of the base class and convert them that way instead of using the cast keyword. That'd actually be the way I'd write it - it is my preference, but both ways work.

Both opEquals and opCmp, from any classes where they are implemented, need to follow this same pattern. In opEquals you can see I return false when it is null, since they obviously aren't equal if they can't even be converted to a common type!

But in opCmp, you don't want to return 0 since that means equal, but what to actually return is a mystery to me.... I just did -1 so all the other objects in that array would be sorted earlier but maybe you have a better idea. I don't know what's best.

Anyway yeah, doing those changes should fix your code.

Lastly, as a bonus, here's an implementation of the generic templated class:

alias GTIN14 = GlobalTradeItemNumberImpl!14;
alias GTIN13 = GlobalTradeItemNumberImpl!13;
public
class GlobalTradeItemNumberImpl(int size) : GlobalTradeItemNumber
{
    public override @property
    size_t length()
    {
        return size;
    }

    this(string digits)
    {
        super(digits);
    }
}

If you've ever looked at some of Phobos' innards, you will see patterns similar to this in std.base64 and std.digest.

With that, all the functionality is actually in the base class now. You can rewrite the opCast as thus in the base class:

T opCast(T : GlobalTradeItemNumberImpl!N, int N)()
{
    string currentGTINString = this.toString()[0 .. $-1];
    while (currentGTINString.length < (N-1))
        currentGTINString = ('0' ~ currentGTINString);
    return new T(currentGTINString);
}

The specialization there uses the "pattern matching" described in form #7 of this: http://dlang.org/spec/expression.html#IsExpression to catch any random N, and extract what it is for use inside the function.

There's other optimizations we could do too if you're interested, like the use of the ~ operator, or it could even be changed from classes to struct using alias this to combine common functionality.. but I'll let you play with that if you want to :)

Oh wow! I don't know how I've been programming this long and never learned about different *kinds* of casts! And for the other stuff, I agree with everything you said. I tried to make a templated version, but I had some kind of problem with that, but it looks like you figured it out! Spectacular points and explanation. Thank you so much for your help, Adam! — Jonathan Wilbur, Jul 27 '17 at 14:31
Yeah, the casts are kinda confusing. I would suggest reading a bit on the C++ versions: https://stackoverflow.com/questions/28002/regular-cast-vs-static-cast-vs-dynamic-cast In D, Walter wanted to simplify the C++ down to just one type... but what really happened is now we have all the same types of casts, just all sharing the same keyword. `const_cast` is `cast()` or `cast(const)` etc. `dynamic_cast` is `cast(some_object)` from another object. `reinterpret_cast` is `*cast(T*)cast(void*)&x` (yes, messy lol). The rest are `static_cast`, unless `opCast` or ctors are involved. And I'm out of room — Adam D. Ruppe, Jul 27 '17 at 14:48

D opEquals() and opCast() give me a Segmentation Fault

1 Answers1