
First of all, sorry for the specifics. I generally try to boil my SO questions down to generic "class A" examples with only the relevant parts, but I'm not sure what the source of the problem is here.

I have a matrix class template that looks like this (only showing what I think are the relevant parts):

template <std::size_t R, std::size_t C>
class Matrix
{
private:
    //const int rows, cols;
    std::array<std::array<float,C>,R> m;
public:
    inline std::array<float,C>& operator[](const int i)
    {
        return m[i];
    }

    const std::array<float,C> operator[](const int i) const
    {
        return m[i];
    }

    template<std::size_t N>
    Matrix<R,N> operator *(const Matrix<C,N> a) const
    {
        Matrix<R,N> result = Matrix<R,N>();
        // irrelevant calculation
        return result;
    }
    // ... other very similar stuff, I'm not sure that it's relevant
};

template <std::size_t S>
Matrix<S,S> identity()
{
    Matrix<S,S> matrix = Matrix<S,S>();

    for(std::size_t x = 0; x < S; x++)
    {
        for(std::size_t y = 0; y < S; y++)
        {
            if (x == y)
            {
                matrix[x][y] = 1.f;
            }
        }
    }

    return matrix;
}

I unit tested the whole class; both multiplication and the identity factory seem to work all right. However, I then use it in this method, which gets called many times (if you have ever written a renderer, it's probably obvious what I'm trying to do here):

Vec3i Renderer::world_to_screen_space(Vec3f v)
{
    Matrix<4,1> vm = v2m(v);
    Matrix<4,4> projection = identity<4>(); // If I change this to Matrix<4,4>(), the error doesn't happen
    projection[3][2] = -1.f;
    vm = projection * vm;
    Vec3f r = m2v(vm);
    return Vec3i(
            (r.x + 1.) * (width / 2.),
            (r.y + 1.) * (height / 2.),
            r.z
        );
}

And after some amount of time and some amount of random calls to this method, I get this:

Job 1, 'and ./bin/main' terminated by signal SIGBUS (Misaligned address error)

However, if I change the line identity<4>() to Matrix<4,4>() the error doesn't happen. I'm new to C++, so it must be something really stupid.

So, (1) what does this error mean and (2) how did I manage to shoot myself in the foot?

Update: and of course, this bug won't reproduce in the LLDB debugger.

Update 2: here's what I got after running the program through Valgrind:

==66525== Invalid read of size 4
==66525==    at 0x1000148D5: Renderer::draw_triangle(Vec3<float>, Vec3<float>, Vec3<float>, Vec2<int>, Vec2<int>, Vec2<int>, Model, float) (in ./bin/main)

And draw_triangle is exactly the method that calls world_to_screen_space and uses its result.

Update 3: I discovered the source of the problem, and it wasn't anything related to this code — and it was something pretty obvious, too. Really not sure what to do about this question now.

Max Yankov
  • "Misaligned address" usually means that you have a CPU that requires certain alignment for certain data types (e.g. a 32-bit integer must be at a 32-bit aligned address like 0x1000 or 0x1004), and your code is attempting to violate that requirement (by attempting to read a 32-bit integer from address 0x1001). You are likely using some casting or other forms of type-punning or bad pointer arithmetic in the code you haven't shown... Or possibly you have a corrupted pointer... Or one of several other possible explanations... – twalberg Feb 25 '15 at 20:35
  • If it still happens (without a debugger), allow core dumps, then inspect the dump with a debugger. – user58697 Feb 25 '15 at 20:38
  • @twalberg I don't use pointers at all in the code — only references or copying by value. – Max Yankov Feb 25 '15 at 20:38
  • The issue is likely in your `operator[]` implementation – Collin Dauphinee Feb 25 '15 at 20:52
  • Unless there is more than one thread involved here, I'd guess that your `std::array` isn't aligning its contents in a way that satisfies the processor. If there is more than one thread, I'm going to go with race condition. – Collin Dauphinee Feb 25 '15 at 20:59
  • @CollinDauphinee there's only one thread. Does it mean that `std::array` is broken? How can it be? – Max Yankov Feb 25 '15 at 21:03
  • What processor are you running this code on? – Siqi Lin Feb 25 '15 at 22:52
  • Is that the entire Valgrind message? I would have expected the data address as well as the code address. – Ben Voigt Feb 25 '15 at 22:55
  • Notice that this is an anti-pattern: `for(std::size_t y = 0; y < S; y++) if (x == y) matrix[x][y] = 1.f;` Instead, just say `matrix[x][x] = 1.0f;` – Ben Voigt Feb 25 '15 at 22:57
  • Perhaps the code for `Renderer::draw_triangle(...)` would help some, given the Valgrind output... – twalberg Feb 25 '15 at 23:41
  • @BenVoigt thanks. This code is the opposite of optimized: I want to get it correct and covered with tests first, then go over it and replace these brain farts with something faster afterwards – Max Yankov Feb 26 '15 at 10:26
  • @golergka: Shorter simpler code is *also* easier to get correct. – Ben Voigt Feb 26 '15 at 15:22
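
To make the alignment requirement from the comments above concrete, here is a minimal sketch. The helper names `is_aligned` and `load_u32` are hypothetical and are not part of the code in the question. It shows how to test whether an address is suitably aligned for a type, and how to read a 32-bit value from an arbitrary byte address with `std::memcpy` rather than by casting the pointer, which is one common source of misaligned accesses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
template <typename T>
bool is_aligned(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: reads a 32-bit value starting at any byte address.
// Copying the bytes avoids dereferencing a possibly misaligned pointer,
// which is what `*reinterpret_cast<const std::uint32_t*>(p)` would do.
std::uint32_t load_u32(const unsigned char* p)
{
    assert(p != nullptr);
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

On a platform where `std::uint32_t` requires 4-byte alignment, `is_aligned<std::uint32_t>` would be true for an address like 0x1000 and false for 0x1001, matching the example in the first comment.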

1 Answer

Without a processor that checks for misalignment (as @twalberg says), I can't run and validate the code. But I can say this: it is a common bug in C++ and other libraries to confuse one type of exception with another.

My guess -- sorry I can't do more -- is that you are creating allocations that are getting lost, using up your available memory, then overflowing the memory space. The very uncommon exception thrown when you exceed the available memory is probably unexpected and getting returned as a misalignment error. Try checking the memory usage as you run, to determine whether this might be the case.

EDIT:

My guess was wrong, and the Valgrind output shows that the misaligned address error was correct. Running that was a good idea. The clear indication is that there is a bug at a lower level than in your code, so my original idea is almost certainly correct: there is a bug that is not in your code, but is masked.

Note that it seems there is a difference between the identity() constructor and the Matrix<,> constructor in that the former is initialized along the diagonal (slowly: better would be to eliminate the inner loop) and the latter is not. That might affect the behavior of v2m and m2v.
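
As a sketch of what eliminating the inner loop could look like: the `Matrix` below is a minimal stand-in for the class in the question, not the asker's full implementation. It assumes the storage is zero-initialized, and the `assert` in `operator[]` is an added debug-only bounds check, not something the original code has.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Minimal stand-in for the Matrix template from the question.
template <std::size_t R, std::size_t C>
class Matrix
{
    std::array<std::array<float, C>, R> m{};  // {} zero-initializes every element
public:
    std::array<float, C>& operator[](std::size_t i)
    {
        assert(i < R);  // assumption: debug-only row bounds check
        return m[i];
    }

    const std::array<float, C>& operator[](std::size_t i) const
    {
        assert(i < R);
        return m[i];
    }
};

// Identity factory with a single loop over the diagonal.
template <std::size_t S>
Matrix<S, S> identity()
{
    Matrix<S, S> matrix;
    for (std::size_t i = 0; i < S; ++i)
    {
        matrix[i][i] = 1.f;
    }
    return matrix;
}
```

Note that only the rows are checked here; an out-of-range column index on the returned `std::array` would still go undetected unless `.at()` is used instead of `[]`.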

shipr
  • > you are creating allocations that are getting lost, using up your available memory, then overflowing the memory space Do you see something like that in the code that I've provided? – Max Yankov Feb 25 '15 at 22:05
  • I don't; but not all code was shown. I am editing my response now based on the input from valgrind. – shipr Feb 25 '15 at 22:45
  • I discovered the problem, and it turns out that it had nothing to do with the code I posted, and was really an obvious one. According to [this](http://meta.stackoverflow.com/questions/269605/my-question-turned-out-to-be-a-rather-esoteric-typo-what-should-i-do-what-sho), I should delete this question, but there's a lot of great discussion here. Not really sure what to do with it; can't accept this answer, because it technically isn't correct (it wasn't the issue) although you helped me greatly. – Max Yankov Mar 01 '15 at 07:39
  • @golergka You may create an answer and accept it. That's a desirable, standard behavior – edmz Mar 01 '15 at 09:07
  • @black but my personal answer (1) won't be useful to anyone else and (2) would shadow this excellent answer, which is really useful for anyone. – Max Yankov Mar 01 '15 at 09:10
  • @golergka (i) it *will* be useful: somebody might be having the same issue and end up on this question; (ii) This answer does not provide any concrete help for the solution, while yours certainly would. – edmz Mar 01 '15 at 09:16
  • @black I don't think that anyone would be having the same issue as I _actually_ had, because they would have to have the exact same code. However, a lot of people have the issue that I _thought_ I had, for which this answer is much better than my actual answer. – Max Yankov Mar 01 '15 at 09:17
  • @golergka I can't obligate you, but please note that's how SE should work in this case. Please [read](https://stackoverflow.com/help/self-answer). You could still point out your notes in your answer. – edmz Mar 01 '15 at 09:22
  • @black Thanks for this discussion, I think I'll do a write-up about this issue later today. – Max Yankov Mar 01 '15 at 09:24