4

I'm having trouble creating a 2d array and filling it with values and then reading the array and getting totally different values. The strange thing is I have two maintain two arrays, and one of them is storing values correctly, and the other is not. I'm pretty sure I'm not overwriting elements either. I assume I'm making some stupid error that is obvious to someone who isn't horrible with C.

Please note I'm implementing the viterbi algorithm, but the general algo I understand and have a working python implementation of, it's just the arrays in c are giving me grief.

What I'm doing:

1) Malloc two arrays, they are used as 2D arrays, but I allocate a contiguous block of memory. I do not initialize the arrays explicitly as I should fill every entry in them as go through the forward step of the viterbi algo.

double *viterbi_table = malloc(sizeof(double) * n_samples * n_states);
int *best_path_table = malloc(sizeof(int) * n_samples * n_states);

2) For the forward portion of the viterbi algo, I walk through the observed data, and calculate the most likely possible states and probabilities for each state.

for (t = 1; t < n_samples; t++) // for each piece of observed data
{
    for (i = 0; i < n_states; i++)
    {
        max_state_index = 0;
        max_p = -DBL_MAX;
        // calculate the max state and probability by looping through all the states
        // yada yada...

        // IMPORTANT PART: We set the array values here
        viterbi_table[t * n_samples + i] = max_p;
        best_path_table[t * n_samples + i] = max_state_index;
        printf("\tbest_path_table[%d][%d] or [%d] = %d => %d\n", 
            i, t, t * n_samples + i, best_path_table[t * n_samples + i], max_state_index);
    }

    // IMPORTANT PART: print out rows of best path table to see if what we thought we inserted is in there
    if (debug)
    {
        printf("best_path, [ ", t);
        for (i = 0; i < n_states; i++)
        {
            printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
        }
        printf("]\n");
    }
}

3) I run the code, and instead of having the array elements I set match with what I thought I set them to, I get big negative or positive numbers that look like uninitialized elements. What gives? I assigned a value to those blocks. Here's a selected portion of output that shows the problem.

t=36 => sample=X
    best_path_table[0][36] or [1404] = 0 => 0
    best_path_table[1][36] or [1405] = 0 => 0
    best_path_table[2][36] or [1406] = 0 => 0
    best_path_table[3][36] or [1407] = 0 => 0
    ...
best_path, [ [1404], 1399607453 [1405], -1070347604 [1406], 1399607453 [1407], 0 ... ]

By way of the contrast, the below one is correct.

t=37 => sample=X
    best_path_table[0][37] or [1443] = 3 => 3
    best_path_table[1][37] or [1444] = 3 => 3
    best_path_table[2][37] or [1445] = 3 => 3
    ...
best_path, [ [1443], 3 [1444], 3 [1445], ... ]

When I run the code for a short piece of data, say < 12 observations, I don't have problems like this. When I run it for longer data, I have most of the best path table not filled correctly- it seems like the pattern is like so:

observation#
1) correct
2-3) garbage
4) correct
4-5) garbage
and so on

Code

See this gist. It has no dependencies to 3rd party libraries.

EDIT:

The first row of the viterbi table is initialized in a step prior to the forward part of the algorithm.

for (i = 0; i < n_states; i++)
{
    state_i = states[i];
    sample_t = samples[0];
    viterbi_table[i*n_samples]
        = prior(state_i, 0, true) + emission(sample_t, state_i, true);
}

EDIT2:

In a prior version of the code, I was doing the more standard 2D array initialization (in non-contiguous blocks) and corresponding array accesses. This gave me bus error consistently for larger pieces of input data, which totally makes sense.

double **viterbi_table = malloc(sizeof * viterbi_table * n_states);
int **best_path_table = malloc(sizeof * best_path_table * n_states);
...
viterbi_table[j][t - 1] = ...

EDIT3, Comments on solution:

It turns out this was a stupid subscript mistake. The viterbi and best path arrays are n_samples * n_states in size, that is 17 * 39 = 663. That rules out any an index of 1404 as in my example.

The specific problem is that my array indexing was a mess because I mistakenly used n_samples instead of n_states. For a given observation index t (30), and a given state index i (14), the calculation looks as follows:

// Original, wrong
t * n_samples + i = 30 * 39 + 14 = 1184

// New, correct
t * n_states + i = 30 * 17 + 14 = 524

The variable t already encodes the number of samples we are at, so we just need to multiply that by the number of states.

EDIT4, Fixed Code: Fixed code can be found here. I've also adjusted the emission and transition probabilities for my use cases.

nflacco
  • 4,972
  • 8
  • 45
  • 78
  • 1
    If anything, **this** question would deserve the upvotes of [this one](http://stackoverflow.com/questions/17992553/concept-behind-these-4-lines-of-tricky-c-code#) regarding the effort put into the research for a solution. +1. –  Aug 02 '13 at 21:20
  • `(t = 1; t < n_samples; t++)` this is probably wrong, if you have n_samples, the loop should start from 0. – Karoly Horvath Aug 02 '13 at 21:20
  • @KarolyHorvath `t = 1` is correct (from what I understand about the algorithm). – Drew McGowen Aug 02 '13 at 21:20
  • `max_state_index` is always 0. – Karoly Horvath Aug 02 '13 at 21:22
  • Even if t = 1 is right, you still have garbage values in the array. They need to be cleared. – Jiminion Aug 02 '13 at 21:28
  • The first row is initialized in step previous to the forward portion of the algorithm. – nflacco Aug 02 '13 at 21:51
  • There a quite a lot of reasons why the 'Edit 2' code caused you problems; there simply wasn't enough initialization going on unless you omitted the extra allocations and loops that are necessary. However, that's tangential to your actual current problem. You should compile with `gcc -Wall -Wextra` at least. The problems it diagnoses are mostly not critical, but it's a good habit to compile with them anyway and have no warnings in your code. – Jonathan Leffler Aug 03 '13 at 05:56
  • The edit 2 code had the initialization code, I omitted it to save space. I do need to use -Wall and -Wextra though. – nflacco Aug 03 '13 at 17:51

2 Answers2

1

I added a macro CHECK_SUBSCRIPT to check that subscripts are in range, using assert. It fired — there's a subscript out of control.

Source

Comments removed.

#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHECK_SUBSCRIPT(x, n_samples)   assert((x) >= 0 && (x) < (n_samples) * n_states)

typedef enum { false, true } bool;
typedef double (*prob_function_def)(char, char, bool);

int n_states = 17;
int n_observations = 17;
char states[17] =
    { 'X', '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
char observations[17] =
    { 'X', '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

void viterbi_log (
                char *samples,
                int n_samples,
                char *best_path,
                prob_function_def prior,
                prob_function_def transition,
                prob_function_def emission,
                bool debug
             )
{
    printf("\nviterbi...\n");

    int i, j, t, max_state_index;
    char state_i, state_j, sample_t;
    double trans_p, max_p;

    double *viterbi_table = malloc(sizeof(double) * n_samples * n_states);
    int *best_path_table = malloc(sizeof(int) * n_samples * n_states);

    for (int n33 = 0; n33 < n_samples * n_states; n33++)
    {
        CHECK_SUBSCRIPT(n33, n_samples);
        viterbi_table[n33] = 3.14159;
        best_path_table[n33] = 314159;
    }

    if (debug) printf("\nInitialization:\n");
    for (i = 0; i < n_states; i++)
    {
        state_i = states[i];
        sample_t = samples[0];
        CHECK_SUBSCRIPT(i*n_samples, n_samples);
        viterbi_table[i*n_samples]
            = prior(state_i, 0, true) + emission(sample_t, state_i, true);
        if (debug)
        {
            printf("\t");
            printf("log(prior[%c]) + log(emission[%c][%c]) = %e\n",
                state_i, sample_t, state_i, viterbi_table[i*n_samples]);
        }
    }

    if (debug) printf("\nForward:\n");
    for (t = 1; t < n_samples; t++)
    {
        sample_t = samples[t];
        if (debug) printf("t=%d => sample=%c\n", t, sample_t);
        for (i = 0; i < n_states; i++)
        {
            state_i = states[i];
            max_state_index = 0;
            max_p = -DBL_MAX;

            for (j = 0; j < n_states; j++)
            {
                state_j = states[j];
                CHECK_SUBSCRIPT(((t-1)*n_samples)+j, n_samples);
                trans_p = viterbi_table[((t - 1) * n_samples) + j]
                    + transition(state_i, state_j, true)
                    + emission(sample_t, state_j, true);
                if (trans_p > max_p)
                {
                    max_state_index = j;
                    max_p = trans_p;
                }
            }

            CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
            viterbi_table[t * n_samples + i] = max_p;
            best_path_table[t * n_samples + i] = max_state_index;
            printf("\tbest_path_table[%d][%d] or [%d] = %d => %d\n",
                    i, t, t * n_samples + i, best_path_table[t * n_samples + i], max_state_index);
        }

        if (debug)
        {
            printf("best_path, [ ");
            for (i = 0; i < n_states; i++)
            {
                CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
                printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
            }
            printf("]\n");
        }
    }

    if (debug)
    {
        printf("\nbest path table:\n");
        for (t = n_samples - 1; t > 0; t--)
        {
            printf("t=%d, [ ", t);
            for (i = 0; i < n_states; i++)
            {
                CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
                printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
            }
            printf("]\n");
        }
    }

    free(viterbi_table);
    free(best_path_table);
}

double prior_prob (char state_i, char state_j, bool log_prob)
{
    if (!log_prob)
        return 0.25;
    else
        return -1.3862943611198906;
}

double transition_prob (char state_i, char state_j, bool log_prob)
{
    if (!log_prob)
    {
        if (state_i == 0 && state_j == 0)
            return 0.9;
        else if (state_i == 0 || state_j == 0)
            return 0.1;
        else if (state_i == state_j)
            return 0.9;
        else
            return 0.1;
    }
    else
    {
        if (state_i == 0 && state_j == 0)
            return -0.10536051565782628;
        else if (state_i == 0 || state_j == 0)
            return -2.3025850929940455;
        else if (state_i == state_j)
            return -0.10536051565782628;
        else
            return -2.3025850929940455;
    }
}

double emission_prob (char observation, char state, bool log_prob)
{
    if (!log_prob)
    {
        if (state == observation)
            return 0.8;
        else
            return 0.2;
    }
    else
    {
        if (state == observation)
            return -0.2231435513142097;
        else
            return -1.6094379124341003;
    }
}

void dtmf_example()
{
    bool debug = true;
    char *samples = "XXXXXXXXXXXXXX6X66XXXXXXXXXXXXXXXXXXXXX";
    int n_samples = strlen(samples);
    char *best_path = malloc(sizeof(int) * n_samples);

    viterbi_log(samples, n_samples, best_path,
        &prior_prob, &transition_prob, &emission_prob, debug);

    printf("\nbest path = { ");
    for (int t = 0; t < n_samples; t++)
        printf("%d ", best_path[t]);
    printf("}\n");
}

int main(void)
{
    dtmf_example();
    return 0;
}

Assertion fired:

best_path_table[13][16] or [637] = 0 => 0
best_path_table[14][16] or [638] = 0 => 0
best_path_table[15][16] or [639] = 0 => 0
best_path_table[16][16] or [640] = 0 => 0
best_path, [ [624], 0 [625], 0 [626], 0 [627], 0 [628], 0 [629], 0 [630], 0 [631], 7 [632], 0 [633], 0 [634], 0 [635], 0 [636], 0 [637], 0 [638], 0 [639], 0 [640], 0 ]
t=17 => sample=6
Assertion failed: ((t*n_samples+i) >= 0 && (t*n_samples+i) < (n_samples) * n_states), function viterbi_log, file viterbi3.c, line 89.
Abort trap: 6

That's the CHECK_SUBSCRIPT at:

        CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
        viterbi_table[t * n_samples + i] = max_p;
        best_path_table[t * n_samples + i] = max_state_index;
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • It seems it is accessing unallocated memory, hence the undefined behavior. If you print the value multiple times of the same element for that out of bound access, for every access it gives different values. – Uchia Itachi Aug 03 '13 at 06:40
  • If you access values that are out of bounds, you get whatever you get. That's the beauty of invoking undefined behaviour — the compiler is never wrong if you do it. It looks like `t == 17` but it should never be more than 16; I've not verified that. – Jonathan Leffler Aug 03 '13 at 06:46
  • Definitely a subscript problem, I didn't do the basic common sense check of `n_states * n_samples = 39 * 17 = 663`, when I'm getting indices like 1404. – nflacco Aug 03 '13 at 17:43
0

It would appear that because t starts with 1, the first row is never initialized. I'm guessing that each row depends on the previous, so the junk values propagate through. The fact that one particular case worked is purely by chance (the uninitialized data happened to be valid).

Drew McGowen
  • 11,471
  • 1
  • 31
  • 57
  • Even if the first row of the best path table is not initialized 1) the code works fine with shorter pieces of data 2) the best path array values depend only on the viterbi table, which is initialized. – nflacco Aug 02 '13 at 21:52