I'm having trouble creating a 2d array and filling it with values and then reading the array and getting totally different values. The strange thing is I have two maintain two arrays, and one of them is storing values correctly, and the other is not. I'm pretty sure I'm not overwriting elements either. I assume I'm making some stupid error that is obvious to someone who isn't horrible with C.
Please note I'm implementing the viterbi algorithm, but the general algo I understand and have a working python implementation of, it's just the arrays in c are giving me grief.
What I'm doing:
1) Malloc two arrays, they are used as 2D arrays, but I allocate a contiguous block of memory. I do not initialize the arrays explicitly as I should fill every entry in them as go through the forward step of the viterbi algo.
double *viterbi_table = malloc(sizeof(double) * n_samples * n_states);
int *best_path_table = malloc(sizeof(int) * n_samples * n_states);
2) For the forward portion of the viterbi algo, I walk through the observed data, and calculate the most likely possible states and probabilities for each state.
for (t = 1; t < n_samples; t++) // for each piece of observed data
{
for (i = 0; i < n_states; i++)
{
max_state_index = 0;
max_p = -DBL_MAX;
// calculate the max state and probability by looping through all the states
// yada yada...
// IMPORTANT PART: We set the array values here
viterbi_table[t * n_samples + i] = max_p;
best_path_table[t * n_samples + i] = max_state_index;
printf("\tbest_path_table[%d][%d] or [%d] = %d => %d\n",
i, t, t * n_samples + i, best_path_table[t * n_samples + i], max_state_index);
}
// IMPORTANT PART: print out rows of best path table to see if what we thought we inserted is in there
if (debug)
{
printf("best_path, [ ", t);
for (i = 0; i < n_states; i++)
{
printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
}
printf("]\n");
}
}
3) I run the code, and instead of having the array elements I set match with what I thought I set them to, I get big negative or positive numbers that look like uninitialized elements. What gives? I assigned a value to those blocks. Here's a selected portion of output that shows the problem.
t=36 => sample=X
best_path_table[0][36] or [1404] = 0 => 0
best_path_table[1][36] or [1405] = 0 => 0
best_path_table[2][36] or [1406] = 0 => 0
best_path_table[3][36] or [1407] = 0 => 0
...
best_path, [ [1404], 1399607453 [1405], -1070347604 [1406], 1399607453 [1407], 0 ... ]
By way of the contrast, the below one is correct.
t=37 => sample=X
best_path_table[0][37] or [1443] = 3 => 3
best_path_table[1][37] or [1444] = 3 => 3
best_path_table[2][37] or [1445] = 3 => 3
...
best_path, [ [1443], 3 [1444], 3 [1445], ... ]
When I run the code for a short piece of data, say < 12 observations, I don't have problems like this. When I run it for longer data, I have most of the best path table not filled correctly- it seems like the pattern is like so:
observation#
1) correct
2-3) garbage
4) correct
4-5) garbage
and so on
Code
See this gist. It has no dependencies to 3rd party libraries.
EDIT:
The first row of the viterbi table is initialized in a step prior to the forward part of the algorithm.
for (i = 0; i < n_states; i++)
{
state_i = states[i];
sample_t = samples[0];
viterbi_table[i*n_samples]
= prior(state_i, 0, true) + emission(sample_t, state_i, true);
}
EDIT2:
In a prior version of the code, I was doing the more standard 2D array initialization (in non-contiguous blocks) and corresponding array accesses. This gave me bus error
consistently for larger pieces of input data, which totally makes sense.
double **viterbi_table = malloc(sizeof * viterbi_table * n_states);
int **best_path_table = malloc(sizeof * best_path_table * n_states);
...
viterbi_table[j][t - 1] = ...
EDIT3, Comments on solution:
It turns out this was a stupid subscript mistake. The viterbi and best path arrays are n_samples * n_states in size, that is 17 * 39 = 663. That rules out any an index of 1404 as in my example.
The specific problem is that my array indexing was a mess because I mistakenly used n_samples instead of n_states. For a given observation index t (30), and a given state index i (14), the calculation looks as follows:
// Original, wrong
t * n_samples + i = 30 * 39 + 14 = 1184
// New, correct
t * n_states + i = 30 * 17 + 14 = 524
The variable t
already encodes the number of samples we are at, so we just need to multiply that by the number of states.
EDIT4, Fixed Code: Fixed code can be found here. I've also adjusted the emission and transition probabilities for my use cases.