automatically compare two series -Dissimilarity test

Question

I have two series, series1 and series2. My aim is to find how much Series2 is different from Series1,on a bin to bin basis, (each bin represents a particular feature,) automatically/quantitatively. alt text This image can be seen in its original size by clicking here.

Series1 is the expected result. Series2 is the test/incoming series.

I am providing a histogram plot, where Series2 is represented in dark brown colour. You can also note in the x-axis between 221 and 353 there is a significant variation. ie Series2 is less than Series1. I am coding using C++.

I think, crosscorrelation will help, but produces a value based on similarity rather than dissimilarity. I see people talk about Kolmogorov-Smirnov Test. Is this the test which i should be performing?

UPDATE 1: I am trying to perform a template matching. I have divided my template image in to 8x8 blocks as well as my incoming test image. I am trying to compare one block in template image with the same block(based on the spatial pixel positions) in the test image. I calculate the intensity sum within each block.I obtain series1 for the Template image and have Series2 for the test image.

The solution to some questions is the same in both C and C++. This is not one of those questions, so which is it? C or C++? — Benjamin Lindley, Oct 21 '10 at 10:49
The question is imprecise. The simplest measure of dissimilarity is the average difference between samples. Without any context one cannot say whether it's good in your case. — Rafał Dowgird, Oct 21 '10 at 11:00
Agree with Rafal; there are a million different ways of measuring similarity, each with pros and cons. — Oliver Charlesworth, Oct 21 '10 at 11:07
This is not really a question about programming(yet). I see you have already asked it over on the statistical analysis page. I suggest you wait for your answer there, then if you have a question about how to implement the algorithm, come back here. In the meantime, I would close this question, as it doesn't(yet) relate to programming. — Benjamin Lindley, Oct 21 '10 at 11:32
Provide some data structures for capturing Series1 and Series2, and the concept of "bin." — Arun, Oct 21 '10 at 17:34
The way that you measure similarity is completely dependant on your goal; not on the data itself. Don't forget that. For example 2==8 if my goal is to have a number that is even. Simple example but your goal shapes you comparison. — Robert Massaioli, Oct 21 '10 at 23:41

score 1 · Accepted Answer · answered Jan 13 '11 at 23:43

Here is a C implementation of an algorithm to compute the divergence of actual data from predicted data. The algorithm comes from a book entitled Practical BASIC Programs from Osborne/McGraw-Hill copyright 1980.

Here is the .h file:

/*
 * divergence.h
 *
 *  Created on: Jan 13, 2011
 *      Author: Erik Oosterwal
 */

#ifndef DIVERGENCE_H_
#define DIVERGENCE_H_

typedef struct
{
    int DataSize;
    float TotalError;
    float AbsError;       //< Total Absolute Error
    float SqError;        //< Total Squared Error
    float MeanError;
    float MeanAbsError;
    float MeanSqError;
    float RMSError;     //< Root Mean Square Error
}DIVERGENCE_ERROR_TYPE;

void Divergence__Error(int size, float expected[], float actual[], DIVERGENCE_ERROR_TYPE *error);


// Prefer to use abs() from "stdlib.h"
#ifndef ABS
    #define ABS(x) ((x)>0) ? (x) : (0-(x))     //< Not safe!!! - Do not increment parameter inside ABS()!
#endif


#endif /* DIVERGENCE_H_ */

...the .c file:

/*
 * divergence.c
 *
 *  Created on: Jan 13, 2011
 *      Author: Erik Oosterwal
 */

#include "math.h"
#include "divergence.h"

/**
 *      @brief  Compute divergence from expected values.
 *
 *      @details    Compute the raw errors, absolute errors, root mean square errors,
 *                  etc. for a series of values.
 *
 *      @param  size - integer value defines the number of values to compare.
 */
void Divergence__Error(int size, float expected[], float actual[], DIVERGENCE_ERROR_TYPE *error)
{
    double total_err = 0.0;
    double abs_err = 0.0;
    double abs_sqr_err = 0.0;
    double temp = 0.0;
    int index = 0;

    for(index=0; index<size; index++)
    {
        temp = (double)(actual[index])-(double)(expected[index]);
        total_err+=temp;
        abs_err+=ABS(temp);
        abs_sqr_err+=pow(ABS(temp),2);
    }

    temp = (double)size;
    error->DataSize = (int)size;
    error->TotalError = (float)total_err;
    error->AbsError = (float)abs_err;
    error->SqError = (float)abs_sqr_err;
    error->MeanError = (float)(total_err/temp);
    error->MeanAbsError = (float)(abs_err/temp);
    error->MeanSqError = (float)(abs_sqr_err/temp);
    error->RMSError = (float)(sqrt(abs_sqr_err/temp));
}

...and a sample main() for testing the function:

/*
 * main.c
 *
 *  Created on: Jan 13, 2011
 *      Author: Erik Oosterwal
 */

#include <stdio.h>
#include "divergence.h"

float vote[]={40.3, 22.5, 16.3, 10.5, 7.2, 3.2};
float poll[]={42.7, 21.4, 18.2, 6.0, 7.4, 4.3};
float actual[] ={74, 70, 58, 60, 65, 73, 70};
float predict[]={49, 62, 75, 82, 37, 58, 92};

int main(int argc, char *argv[])
{
    DIVERGENCE_ERROR_TYPE stats;

    Divergence__Error(6, poll, vote, &stats);
    printf("%i\n%f\n%f\n%f\n%f\n%f\n%f\n%f\n\n\n",stats.DataSize,stats.TotalError,stats.AbsError,stats.SqError,stats.MeanError,stats.MeanAbsError,stats.MeanSqError,stats.RMSError);

    Divergence__Error(7, predict, actual, &stats);
    printf("%i\n%f\n%f\n%f\n%f\n%f\n%f\n%f\n\n\n",stats.DataSize,stats.TotalError,stats.AbsError,stats.SqError,stats.MeanError,stats.MeanAbsError,stats.MeanSqError,stats.RMSError);

    return(0);
}

I can't guarantee that this is the fastest method and the function could use some tweaking to make it more friendly to different data types, but it works and the results were verified against the samples provided in the book.

If you're interested, I found a copy of the aforementioned book online in .pdf form: ftp://ftp.worldofspectrum.org/pub/sinclair/books/PracticalBasicPrograms.pdf The Divergence program listing starts on book-page 147 (file-page 160.) There are also a few other statistics programs that may fit your needs better; if you need any help converting them to C I'd be happy to help. — oosterwal, Jan 14 '11 at 13:46

automatically compare two series -Dissimilarity test

1 Answers1

Linked