
I have been trying to fix this for about 5 days with no luck; every solution I tried failed.

I found some causes of SIGSEGV in "What is SIGSEGV run time error in C++?", but nothing there helped.

OK, here is the code. I have 2 instances, each containing some keywords (features) and their scores.

I want to compute their Euclidean distance. That means I have to store all the keywords of each instance, compute the differences between the keywords of the first instance and those of the second, and then add the remaining keywords of the second instance. What I want is to be able to delete elements from the second map while iterating over the first one. The following method is called many times: we have two message collections, and every message from the first one is compared with every message from the second.
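Written out, the distance described above is the following. The notation is introduced here only for illustration: $K_1$ and $K_2$ are the keyword sets of the two instances, and $s_1(k)$, $s_2(k)$ are their scores, with a keyword missing from an instance treated as score 0.

```latex
d(\mathrm{inst}_1, \mathrm{inst}_2) = \sqrt{\sum_{k \in K_1 \cap K_2} \bigl(s_1(k) - s_2(k)\bigr)^2 + \sum_{k \in K_1 \setminus K_2} s_1(k)^2 + \sum_{k \in K_2 \setminus K_1} s_2(k)^2}
```

The first sum is the "diffs" part, the second covers keywords only in the first instance, and the third covers "the remaining" keywords of the second instance.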

I have this code, but it suddenly stops, although I checked (with several cout statements in various places) that it runs fine for some seconds.

Note that this is for a university task, so I cannot use Boost or similar libraries. But I would like to know how to get around the problem I am facing.

float KNNClassifier::distance(const Instance& inst1, const Instance& inst2) {   
map<string,unsigned> feat1;
map<string,unsigned> feat2;
for (unsigned i=0; i<inst1.getNumberOfFeatures(); i++) {
  feat1[inst1.getFeature(i)]=i;
}
for (unsigned i=0; i<inst2.getNumberOfFeatures(); i++) {
  feat2[inst2.getFeature(i)]=i;
}
float dist=0;

map<string,unsigned>::iterator it;
for (it=feat1.begin(); it!=feat1.end(); it++) {
  if (feat2.find(it->first)!=feat2.end()) {//if and only if it exists in inst2
    dist+=pow( (double) inst1.getScore(it->second) - inst2.getScore(feat2[it->first]) , 2.0);
    feat2.erase(it->first);
  }
  else {
    dist+=pow( (double) inst1.getScore(it->second) , 2.0);
  }
}

for (it=feat2.begin(); it!=feat2.end(); it++) {//for the remaining words
  dist+=pow( (double) inst2.getScore(it->second) , 2.0);
}
feat1.clear(); feat2.clear(); // clear the maps for the next use
return sqrt(dist);    
}
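A self-contained sketch of this first approach, assuming (hypothetically) that each instance's scores are already available as a `std::map<std::string, double>` instead of being read through `Instance`. It erases from the second map while iterating the first, which is well-defined: erasing from `feat2` only invalidates iterators to the erased element of `feat2`, never the loop iterator into `feat1`.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical stand-in for an instance's data: keyword -> score.
typedef std::map<std::string, double> Scores;

// Walk feat1 and erase matched keys from a local copy of feat2.
double sparseDistance(const Scores& feat1, Scores feat2) {  // feat2 copied on purpose
    double dist = 0.0;
    for (Scores::const_iterator it = feat1.begin(); it != feat1.end(); ++it) {
        Scores::iterator match = feat2.find(it->first);
        if (match != feat2.end()) {
            double d = it->second - match->second;  // keyword in both
            dist += d * d;
            feat2.erase(match);  // erase by iterator: no second lookup
        } else {
            dist += it->second * it->second;  // keyword only in feat1
        }
    }
    // Whatever is left in feat2 has no counterpart in feat1.
    for (Scores::const_iterator it = feat2.begin(); it != feat2.end(); ++it) {
        dist += it->second * it->second;
    }
    return std::sqrt(dist);
}
```

Taking `feat2` by value keeps the caller's map intact, so the function can be called repeatedly for every pair of messages.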

I also tried this idea, so that I would not have to delete anything, but it suddenly stops too.

float KNNClassifier::distance(const Instance& inst1, const Instance& inst2) {
map<string,unsigned> feat1;
map<string,unsigned> feat2;
map<string,bool> exists;
for (unsigned i=0; i<inst1.getNumberOfFeatures(); i++) {
  feat1[inst1.getFeature(i)]=i;
}
for (unsigned i=0; i<inst2.getNumberOfFeatures(); i++) {
  feat2[inst2.getFeature(i)]=i;
  exists[inst2.getFeature(i)]=false;
  if (feat1.find(inst2.getFeature(i))!=feat1.end()) {
    exists[inst2.getFeature(i)]=true;
  }
}
float dist=0;
map<string,unsigned>::iterator it;
for (it=feat1.begin(); it!=feat1.end(); it++) {
  if (feat2.find(it->first)!=feat2.end()) {
    dist+=pow( (double) inst1.getScore(it->second) - inst2.getScore(feat2[it->first]) , 2.0);
  }
  else {
    dist+=pow( (double) inst1.getScore(it->second) , 2.0);
  }
}

for (it=feat2.begin(); it!=feat2.end(); it++) {
  if (!exists[it->first]) { // true means the diff was already added in the first loop
    dist+=pow( (double) inst2.getScore(it->second) , 2.0);
  }
}

feat1.clear(); feat2.clear(); exists.clear();
return sqrt(dist);
}
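The second idea also works without the extra `exists` map, because `feat1` is never modified and can be queried directly. A minimal sketch, again assuming hypothetical `std::map<std::string, double>` score maps in place of the `Instance` class:

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical stand-in for an instance's data: keyword -> score.
typedef std::map<std::string, double> Scores;

// Neither map is modified; shared keys are recognized by lookup.
double sparseDistanceNoErase(const Scores& feat1, const Scores& feat2) {
    double dist = 0.0;
    for (Scores::const_iterator it = feat1.begin(); it != feat1.end(); ++it) {
        Scores::const_iterator match = feat2.find(it->first);
        // A keyword missing from feat2 counts as score 0.
        double other = (match != feat2.end()) ? match->second : 0.0;
        double d = it->second - other;
        dist += d * d;
    }
    for (Scores::const_iterator it = feat2.begin(); it != feat2.end(); ++it) {
        // Keys also present in feat1 were already handled above.
        if (feat1.count(it->first) == 0) {
            dist += it->second * it->second;
        }
    }
    return std::sqrt(dist);
}
```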
  • Did you run the debugger to find the line it crashes at? – RonaldBarzell Jan 02 '13 at 21:52
  • Basilis, the code you show seems to be well-formed. Running this in a debugger will be crucial if you don't want to lose more days. Instead of running `basilis_prog`, run `gdb basilis_prog`. `r` will run it until it crashes. `bt` will show you exactly where the crash happened. – Drew Dormann Jan 02 '13 at 22:15
  • I use NetBeans. How exactly do I run the program with gdb? Any tips for the debugger? –  Jan 03 '13 at 13:06
  • You may use the debugger in NetBeans, then. Right-click the project node and choose "debug". You can now see where it crashes and even examine what's happening as you slowly approach the crash. If you have specific debugging questions, post them here. We've started to leave the original intent of this question. You'll have to trust me that the general answer to many questions like this is "learn how to use the debugger." – Drew Dormann Jan 03 '13 at 15:31
  • OK, then I have specific questions. I ran the debugger expecting to see something familiar, but all I get is this:
      2 [main] spam_filter 3520 exception::handle: Exception: STATUS_ACCESS_VIOLATION
      686 [main] spam_filter 3520 open_stackdumpfile: Dumping stack trace to spam_filter.exe.stackdump
    –  Jan 03 '13 at 22:48
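Since the code shown in the question appears well-formed, the invalid memory access may well happen elsewhere, for example an out-of-range index reaching `getScore` or `getFeature`. One way to make such an error surface at its source is to use the bounds-checked `std::vector::at()` in those accessors. The class below is a hypothetical sketch, assuming `Instance` stores its features and scores in two vectors; the real class is not shown in the question.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal Instance; the real class is not shown.
class Instance {
public:
    void add(const std::string& feature, double score) {
        features.push_back(feature);
        scores.push_back(score);
    }
    unsigned getNumberOfFeatures() const {
        return static_cast<unsigned>(features.size());
    }
    // at() checks the index and throws std::out_of_range. With operator[],
    // an out-of-range index is undefined behavior and may only show up
    // later as an access violation somewhere else.
    const std::string& getFeature(unsigned i) const { return features.at(i); }
    double getScore(unsigned i) const { return scores.at(i); }
private:
    std::vector<std::string> features;
    std::vector<double> scores;
};
```

An uncaught `std::out_of_range` terminates the program at the faulty call, and a debugger backtrace then points at the caller that passed the bad index.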

1 Answer


The code per se seems to be OK (the error I thought I spotted earlier wasn't one). However, there may be an easier approach:

  1. Instead of looking up each string from the first set in the second set, you can move through the two sorted lists simultaneously: advance the iterator that points to the smaller element, or both iterators if they refer to the same string. The corresponding contribution to the distance is computed directly in each case.
  2. I would personally use two sorted std::vector<std::pair<std::string, unsigned int> > for this, but std::map<std::string, unsigned int> works as well.

I don't have access to your Instance class and, thus, haven't tested it but something like the below should work.

#include <algorithm>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

struct compare1st {
    bool operator()(std::pair<std::string, unsigned int> const& p1,
                    std::pair<std::string, unsigned int> const& p2) const {
        return p1.first < p2.first;
    }
};

std::vector<std::pair<std::string, unsigned int> > fill(Instance const& inst) {
    std::vector<std::pair<std::string, unsigned int> > rc;
    for (unsigned int i(0), end(inst.getNumberOfFeatures()); i != end; ++i) {
        rc.push_back(std::make_pair(inst.getFeature(i), i));
    }
    std::sort(rc.begin(), rc.end(), compare1st());
    return rc;
}
double square(double d) { // pow(d, 2.0) is fairly expensive
    return d * d;
}

float KNNClassifier::distance(const Instance& inst1, const Instance& inst2) {   
    typedef std::pair<std::string, unsigned int> Pair;
    std::vector<Pair> feat1 = fill(inst1);
    std::vector<Pair> feat2 = fill(inst2);

    std::vector<Pair>::const_iterator it1(feat1.begin()), end1(feat1.end());
    std::vector<Pair>::const_iterator it2(feat2.begin()), end2(feat2.end());
    double result(0.0);
    while (it1 != end1 || it2 != end2) { // until both sequences are exhausted
        if (it1 != end1 && (it2 == end2 || it1->first < it2->first)) {
            result += square(inst1.getScore((it1++)->second)); // only in inst1
        }
        else if (it2 != end2 && (it1 == end1 || it2->first < it1->first)) {
            result += square(inst2.getScore((it2++)->second)); // only in inst2
        }
        else { // same feature in both
            result += square(inst1.getScore((it1++)->second)
                             - inst2.getScore((it2++)->second));
        }
    }
    return std::sqrt(result);
}
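The code above depends on the `Instance` class, which isn't shown. To try the same merge walk in isolation, here is a self-contained sketch with a hypothetical stand-in `Instance` offering the accessors used above (`getNumberOfFeatures`, `getFeature`, `getScore`), written as a free function `mergeDistance` rather than a member of `KNNClassifier`:

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal Instance with the accessors used above.
class Instance {
public:
    void add(const std::string& feature, double score) {
        features.push_back(feature);
        scores.push_back(score);
    }
    unsigned getNumberOfFeatures() const { return static_cast<unsigned>(features.size()); }
    const std::string& getFeature(unsigned i) const { return features.at(i); }
    double getScore(unsigned i) const { return scores.at(i); }
private:
    std::vector<std::string> features;
    std::vector<double> scores;
};

typedef std::pair<std::string, unsigned> Pair;

// Order (feature, index) pairs by feature name only.
bool lessByFeature(const Pair& a, const Pair& b) { return a.first < b.first; }

std::vector<Pair> sortedFeatures(const Instance& inst) {
    std::vector<Pair> rc;
    for (unsigned i = 0; i != inst.getNumberOfFeatures(); ++i) {
        rc.push_back(std::make_pair(inst.getFeature(i), i));
    }
    std::sort(rc.begin(), rc.end(), lessByFeature);
    return rc;
}

double mergeDistance(const Instance& inst1, const Instance& inst2) {
    std::vector<Pair> f1 = sortedFeatures(inst1), f2 = sortedFeatures(inst2);
    std::vector<Pair>::const_iterator it1 = f1.begin(), it2 = f2.begin();
    double result = 0.0;
    // Keep going until BOTH sorted sequences are exhausted.
    while (it1 != f1.end() || it2 != f2.end()) {
        if (it2 == f2.end() || (it1 != f1.end() && it1->first < it2->first)) {
            double s = inst1.getScore((it1++)->second);  // only in inst1
            result += s * s;
        } else if (it1 == f1.end() || it2->first < it1->first) {
            double s = inst2.getScore((it2++)->second);  // only in inst2
            result += s * s;
        } else {
            double d = inst1.getScore((it1++)->second)   // in both
                     - inst2.getScore((it2++)->second);
            result += d * d;
        }
    }
    return std::sqrt(result);
}
```

For example, with `spam` = 3, `offer` = 1 in one instance and `offer` = 2, `free` = 4 in the other, the walk visits `free` (16), then `offer` in both (1), then `spam` (9), giving sqrt(26). Each sequence is sorted once and then traversed linearly, with no per-keyword map lookups.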
Dietmar Kühl
  • That doesn't seem to be the problem. He called erase() with a key value, and on the other container, not with an iterator into the same container. –  Jan 02 '13 at 22:06
  • Basilis is using erase() on a string, not an iterator. Also, map::erase(iterator) has a `void` return type. – Drew Dormann Jan 02 '13 at 22:10
  • @DrewDormann: The first point is correct but I somehow concluded that it is called on the map it is from and it wouldn't make much point to look up the object. The second point is certainly not correct for C++ 2011 although it is correct for C++ 2003. – Dietmar Kühl Jan 02 '13 at 22:15
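The comments above concern erasing from a map while iterating it. For reference, a minimal sketch of doing that safely on the same map; the function name and the zero-score criterion are hypothetical, chosen only for illustration:

```cpp
#include <map>
#include <string>

// Erase from the SAME map being iterated. Since C++11,
// std::map::erase(iterator) returns an iterator to the element after the
// erased one, so the loop continues from there. (In C++03 it returned void,
// and the usual idiom was m.erase(it++).)
void eraseZeroScores(std::map<std::string, double>& m) {
    for (std::map<std::string, double>::iterator it = m.begin(); it != m.end(); ) {
        if (it->second == 0.0) {
            it = m.erase(it);  // the old 'it' is invalid after this call
        } else {
            ++it;
        }
    }
}
```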