1

I am working on a subgraph matching problem (match chemical functional groups within molecules). The original code was written by another student (under Visual C++, no MS specific libraries) and it ran fine on Windows. I then added new functions to the program but did not alter the algorithm for the subgraph matching, and the new program compiled fine under gcc4.2/Mac OS X. However I am running into strange problems during runtime!

The objects & their members that are relevant here:

  1. Atom: contains ID, Element, list of Bonds (vector of pointers to Bond objects), search_mark (bool). Functions to get the variables and to set search_mark as true or false.

  2. Bond: contains an array of 2 pointers to atoms A and B , and a function that returns atom* A when called with parameter atom* B, vice versa.

  3. Molecule: contains vector of pointers to atoms, and function to get an atom* using either the atom ID or the position within the vector.

  4. A subclass of Atom: HammettAtom. The extra member it contains is an atom pointer to a related molecule atom.

This is the algorithm of the recursion function: For every atom in an array A, compare to an atom in array B(the Hammett group, typically about 10-20 atoms in size). If the element is the same, obtain the list of connected atoms of each, and repeat. The tested atoms are marked along the way, so at one point there will be no more unmarked connected atoms.

Here is the code (unaltered, only the cout bits are added by me for testing). When the function is first called, the first vector is a single atom from a test molecule, the second vector is the 2nd atom of a Hammett group molecule. (The 1st atom in Hammett has ID 'X' and can be anything.)

bool HammettCheck::checkSubproblem(vector<Atom*> bonded_atoms, vector<Atom*> my_list) 
{
unsigned int truth=0;
vector<Atom*> unmarked_bonded;
vector<Atom*> unmarked_list;
cout << "\n size of Hammett array: " <<my_list.size()<< " size of mol array: "<< bonded_atoms.size() << endl; //for testing
//If number of connected atoms is different, return false.
if( bonded_atoms.size() != my_list.size() ){
    return false;
}

//Create new lists.
for(unsigned int i=0; i < bonded_atoms.size() ; i++){

    //Create list of unmarked connected atoms in molecule.
    if( !bonded_atoms[i]->isMarked() ){
        unmarked_bonded.push_back(bonded_atoms[i]);
    }

    //Create list of unmarked connected atoms in hammett.
    if( !my_list[i]->isMarked() ){
        unmarked_list.push_back( my_list[i] );
    }
}
cout << "size of unmarked Hammett array: " << unmarked_list.size() << " size of unmarked mol array: "<< unmarked_bonded.size() <<endl; //for testing
//If number of unmarked connected atoms is different, return false.
if( unmarked_bonded.size() != unmarked_list.size() ){
    return false;
}


//Check each unmarked atom connected in the molecule against possible atoms it could be in the hammett group.
for(unsigned int i=0; i < unmarked_bonded.size(); i++){
  cout<< "atom in um_mol array considered ID: " << unmarked_bonded[i]->getID() << " Ele: " << unmarked_bonded[i]->getEle()<< endl;
    /*Unmarked hammett assigned in reverse order so that the undefined "X" atom is only 
      assigned if a connected atom can not possibly be any other atom.*/
    for(int j=(unmarked_list.size()-1); j > -1; j--){
      cout << "atom in um_h_array considered ID: " << unmarked_list[j]->getID() << endl;
        //If hammett atom has already been assigned to a connected atom, it cannot be assigned to another
        if(!unmarked_list[j]->isMarked()){
          cout << unmarked_list[j]->getID() << "is unmarked" <<endl;
            /*If connected atom could only be hammett group's connection 
              to the rest of the molecule, assign it as such.*/
            if( !strcmp(unmarked_list[j]->getEle().c_str(), "X") ){
                unmarked_bonded[i]->mark();
                unmarked_list[j]->mark(unmarked_bonded[i]);
                truth++;
                cout<< "mol atom ID "<< unmarked_bonded[i]->getID() <<" marked as X, current truth: "<< truth << endl;
                cout << unmarked_list[j]->getID() << "is now marked(1)/unmarked(0) " << unmarked_list[j]->isMarked() << " and break loop "<<endl;
                break;
            }

            /*If connected atom is the same element as a possible hammett atom,
              check that atoms connections by running them through the subproblem.*/
            if( !strcmp(unmarked_bonded[i]->getEle().c_str(), unmarked_list[j]->getEle().c_str()) ){
                unmarked_bonded[i]->mark();
                unmarked_list[j]->mark(unmarked_bonded[i]);
                cout<<"found same ele between mol_id "<< unmarked_bonded[i]->getID() <<" and ham_id " << unmarked_list[j]->getID() <<endl;
                vector<Atom*> new_bonded = getAttachedAtoms( unmarked_bonded[i] );
                vector<Atom*> new_list = getAttachedAtoms( unmarked_list[j] );
                if( checkSubproblem( new_bonded, new_list ) ){
                  cout<<"found same atom"<<endl;
                    truth++;
                    break;

                /*If only the elements are the same do not assign 
                  the hammett atom to this connected atom.*/
                }else{
                    unmarked_bonded[i]->demark();
                    unmarked_list[j]->demark();
                }
            }
        }
    }
}

//Return true if all connected atoms can be assigned atoms of the hammett group.
if( truth == unmarked_bonded.size() ){
    return true;
}else{
    return false;
}
}

I run the compiled program with a test molecule of 29 atoms, and compared it to two Hammett groups. It should contain group 2 but not group 1. However it has returned true WHENEVER I started with 2 atoms that had the same element. Below is a sample of the output (the molecule does not actually contain the Hammett group at that atom)

 currently at molecule atom ID 1

 size of Hammett array: 1 size of mol array: 1
 size of unmarked H array: 1 size of unmarked mol array: 1
 atom in um_mol array considered ID: 1 Ele: N
 atom in um_h_array considered ID: N1
 N1is unmarked
 found same ele between mol_id 1 and ham_id N1

 size of Hammett array: 3 size of mol array: 3
 size of unmarked H array: 3 size of unmarked mol array: 3
 atom in um_mol array considered ID: 2 Ele: H
 atom in um_h_array considered ID: O2
 O2is unmarked
 atom in um_h_array considered ID: O1
 O1is unmarked
 atom in um_h_array considered ID: X
 X is unmarked
 mol atom ID 2 marked as X, current truth: 1
 X is now marked(1)/unmarked(0) 128 and break loop 
 atom in um_mol array considered ID: 8 Ele: C
 atom in um_h_array considered ID: O2
 O2is unmarked
 atom in um_h_array considered ID: O1
 O1is unmarked
 atom in um_h_array considered ID: X
 X is unmarked
 mol atom ID 8 marked as X, current truth: 2
 X is now marked(1)/unmarked(0) 160 and break loop 
 atom in um_mol array considered ID: 17 Ele: C
 atom in um_h_array considered ID: O2
 O2is unmarked
 atom in um_h_array considered ID: O1
 O1is unmarked
 atom in um_h_array considered ID: X
 X is unmarked
 mol atom ID 17 marked as X, current truth: 3
 X is now marked(1)/unmarked(0) 128 and break loop 
 found same atom
 Hammet group 2 checkSubproblem true
 Hammett added to atom 1

Sorry if that was long. But the thing was that right after I have marked the 'X' atom (1st atom in Hammett molecule), and tried to get the search_mark boolean, it had a value greater than 1. Therefore X was incorrectly 'marked' several times and the 'truth' counter went up until the condition truth == unmarked_bonded.size() was reached.

I am not sure what the actual problem is? The value 128 suggest some mixed up memory/pointer issue, but am unsure how to find that out. I am not even sure if it has anything to do with the recursion function!

I would be very grateful if anyone could suggest something that I can try. Thanks in advance!

P.S. the code for the Atom class functions.

 string Atom::getID()
{
return id;
}

string Atom::getEle()
{
return ele;
}
void Atom::mark()
{
search_mark = true;
}

void Atom::demark()
{
search_mark = false;
}
void HammettAtom::mark(Atom* assigned)
{
search_mark = true;
related_mol_atom = assigned;
}
bool Atom::isMarked()
{
return search_mark;
}
  • I have done some more testing and the 'search_mark' booleans for the atom X and the first molecular atom are definitely zero before the function is first called. – disillusioned May 11 '12 at 01:33
  • Have you tried running it under Valgrind? – Adam Rosenfield May 11 '12 at 01:41
  • Atom::isMarked() is returning a bool by value. For some reason it isn't printing as a 0 or 1. What happens if you try storing it as a local variable (i.e. make the call before the cout statement)? – tmpearce May 11 '12 at 01:45
  • @tmpearce tried that and it was the same result. seems to be a subclass problem – disillusioned May 11 '12 at 01:53
  • @AdamRosenfield No, I just debugged under gdb. Also it seems to be an issue of the HammettAtom subclass, as the molecular atom (which is not in that subclass) seemed to have been marked fine. I have not used Valgrind before, do you mean the Memcheck tool? I am installing it atm – disillusioned May 11 '12 at 01:57

1 Answers1

1

Thanks for all the suggestions. After running the program under Valgrind, it is clear that the problem was related to a memory leak. I moved the position of the allocator in the 'deinitely lost' problems identified, and the search_mark problem seems to be removed. The program now runs as expected but with memory leaks. I have not managed to fix the memory leak problem though, and I have posted a new question here.

Community
  • 1
  • 1