-1

I am writing a program which reads information about molecule from an input file (charge, multiplicity, atom types and their coordinates) but for some reason a 2D vector-array gets filled with zeroes and finds a non-existing atom which coordinates are also zeroes. I don't know how to debug it properly so I can't track a problem.

#include <iostream>
#include <cstdlib>
#include <string>
#include <fstream>
#include <cassert>
#include <vector>

int main(int argc, char* argv[])
{
   if (argc > 1)
   {

      std::cout << "The input file is: " << argv[1] << std::endl;

   } else
           {
             std::cout << "No arguments" << std::endl;
             return 1;
           }

   const char *filename = argv[1];
   std::ifstream inputfile(filename);
   assert(inputfile.good());
   int charge, multiplicity;
   inputfile >> charge >> multiplicity;

   std::cout << charge << " " << multiplicity << std::endl;

   int natom;

   std::vector<std::string> atomSymbol;
   std::vector< std::vector<double> > position;

   for (int i = 0; !inputfile.eof(); i++) {

   std::string bufferAtomicSymbol;
   double bufferPositionX;
   double bufferPositionY;
   double bufferPositionZ;

   std::vector<double> bufferPosition(3);

   inputfile >> bufferAtomicSymbol >> bufferPositionX >> bufferPositionY >> bufferPositionZ; 

   atomSymbol.push_back(bufferAtomicSymbol);
   bufferPosition.push_back(bufferPositionX);
   bufferPosition.push_back(bufferPositionY);
   bufferPosition.push_back(bufferPositionZ);

   position.push_back(bufferPosition);

   }

   inputfile.close();
   natom = position.size();

   std::cout << "There are " << natom << " atoms" << std::endl;

   for (int i = 0; i < natom; i++)
       std::cout << atomSymbol[i] << " " << position[i][0] << " " << position[i][1] << " " << position[i][2] << std::endl;

   return 0;
   }

Input file sample:

0 1
C          1.11988       -0.11356       -0.04893
C         -0.22149        0.53742        0.15390
N         -1.36703       -0.23693       -0.04570
O         -0.39583        1.70537        0.48392
H          1.93813        0.59458        0.13709
H          1.23188       -0.48457       -1.07645
H          1.25795       -0.96373        0.63239
H         -2.27205        0.14808        0.07622
H         -1.29145       -1.18667       -0.31244

Output of program:

The input file is: acetamide.xyz
0 1
There are 10 atoms
C 0 0 0
C 0 0 0
N 0 0 0
O 0 0 0
H 0 0 0
H 0 0 0
H 0 0 0
H 0 0 0
H 0 0 0
 0 0 0
dishu1
  • 9
  • 3
  • What is an error? – R4444 May 04 '19 at 21:45
  • 1
    `std::vector bufferPosition(3);` -- You size the vector to 3 elements. Then you do this: `bufferPosition.push_back(bufferPositionX);` -- Is this your intention, to add more elements to the vector? Whenever I see this combination of constructing a vector with `n` elements, and then later `push_back` onto the vector, it is almost always a mistake. – PaulMcKenzie May 04 '19 at 21:47
  • I have corrected the post, totally forgot about the input file and the output. My intention was to save X, Y and Z at each iteration into a vector and to save that vector in an array of vectors. Perhaps, there is a smarter way to do it, but I'd like the resulting array to be used as position[i][j] otherwise I have a lot of headache. – dishu1 May 04 '19 at 22:12
  • @dishu1 -- Do you understand the reason for my comment? How many elements will there be in that vector after one `push_back`? Do you think it still will be 3 elements? Simply put -- remove the `(3)` from the constructor, and try again. – PaulMcKenzie May 04 '19 at 22:13
  • Oh yes. I've changed the bufferPosition to be of undefined size and things got well, program easily reads and prints out coordinates found in the input file. But there is still an error of printing an excessive line in the end. That line prints coordinates of the previous atom, in other words natom = position.size() is larger than it has to be. – dishu1 May 04 '19 at 22:19
  • Unrelated: `for (int i = 0; !inputfile.eof(); i++)` is the evil twin of `while (!inputfile.eof())` [which is a very popular programming mistake](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons). You are probably better off with `int i = 0; while (inputfile >> bufferAtomicSymbol >> bufferPositionX >> bufferPositionY >> bufferPositionZ) { dostuff; i++; }` – user4581301 May 04 '19 at 22:21
  • @dishu1 -- *I've changed the bufferPosition to be of undefined size* -- No, removing the `(3)` does not make the size undefined. It makes the vector *empty*, meaning 0 elements. – PaulMcKenzie May 04 '19 at 22:21

1 Answers1

1

You have two primary problems that are plaguing your attempt to successfully read from your data file, one is specific, and the other more general, but both equally fatal to the attempt.

The technical problem with your read can be solved by reviewing Why !.eof() inside a loop condition is always wrong.. The equally important more general problem you are facings is the sprawling list of vectors, strings, and doubles you are attempting to duct-tape and bailing-wire together. When you overly complicate your data handling using superfluous or ill-fitting containers, trying to make them work together is like trying to put toothpaste back into the tube -- it ain't gonna work....

Take a bit of time and sort out what it is you need to store for each chunk of data, and then create a data structure that holds that. It's fine to use a few temporary variables to facilitate reading the data, but the final storage for your data should be straight-forward and as simple as allowable.

In your case you are storing the atomic-symbol as a string and three double values for the position. You base data structure should be able to capture all three as a single object. While there are a couple of options, the plain-old struct for coordinating between the differing types of data will work fine. Then you simply create a vector of struct as your final storage solution to allow access to your data.

In keeping with the simple-as-required, you can use a struct containing a string and three double values. For example:

struct atom {
    std::string sym;
    double x, y, z;
};

That is all that is needed to capture symbol and positional coordinates. The you simply declare a vector of atom as your final storage solution, e.g.

    std::vector<atom> atoms;            /* your final storage container */

A running theme throughout your code that will invite Undefined Behavior is failing to validate EVERY input. If you simply read from a stream without validating whether the read succeeded or failed, you are simply playing Russian-Roulette. If your read fails, and you just blindly continue forward using the uninitialized variable you assume was properly filled with data, game-over.

So validate every read. For example, reading the charge and multiplicity, you can do:

    if (!(fs >> charge >> multiplicity)) {  /* validate EVERY input */
        std::cerr << "error: invalid format: charge, multiplicity\n";
        return 1;
    }

(note: I've shortened your variable names, e.g. inputfile is now fs, typing is not a strong-suit)

For reading each atom from every subsequent line in the file, you can do:

    /* read each remaining line until you run out */
    while (fs >> atmSymbol >> bposX >> bposY >> bposZ) {
        /* add the values read to your temporary struct */
        atom tmp = { atmSymbol, bposX, bposY, bposZ };
        atoms.push_back(tmp);   /* push tmp struct onto storage vector */
    }

The key is to validate the success or failure of every read so you know you are processing valid data in your code.

Putting the rest of it together in a short example to read your data file, you could do something like the following:

#include <iostream>
#include <iomanip>
#include <fstream>
#include <array>
#include <string>
#include <vector>

struct atom {
    std::string sym;
    double x, y, z;
};

int main (int argc, char *argv[]) {

    if (argc < 2) { /* validate at least one argument given for filename */
        std::cout << "error: insuffient input.\n"
                    "usage: " << argv[0] << " <fn>\n";
        return 1;
    }
    std::cout << "The input file is: " << argv[1] << '\n';

    std::ifstream fs (argv[1]);         /* file stream, just use argv[1] */
    int charge, multiplicity, natom;    /* temporary variables for filling */
    double bposX, bposY, bposZ;
    std::string atmSymbol;
    std::vector<atom> atoms;            /* your final storage container */

    if (!(fs >> charge >> multiplicity)) {  /* validate EVERY input */
        std::cerr << "error: invalid format: charge, multiplicity\n";
        return 1;
    }
    /* read each remaining line until you run out */
    while (fs >> atmSymbol >> bposX >> bposY >> bposZ) {
        /* add the values read to your temporary struct */
        atom tmp = { atmSymbol, bposX, bposY, bposZ };
        atoms.push_back(tmp);   /* push tmp struct onto storage vector */
    }
    fs.close(); /* close stream -- you are done reading */

    natom = atoms.size();   /* get an output size */
    std::cout << "\nThere are " << natom << " atoms.\n\n";

    for (auto& a : atoms) {         /* loop over each atom in vector */
        std::cout << a.sym          /* output atomic symbol */
                << " " << std::setw(8) << a.x   /* each coordinate, and */
                << " " << std::setw(8) << a.y
                << " " << std::setw(8) << a.z << '\n';/* tidy up with \n */
    }
}

Example Use/Output

$ ./bin/atoms_read dat/atoms.txt
The input file is: dat/atoms.txt

There are 9 atoms.

C   1.11988 -0.11356 -0.04893
C  -0.22149  0.53742   0.1539
N  -1.36703 -0.23693  -0.0457
O  -0.39583  1.70537  0.48392
H   1.93813  0.59458  0.13709
H   1.23188 -0.48457 -1.07645
H   1.25795 -0.96373  0.63239
H  -2.27205  0.14808  0.07622
H  -1.29145 -1.18667 -0.31244

Look things over and let me know if you have further questions.

Update Based on Request To Handle Empty-Lines in File

If you have additional blank-line separated blocks of atoms to read in your datafile, all you need to do is rearrange your read slightly to use getline to read a line at a time from the file. You then create a stringstream from the line and read from the stringstream just as we originally read from the file. If you can validly read into your atomic-symbol and positional coordinates from the stringstream, you have a valid line.

A quick edit to read with getline and removing the temporary variables that are no longer needed (we can read directly into the temporary struct now), you could do:

    std::ifstream fs (argv[1]);         /* file stream, just use argv[1] */
    int charge, multiplicity, natom;    /* temporary variables for filling */
    std::string line;
    std::vector<atom> atoms;            /* your final storage container */

    if (!(fs >> charge >> multiplicity)) {  /* validate EVERY input */
        std::cerr << "error: invalid format: charge, multiplicity\n";
        return 1;
    }
    /* read each remaining line until you run out with getline */
    while (getline (fs, line)) {
        std::stringstream ss (line);    /* create stringstream from line */
        atom tmp;                       /* declare temporary struct */
        /* read from stringstream into temporary struct */
        if (ss >> tmp.sym >> tmp.x >> tmp.y >> tmp.z)
            atoms.push_back(tmp);  /* push_back atmp struct on success */
    }
    fs.close(); /* close stream -- you are done reading */

Now, beginning with the second line, the code will read all atom data that matches your line format into your atoms vector regardless of blank or other non-conforming lines in your file.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • David, thanks a lot. How can I read more data from the input file if **** and empty strings are used as separators between data blocks? – dishu1 May 05 '19 at 09:53
  • To handle blank lines, change from reading with `while (fs >> atmSymbol >> bposX >> bposY >> bposZ)` to `std::string tmp; while (getline (fs, tmp)) { stringstream ss (tmp); if (ss >> atmSymbol >> bposX >> bposY >> bposZ) { atom atmp = { atmSymbol, bposX, bposY, bposZ }; atoms.push_back(atmp); } }`. You are just using getline to read each line and then creating a stringstream from the line read, then doing the same thing from the stringstream instead of reading directly into your 4 variable. This will read every line, but only store info from lines that are parsed successfully – David C. Rankin May 05 '19 at 17:12
  • @dishu1 - let me know if you have problems implementing it and if so, I can drop an addition to my answer. – David C. Rankin May 05 '19 at 17:22
  • Now I got my hands on it again. I abandoned this code but now got a lot of time to play around. It turned out that operations with coordinates are MUCH easier if stored in a regular 2D array. For example, if I want to calculate interatomic distances or do any BLAS operations on coordinates to determine symmetry. How can I convert this into a 2D array? At this moment my attempts are just a waste of time. – dishu1 Apr 03 '20 at 19:00
  • That will depend on whether you know the maximum number of rows you will need (and it will fit in a 2D array with automatic storage duration) or whether you will need to dynamically allocate an unknown number of rows of data. Do you want each row to have, e.g. `{ sym, x, y, z }`? Where each row is essentially a 1D array and you have some number of those? – David C. Rankin Apr 03 '20 at 20:51
  • I just got rid of the atom struct. Then my coordinates matrix looks the same as a regular 2D array and can be operated same way which is convenient. So, I managed it. – dishu1 Apr 03 '20 at 20:54
  • Yes, glad you got it working. With an array with 4-columns (as indicated above), BLAS will be much easier to use. In C/C++ you can arrange your data any way you need (learning to do it is still needed). But as you have found, going from struct to array isn't that difficult (it is the same type conversion to go from array to balanced-tree or linked-list -- or whatever. Good luck with your coding. – David C. Rankin Apr 03 '20 at 22:39