0

I have a file that I want to process, picking out only some of the information in order to modify it. For the sake of speed, I also want to write the file to another output file in the same pass.

I could pick out the info I want (first pass) and then copy the file to the output file (second pass). I am doing both in a single pass so that I can avoid the second one.

Below is my code. Don't get distracted by the if conditions; they are only there to pick the info I want. The problem is writing to the other file.

void readPoints(char* filename, std::vector<Point>& v, char* outfilename) {
  std::ifstream infile;
  std::string str;
  infile.open(filename);
  if (!infile) {
    std::cout << "File not found!" << std::endl;
    return;  // nothing to read, so do not enter the loop below
  }

  std::ofstream outfile;
  outfile.open(outfilename);

  Point::FT coords[3];
  while (infile >> str) {  // also stops at end of file if "END" is missing
    outfile << str << "\t";
    if(str == "ABET")
      outfile << std::endl;
    if(str == "ATOM") {
      infile >> str;
      outfile << str << "\t";
      if(str == "16" || str == "17" || str == "18" ||
          str == "20" || str == "21" || str == "22") {
        for(int j = 0; j < 4; ++j) {
          infile >> str;
          outfile << str << "\t";
        }
        for (int j = 0; j < 3; ++j) {
          infile >> str;
          outfile << str << "\t";
          coords[j] = std::stod(str);
        }
        Point p(3, coords);
        v.push_back(p);
      }
    }
    if(str == "END")
      break;
  }
  infile.close();
  outfile.close();
}

The problem is that infile >> str gives me only the words; the whitespace between them is discarded. So I am using a tab to separate the words from each other. However, this is not enough, since the original file does not use tabs but spaces, I think.

Original file:

ATOM      1  HT1 ASP X   1       9.232  -9.194   6.798  1.00  1.00      ABET  
ATOM      2  HT2 ASP X   1       8.856  -7.726   7.401  1.00  1.00      ABET 
...
ATOM     50 HH11 ARG X   5       0.925  -3.001   6.677  1.00  1.00      ABET  
ATOM     51 HH12 ARG X   5       0.285  -4.616   6.734  1.00  1.00      ABET 
...
END

Output file:

ATOM    1   HT1 ASP X   1   9.232   -9.194  6.798   1.00    1.00    ABET    
ATOM    2   HT2 ASP X   1   8.856   -7.726  7.401   1.00    1.00    ABET
...
ATOM    50  HH11    ARG X   5   0.925   -3.001  6.677   1.00    1.00    ABET    
ATOM    51  HH12    ARG X   5   0.285   -4.616  6.734   1.00    1.00    ABET    
...
END

Does anyone know a way to fix this? Notice that the information is the same in both files; it is the spacing between the words that is bothering me!

gsamaras

3 Answers

1

It appears you're trying to modify a .pdb file. This file format is very finicky in that it requires the spacing to be exact. The way to get this to work is to study the format and make sure you put the right number of spaces in the right places. For example, you want the atom number to end in the 11th column to match the other file, so you add 7 - str.length() spaces between ATOM and the atom number (7 because the first four of those 11 columns are already taken up by ATOM). Follow a similar approach for the rest of the line and you should be fine.
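
The padding rule above could be sketched roughly as follows. The helper name atomPrefix is made up for illustration, and the sketch assumes the atom number is passed as a string of at most 7 characters; it only covers the first 11 columns, not the rest of the record.

```cpp
#include <string>

// Hypothetical helper: builds the first 11 columns of an ATOM record,
// right-aligning the atom number so that it ends in column 11.
// Assumes the number has at most 7 characters.
std::string atomPrefix(const std::string& serial) {
    const std::string tag = "ATOM";
    const std::size_t width = 7;  // 11 columns minus the 4 taken by "ATOM"
    // Guard against over-long input so the subtraction cannot wrap around.
    const std::size_t pad = serial.length() < width ? width - serial.length() : 0;
    return tag + std::string(pad, ' ') + serial;
}
```

Calling atomPrefix("51") yields ATOM followed by five spaces and 51, which lines up with the original file shown in the question.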

wolfPack88
  • Yes, it's a .pdb file, but I don't like the idea of counting. A buffer can do that easier.+1 for identifying the extension though! – gsamaras Aug 21 '14 at 18:40
  • @G.Samaras: As someone who worked with .pdb files every day for the length of my Ph.D., it's pretty easy to spot. Yeah, a buffer would work better... it's just that whenever I wrote these codes, I always used pure C, so the above solution is actually what I've coded and used for ~4 years. – wolfPack88 Aug 21 '14 at 21:03
  • I can imagine why! I posted what I finally did. Thanks for the answer, however. I hope you find my answer useful! (Good luck with the Ph.D.) – gsamaras Aug 21 '14 at 21:48
1

The functions you are using to process this data format are fighting with the data format, as they are not meant to process that format of data.

Read the file line-by-line into a string and use memcmp/memcpy instead of string compares to just compare and modify things. It's fixed format. (or you could use COBOL to easily process it j/k!)

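char line[5000] = {0};   // not "inline": that is a reserved word
const int offset = 30;   // assumption: x starts at column 31 of an ATOM record
// open infile and outfile
// loop thru, reading each line into the buffer
while (infile.getline(line, sizeof line)) {
    if (0 == memcmp(line, "ATOM", 4)) {
        // yada yada yada
        for (int j = 0; j < 3; ++j) {
            char coord[9];
            memcpy(coord, line + offset + j*8, 8);   // 8 characters per coordinate
            coord[8] = 0;
            // do something with it...
            if (iNeedToWriteToOutput)
                memcpy(line + offset + j*8, "   0.000", 8);   // overwrite in place
        }
    }
    // etc...
    outfile << line << '\n';   // write string to output
}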

You get the idea, hope that helps.
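
For comparison, here is a rough std::string version of the same fixed-column idea. It is only a sketch: readCoords, writeCoord and the two constants are names invented for this example, and it assumes the standard PDB ATOM layout in which x, y and z occupy columns 31-38, 39-46 and 47-54.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Assumption: standard PDB ATOM layout, where x, y and z occupy
// columns 31-38, 39-46 and 47-54 (8 characters each).
const std::size_t kCoordOffset = 30;  // zero-based index of column 31
const std::size_t kCoordWidth = 8;

// Reads the three coordinates out of one ATOM line.
// The line must be at least 54 characters long.
std::array<double, 3> readCoords(const std::string& line) {
    std::array<double, 3> c{};
    for (std::size_t j = 0; j < 3; ++j)
        c[j] = std::stod(line.substr(kCoordOffset + j * kCoordWidth, kCoordWidth));
    return c;
}

// Overwrites coordinate j (0 = x, 1 = y, 2 = z) in place, keeping the
// field exactly 8 characters wide so that later columns do not move.
// Values too large for 8 characters are cut off, so check the range first.
void writeCoord(std::string& line, std::size_t j, double value) {
    char field[32];
    std::snprintf(field, sizeof field, "%8.3f", value);
    line.replace(kCoordOffset + j * kCoordWidth, kCoordWidth, field, kCoordWidth);
}
```

Because only the 8 characters of one field are replaced, the length of the line and the position of every other column stay the same.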

FastAl
  • I got the idea, but the code you are providing is a bit unclear. However, +1 for the general idea. – gsamaras Aug 21 '14 at 18:41
  • It's pre-C++ stuff from the C libs. Kinda pseudocode and meant as an example. (E.g., I don't see you modifying the output the way I show zeroing out a field, but it appears your logic would exclude lines.) Also, I didn't compile or test my syntax, nor have I written C for 15 years. _But_ the functions are well documented. – FastAl Aug 21 '14 at 18:54
  • I am not hitting on you, I am just saying that the (pseudo)code is not as clear as it could be. :) – gsamaras Aug 21 '14 at 19:08
1

The answer is basically what clcto commented under the question.

I use this code to copy the files and process them.

void readPoints(char* filename, std::vector<Point>& v, char* outfilename) {

  std::ofstream outfile;
  outfile.open(outfilename);

  std::ifstream infile(filename);
  if (!infile) {
    std::cout << "File not found!" << std::endl;
    return;
  }

  std::string line;
  while (std::getline(infile, line)) {
    std::cout << line << std::endl;  // echo to the console (debugging only)
    // if line of interest, process it

    // write to the other file; '\n' avoids a flush after every line
    outfile << line << '\n';
  }

  infile.close();
  outfile.close();
}

And then I used this answer for the replacement.
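
To show how the "line of interest" step might fit into that loop, here is a minimal sketch. It is not the exact code I ended up with: copyAndCollect is a made-up name, it takes streams instead of file names so it can be tried with string streams, and std::array<double, 3> stands in for the Point type from the question.

```cpp
#include <array>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Copies every line from `in` to `out` unchanged and collects the
// coordinates of the atoms of interest along the way.
void copyAndCollect(std::istream& in, std::ostream& out,
                    std::vector<std::array<double, 3>>& points) {
    // Atom numbers picked by the if conditions in the question.
    static const std::set<std::string> wanted = {"16", "17", "18", "20", "21", "22"};
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';  // the original spacing is preserved

        // Tokenise a copy of the line; the extraction only affects
        // the copy, not what was written above.
        std::istringstream fields(line);
        std::string tag, serial, skip;
        if (!(fields >> tag >> serial) || tag != "ATOM" || !wanted.count(serial))
            continue;
        for (int j = 0; j < 4; ++j) fields >> skip;  // four fields before x, y, z
        std::array<double, 3> p{};
        if (fields >> p[0] >> p[1] >> p[2]) points.push_back(p);
    }
}
```

Since each line is written before it is parsed, the output matches the input no matter what the parsing does with the whitespace.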

gsamaras