I am trying to write C++ code to download a file from files.rcsb.org programmatically. I have succeeded in connecting, sending the request, and receiving the data. However, the download does not work reliably. At roughly every 102nd line (about 8080 characters), the text 2000 is inserted between the lines, which breaks the PDB format and makes the file unusable for downstream processing.

101 REMARK 3 ESTIMATED COORDINATE ERROR.
102 REMARK 3
103 2000
104 ESD FROM LUZZATI PLOT (A) : 0.16

The above is supposed to be:

101 REMARK 3 ESTIMATED COORDINATE ERROR.
102 REMARK 3 ESD FROM LUZZATI PLOT (A) : 0.16

More details:

  1. The affected line is read as only 11 characters instead of 80 (using getline).
  2. The next distortion happens at line 203 or 204, where only 22 characters are read, i.e. the count has grown by 11.
  3. This continues up to 77 characters and then wraps back around.
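One reading that fits these numbers, though it is not confirmed anywhere in this thread, is that the server sends the body with HTTP chunked transfer encoding, where every chunk is preceded by its size in hexadecimal. Assuming each PDB line is 80 characters plus one newline byte, 101 full lines plus 11 characters is 101 × 81 + 11 = 8192 bytes, which is 0x2000, and two chunks give 16384 = 202 × 81 + 22, matching the 22-character break. Under that assumption, a minimal sketch of a decoder could look like the following; `decodeChunkedBody` is a hypothetical helper (not part of `BioSocketStream`) and expects the stream to be positioned just after the blank line that ends the HTTP headers.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reassembles a chunked HTTP body read from `in`.
// Trailer headers after the final zero-size chunk are ignored.
std::string decodeChunkedBody(std::istream& in)
{
    std::string body;
    std::string sizeLine;
    while (std::getline(in, sizeLine))
    {
        // The size line ends in CRLF; getline removed '\n', drop the '\r'.
        if (!sizeLine.empty() && sizeLine.back() == '\r')
            sizeLine.pop_back();

        // Ignore optional chunk extensions such as "2000;name=value".
        const std::size_t semi = sizeLine.find(';');
        if (semi != std::string::npos)
            sizeLine.erase(semi);

        std::size_t chunkSize = 0;
        try
        {
            chunkSize = std::stoul(sizeLine, nullptr, 16); // size is hexadecimal
        }
        catch (const std::exception&)
        {
            break; // malformed size line: stop instead of guessing
        }
        if (chunkSize == 0)
            break; // a zero-size chunk marks the end of the body

        // Read exactly chunkSize bytes; a chunk may end in the middle of a line.
        std::string chunk(chunkSize, '\0');
        in.read(&chunk[0], static_cast<std::streamsize>(chunkSize));
        chunk.resize(static_cast<std::size_t>(in.gcount()));
        body += chunk;
        if (chunk.size() < chunkSize)
            break; // stream ended early

        // Consume the CRLF that follows the chunk data.
        std::string lineEnd;
        std::getline(in, lineEnd);
    }
    return body;
}
```

The decoded string can then be split into lines with getline on a `std::istringstream`, so that chunk boundaries no longer cut PDB records in two.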

Can someone help me fix this logic bug?

The data is read in binary form. The classes I created and used are available as a tgz file at this link: https://drive.google.com/file/d/12kewHTt86k6u6qbtvgf4bgS6SJC_KmLj/view?usp=drive_link

Here is the code:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include "BioSocketStream.h" // Assuming this contains the BioSocketStream class

using namespace std;

bool downloadPdbFile(const string& pdbid, ostream& fout = std::cout)
{
    const string hostn = "132.249.213.110";
    const string doc = "/download/" + pdbid + ".pdb?fileFormat=mmcif&compression=NO";

    cerr << hostn << "," << doc << endl;

    BioSocketStream web;
    if (!web.connect(hostn.c_str(), 80))
    {
        cerr << "Host not found" << endl;
        return false;
    }

    web << "GET " << doc << " HTTP/1.0" << crlf;
    web << crlf;

    // Now read the response

    string response;
    long int filesize = 0;

    while (getline(web, response))
    {
        if (response.substr(0, 6) == "HEADER")
        {
            // Strip carriage returns left over from CRLF line endings
            response.erase(std::remove(response.begin(), response.end(), '\r'), response.end());
            filesize += response.length();
            fout << response.substr(0, 79) << "\t" << filesize << "\t" << response.length() << endl;
            while (getline(web, response) && (response.substr(0, 3) != "END"))
            {
                response.erase(std::remove(response.begin(), response.end(), '\r'), response.end());

                // Debug output: dump any line shorter than a full PDB record
                if (response.length() < 79)
                {
                    cout << response << "\t" << response.length() << "\t" << endl;
                    for (size_t i = 0; i < response.length(); i++)
                    {
                        cout << response[i] << "***";
                    }
                    cout << endl;
                }
                filesize += response.length();
                fout << response.substr(0, 79) << "\t" << filesize << "\t" << response.length() << endl;
                response = "";
            }
            response = "";
        }
        else
        {
            continue;
        }
    }
    cout << "the filesize: " << filesize << endl;

    return true;
}

int main()
{
    string pdbid = "2apr";
    ofstream f("junk.pdb",std::ofstream::binary);
    if (downloadPdbFile(pdbid,f))
    {
        cout << "File downloaded successfully." << endl;
        return 0;
    }
    else
    {
        cout << "Failed to download file." << endl;
        return -1;
    }
}

Thank you for your time. I look forward to some pointers. I have read the related posts, but they have not helped. Any pointer will be gratefully appreciated.

Botje
Prasad
  • Please format this illegible mess properly. – user207421 Aug 07 '23 at 08:09
  • Rolling your own HTTP client in 2023 seems unwise. – Botje Aug 07 '23 at 08:10
  • Or even in 1993 ;-) @Botje – user207421 Aug 07 '23 at 08:16
  • NB Can we assume that by 'religiously' you mean 'reliably'? – user207421 Aug 07 '23 at 08:20
  • Here's a hint. Your function is simultaneously trying to download the content, alter the content line by line with a series of fixups, and write some mix of output to cout and fout. The bug could be anywhere, and there could be multiple bugs. Why don't you have one function purely to download the contents to memory (string)? Then another function to parse this response string and output the amended version as another string. Then another function to write the contents to disk. That way, you can isolate where the bug is coming from and have unit tests to validate each. – selbie Aug 07 '23 at 08:25
  • @Botje Well done but the OP should have been left to do that for himself. Spoonfeeding doesn't help him in the long run. – user207421 Aug 07 '23 at 08:26
  • The file in question seems to be some sort of bioinformatics file. Surely there is a library you can use to process the format and extract things in a much nicer way? – Botje Aug 07 '23 at 08:35
  • Can you explain why you write the position in the file so far *and* the length of the line to `fout` everywhere? From your question it seems like you just want the contents of the file... – Botje Aug 07 '23 at 08:41
  • @botje, thanks for the time. I was trying to know the lines and positions where the format was getting distorted. cout and fout were giving the extraneous pattern i.e. 2000. – Prasad Aug 07 '23 at 10:22
  • @selbie: I tried the same but the *2000* pattern is still a mystery for me. – Prasad Aug 07 '23 at 10:25
  • @Botje yes, it's a Protein Data Bank formatted file. Regarding my own client: I am attempting to keep a slim version. libcurl has been my favorite, but it is now bloated too, hence I am trying minimalist code, and OOP for my own satisfaction. – Prasad Aug 07 '23 at 10:29
  • See [What is a debugger and how can it help me diagnose problems?](https://stackoverflow.com/questions/25385173/what-is-a-debugger-and-how-can-it-help-me-diagnose-problems) and [How to debug small programs](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) – Jesper Juhl Aug 07 '23 at 10:56
  • @JesperJuhl - Thank you. I am currently doing the same. Regards. – Prasad Aug 08 '23 at 04:10
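
The separation selbie suggests in the comments can be sketched as small, independently testable functions that work on a response already held in memory. The names `splitHttpResponse` and `isChunked` are hypothetical and do not come from the posted code. The header check is a plain substring search, which is enough for a sketch but is not a full HTTP header parser.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: splits a raw HTTP response into {header block, body}
// at the first blank line. If there is no blank line, everything is treated
// as headers and the body is empty.
std::pair<std::string, std::string> splitHttpResponse(const std::string& raw)
{
    const std::string sep = "\r\n\r\n";
    const std::size_t pos = raw.find(sep);
    if (pos == std::string::npos)
        return {raw, std::string()};
    return {raw.substr(0, pos), raw.substr(pos + sep.size())};
}

// Hypothetical helper: reports whether the header block contains a
// Transfer-Encoding header whose value mentions "chunked" (case-insensitive).
bool isChunked(const std::string& headers)
{
    std::string lower(headers);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::size_t start = lower.find("transfer-encoding:");
    if (start == std::string::npos)
        return false;

    // Only look at the rest of that one header line.
    const std::size_t eol = lower.find("\r\n", start);
    const std::string line = lower.substr(
        start, eol == std::string::npos ? std::string::npos : eol - start);
    return line.find("chunked") != std::string::npos;
}
```

With the stages split like this, the status line and headers can be inspected before any PDB record is parsed, and each function can be checked against a hand-written response string without touching the network.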