I tried to write C++ code to download a file from files.rcsb.org programmatically. I finally succeeded in connecting, sending the request and receiving the data. However, the download is not coming through correctly. After every 102nd line (roughly every 8080 characters), the text `2000` is inserted between the lines. This breaks the PDB format and makes the file useless for downstream processing.
```
101 REMARK 3 ESTIMATED COORDINATE ERROR.
102 REMARK 3
103 2000
104 ESD FROM LUZZATI PLOT (A) : 0.16
```

The lines above are supposed to be:

```
101 REMARK 3 ESTIMATED COORDINATE ERROR.
102 REMARK 3 ESD FROM LUZZATI PLOT (A) : 0.16
```
More details:
- At that point `getline` returns only 11 characters of the line instead of 80.
- The next distortion happens at line 203 or 204, where only 22 characters are read, i.e. 11 more than the previous time.
- This keeps growing by 11 until it reaches 77 characters, and then it drops back to a small number again.
Can someone help me fix this logic bug?
The data is read in binary mode. The classes I created/used are available as a .tgz file at this link: https://drive.google.com/file/d/12kewHTt86k6u6qbtvgf4bgS6SJC_KmLj/view?usp=drive_link
Here is the code:
```cpp
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include "BioSocketStream.h" // Assuming this contains the BioSocketStream class
using namespace std;

bool downloadPdbFile(const string& pdbid, ostream& fout = std::cout)
{
    const string hostn = "132.249.213.110";
    const string doc = "/download/" + pdbid + ".pdb?fileFormat=mmcif&compression=NO";
    cerr << hostn << "," << doc << endl;
    BioSocketStream web;
    if (!web.connect(hostn.c_str(), 80))
    {
        cerr << "Host not found" << endl;
        return false;
    }
    web << "GET " << doc << " HTTP/1.0" << crlf;
    web << crlf;
    // Now read the response
    string response;
    long int filesize = 0;
    while (getline(web, response))
    {
        if (response.substr(0, 6) == "HEADER")
        {
            response.erase(std::remove(response.begin(), response.end(), '\r'), response.end());
            // response.erase(std::remove(response.begin(), response.end(), '\n'), response.end());
            filesize += response.length();
            fout << response.substr(0, 79) << "\t" << filesize << "\t" << response.length() << endl;
            while (getline(web, response) && (response.substr(0, 3) != "END"))
            {
                response.erase(std::remove(response.begin(), response.end(), '\r'), response.end());
                // response.erase(std::remove(response.begin(), response.end(), '\n'), response.end());
                // fout << response << endl;
                // filesize++;
                if (response.length() < 79)
                {
                    // Debug output: print short lines character by character
                    cout << response << "\t" << response.length() << "\t" << endl;
                    for (int i = 0; i < response.length(); i++)
                    {
                        cout << response[i] << "***";
                    }
                    cout << endl;
                }
                filesize += response.length();
                fout << response.substr(0, 79) << "\t" << filesize << "\t" << response.length() << endl;
                response = "";
            }
            response = "";
        }
        else
        {
            continue;
        }
    }
    cout << "the filesize: " << filesize << endl;
    return true;
}

int main()
{
    string pdbid = "2apr";
    ofstream f("junk.pdb", std::ofstream::binary);
    if (downloadPdbFile(pdbid, f))
    {
        cout << "File downloaded successfully." << endl;
        return 0;
    }
    else
    {
        cout << "Failed to download file." << endl;
        return -1;
    }
}
```
Thank you for your time; I'm looking forward to some pointers. I have read the related posts, but they have not helped. Any pointer will be gratefully appreciated.