I have a file containing a header and a very long string like:
>Ecoli100k
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
....
I tried to retrieve the file size and header size using:
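// Open in binary mode so that tellg() reports raw byte offsets.
ifstream file(fileName.c_str(), ifstream::in | ifstream::binary);
string line1;
getline(file, line1);              // header line, without the trailing '\n'
int line1Size = line1.size();
file.seekg(0, ios::end);           // jump to the end of the file
long long fileSize = file.tellg(); // offset at the end = file size in bytes
file.close();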
For example, for a file containing a string of length 100k with the header >Ecoli100k, fileSize is 101261 and line1Size is 10. Now, to calculate the length of the string without reading any further:
101261 - (10 + 1) = 101250, which means that, without the header line, the file contains 101250 more bytes.
101250 / 81 = 1250, which means there are 1250 full lines (80 characters plus \n each). Assuming the last line has no \n, I must subtract 1249 newlines from 101250 to get the length of the string. But the result is wrong: I get 100k+1 instead of 100k.
In code:
int remainedLineCount = (fileSize - line1Size - 1 - 1 /* the last line has no \n */) / 81;
cout << (fileSize - line1Size - 1 - remainedLineCount) << "\n";
In another example, I added just one more character. Because that adds a new line to the file, the size changes to 101263, and with this calculation I again get 100k+2 instead of 100k+1.
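To make the arithmetic easy to reproduce, here is a minimal self-contained sketch of the same calculation. The function name `estimateSequenceLength` and the `bytesPerLine` parameter are hypothetical names introduced only for illustration, and the value 81 assumes 80 sequence characters plus one \n per full line, as in the files described here.

```cpp
#include <iostream>

// Hypothetical helper: reproduces the estimate from the two measured sizes.
// bytesPerLine = 81 assumes 80 sequence characters plus one '\n' per full line.
long long estimateSequenceLength(long long fileSize, long long headerSize,
                                 long long bytesPerLine = 81) {
    // Number of newlines to subtract, under the assumption made in the
    // question that the last line has no '\n' (hence the extra "- 1").
    long long remainedLineCount = (fileSize - headerSize - 1 - 1) / bytesPerLine;

    // Bytes after the header line, minus the newlines counted above.
    return fileSize - headerSize - 1 - remainedLineCount;
}

// Example values from the two files described in the question:
//   estimateSequenceLength(101261, 10) evaluates to 100001 (expected 100000)
//   estimateSequenceLength(101263, 10) evaluates to 100002 (expected 100001)
```

Both calls are off by exactly one, which matches the numbers reported for the two files.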
Does anyone know where this extra 1 comes from? Is there anything at the end of a file?
Edit:
As requested, here are the byte values (in hexadecimal) at the beginning and end of the file:
offset 0: 3e 45 63 6f 6c 69 31 30 30 6b
offset 0000018b83: 54 47 47 43 41 47 41 41 43 0a
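The final byte in the dump above is 0a, which is the \n character. To check the same thing programmatically instead of with a hex viewer, something like the following could be used. This is only a sketch: `endsWithNewline` is a hypothetical helper name, not part of my original code, and it assumes the file can be opened in binary mode as in the snippet at the top.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: returns true if the last byte of the file is '\n'.
// Returns false if the file cannot be opened or is empty.
bool endsWithNewline(const std::string& fileName) {
    std::ifstream file(fileName.c_str(), std::ifstream::in | std::ifstream::binary);
    if (!file) {
        return false;
    }

    // Determine the file size so that an empty file is handled explicitly.
    file.seekg(0, std::ios::end);
    std::streamoff size = file.tellg();
    if (size <= 0) {
        return false;
    }

    // Step back one byte from the end and read it.
    file.seekg(-1, std::ios::end);
    char last = 0;
    file.get(last);
    return file && last == '\n';
}
```

If this returns true, the newline count used in the estimate would have to include the last line as well.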
Thanks All.