3

I wrote a program that should print the last 5 lines of a file, but the teacher created a file containing a 4 GB line, and the program broke. How can I rewrite the program so that it works with very large files?

A possible solution is to read the file character by character, but I don't know how to do that.

Here is the C++ code:

#include <iostream>
#include <fstream>
#include <string>

using std::ifstream;
using std::cout;
using std::string;
using std::getline;

int main(int argc, char * argv[], char * env[]) {
  setlocale(LC_ALL, "");
  int i;
  string line;

  if (argc == 3) {

    string filename = argv[1];

    ifstream myfile(filename);
    string n = argv[2];

    int nn = atoi(n.c_str());

    string line, buffer[nn];
    const size_t size = sizeof buffer / sizeof * buffer;
    size_t i = 0;

    while (getline(myfile, line)) {
      buffer[i] = line;
      if (++i >= size) {
        i = 0;
      }
    }

    for (size_t j = 0; j < size; ++j) {
      cout << buffer[i] << "\n";
      if (++i >= size) {
        i = 0;
      }
    }
    //return 0;

  }

}

2 Answers

3

The problem is most likely the very long line in that 4 GB file. Your solution buffers (and later discards) every line, and at least one of them is probably too long to fit in memory on the machine you're running on, so your program crashes.

You should read the file starting from the end, counting newlines, and stop once you have counted nn + 1 of them; then output the rest of the file from that point. Buffering the last nn lines is not a good option when you need to handle very long lines.

Here is a snippet of a solution that could help you:

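array<char, 64 * 1024> buffer; // 64 KiB read buffer

size_t nn = atoi(n.c_str()); // number of lines to print

myfile.seekg(0, ios_base::end);

// Newlines seen so far, counting from the end. Stopping at nn + 1 assumes
// the file ends with a newline (that final newline is counted too).
unsigned int nlcount = 0;
size_t length = myfile.tellg();
size_t oldpos = length; // start of the region already scanned

// Walk backward one block at a time. Once oldpos reaches 0, one last pass
// reads nothing and leaves the stream at offset 0 before the loop ends.
while (myfile.tellg() > 0) {
  size_t newpos = oldpos - min(oldpos, buffer.size());
  myfile.seekg(newpos);
  size_t rdsize = oldpos - newpos;
  myfile.read(buffer.data(), rdsize);
  if (!myfile) {
    cerr << "failed while looking for newlines\n";
    return 1;
  }
  // Scan only the rdsize bytes just read, last byte first.
  auto rit = buffer.rbegin() + (buffer.size() - rdsize);
  while (rit != buffer.rend() && nlcount <= nn) {
    if (*rit == '\n') {
      ++nlcount;
    }
    ++rit;
  }
  if (nlcount > nn) {
    // rit is one step past the (nn + 1)-th newline; seek to the byte after it.
    myfile.seekg(newpos + (buffer.rend() - rit) + 1);
    break;
  }
  oldpos = newpos;
}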

This leaves the input stream at the exact position from which the rest of the file should be output (the loop breaks there once nlcount reaches nn + 1). I recommend writing it out through a fixed-size buffer rather than line by line:

while (myfile.peek() != EOF) {
  myfile.read(buffer.data(), buffer.size());
  cout.write(buffer.data(), myfile.gcount());
}

Don't use getline() or you will still end up buffering lines and crash when handling long ones.

  • The described problem has nothing to do with long lines so there is no problem with using `getline()` – Iman Kianrostami Nov 13 '19 at 12:31
  • Because it is not in the question. You are supposed to answer the question, not your assumptions. – Iman Kianrostami Nov 13 '19 at 13:06
  • I am not saying you are wrong, I am just saying don't be sure and advise based on your assumption. Also you did not provide a source of why you are saying `getline()` will fail and what is the limit for using it. "problem reading the long lines file" is very general and should not be used as answer. – Iman Kianrostami Nov 13 '19 at 13:48
-1

One way to avoid depending on a line buffer is to read the file backward from the end until you have passed the number of lines you want. The count of 5 is hard-coded here, but you can pass it as a parameter.

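std::ifstream fileReader("test.txt", std::ios_base::ate);
std::string currentLine;
const int wanted = 5;  // number of trailing lines to print
int lines = 0;         // newlines seen so far (must start at 0)
bool found = false;

if (fileReader)
{
    // std::streamoff rather than long: long may be 32 bits, too small for 4 GB.
    const std::streamoff length = fileReader.tellg();
    // Start at length - 2 to skip the file's final newline, if there is one.
    for (std::streamoff i = length - 2; i >= 0; i--)
    {
        fileReader.seekg(i);
        // Count only '\n' so that a "\r\n" pair is not counted twice.
        if (fileReader.get() == '\n' && ++lines == wanted)
        {
            found = true;  // get() left the stream just past this newline
            break;
        }
    }
    if (!found)
        fileReader.seekg(0);  // fewer than `wanted` lines: print the whole file

    // Test getline itself so no spurious empty line is printed at end of file.
    while (std::getline(fileReader, currentLine))
        std::cout << currentLine << '\n';
}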
Iman Kianrostami