
If I have a file with billions of characters, what would be the fastest way to read from it without using external libraries? (Asking mainly for competitive programming.)

I found an article that showed this example and said it was faster to use the InParser class rather than ifstream (here's the link, but it's not in English, so it may not help much: https://infogenius.ro/parsare-cpp/). It says that when you read directly with ifstream, a few things happen under the hood that slow everything down, so instead it's better to read the whole file into a buffer and do the conversion to other data types (int in this example) yourself.

#include <cctype>    // isdigit
#include <cstdint>   // int64_t
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

class InParser {
  private:
    vector<char> str;
    int ptr;
    ifstream fin;

    // Return the next character, refilling the buffer with one big
    // read() whenever it has been fully consumed. Near EOF the tail of
    // the buffer keeps stale bytes, so this assumes the input actually
    // contains every number you ask for.
    char getChar() {
        if (ptr == (int) str.size()) {
            fin.read(str.data(), str.size());
            ptr = 0;
        }
        return str[ptr++];
    }

    template<class T>
    T getInt() {
        char chr = getChar();
        // Skip anything that cannot start a number. The cast avoids
        // undefined behavior when char is signed and negative.
        while (!isdigit((unsigned char) chr) && chr != '-')
            chr = getChar();
        int sgn = +1;
        if (chr == '-') {
            sgn = -1;
            chr = getChar();
        }
        T num = 0;
        while (isdigit((unsigned char) chr)) {
            num = num * 10 + chr - '0';
            chr = getChar();
        }
        return sgn * num;
    }

  public:
    InParser(const char* name) : str(1e5), ptr(str.size()), fin(name) { }
    ~InParser() { fin.close(); } // redundant: ifstream closes itself on destruction

    template<class T>
    friend InParser& operator>>(InParser& in, T& num) {
        num = in.getInt<T>();
        return in;
    }
};

int main() {
    InParser fin("file.in");
    int a; int64_t b; fin >> a >> b;
    cout << a + b << '\n';
    return 0;
}

My questions are:

  1. Is this approach faster than just using `std::ifstream fin("file.in");`?

  2. Are there even better methods to do this without using external libraries?

  • Well, did it take more or less time to execute? – eerorika Oct 12 '20 at 14:23
  • Do you `currently` have performance issues? If so, you should measure where the bottleneck is; otherwise it may be some sort of YAGNI (https://en.wikipedia.org/wiki/You_aren't_gonna_need_it) problem. – Zaiborg Oct 12 '20 at 14:24
  • @eerorika I can't really test it since my current computer takes way too long to do stuff anyway. But I am interested in any methods that are faster than the standard `std::ifstream >>` reading – H-005 Oct 12 '20 at 14:25
  • I don't understand why this should be faster. Though seeing `fin.close();` in the destructor and C-style casts makes the code a little suspicious – 463035818_is_not_an_ai Oct 12 '20 at 14:28
  • Does this answer your question? [Fast textfile reading in c++](https://stackoverflow.com/questions/17925051/fast-textfile-reading-in-c) – Damien Oct 12 '20 at 14:30
  • You have to profile for your bottlenecks first. From my professional experience, the actual reading of the bits from the file is oftentimes not the bottleneck; it's what you're doing with them. For example, for reading a large line-based text file I once found that a producer-consumer threading model gave a 30% performance boost. One thread read lines from a file, and another thread parsed those lines. (A minimal sketch of this pattern follows these comments.) – AndyG Oct 12 '20 at 14:32
  • @Damien It doesn't use only standard C++ though – H-005 Oct 12 '20 at 14:58
  • @idclev463035818: I suspect that the important part is the use of `std::isdigit(int)` instead of `isdigit( charT ch, const locale& loc );`. Locales aren't fast. Makes sense, because locales are for humans and humans aren't fast either. – MSalters Oct 12 '20 at 15:21
  • To answer the title of your question, the fastest methods for reading a file are the block read functions, e.g. `std::ifstream::read()` or `fread()`. The objective is to keep the data flowing. Read more data per transaction. For example, one read of 1024 characters is more efficient than 1024 reads of one character. Search the internet for "c++ double buffer read file". (A block-read sketch also follows below.) – Thomas Matthews Oct 12 '20 at 15:28
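
To illustrate AndyG's producer-consumer comment, here is a minimal sketch using standard C++11 threads. It assumes a line-oriented `file.in` with one integer per line as stand-in parsing work; the file name and the parsing step are illustrative choices, not taken from the question or the linked article:

#include <condition_variable>
#include <fstream>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

int main() {
    std::queue<std::string> lines;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Producer: reads raw lines from the file as fast as it can.
    std::thread producer([&] {
        std::ifstream fin("file.in");
        std::string line;
        while (std::getline(fin, line)) {
            { std::lock_guard<std::mutex> lk(m); lines.push(std::move(line)); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    });

    // Consumer: parses each line (here: sums integers as a stand-in).
    long long sum = 0;
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !lines.empty() || done; });
        if (lines.empty() && done) break;
        std::string line = std::move(lines.front());
        lines.pop();
        lk.unlock();  // parse outside the lock so the producer can keep going
        sum += std::stoll(line);  // assumed format: one integer per line
    }
    producer.join();
    std::cout << sum << '\n';
}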
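
And here is a minimal sketch of Thomas Matthews's block-read advice: issue one large `read()` per transaction and process the `gcount()` bytes it actually delivered, instead of extracting one character at a time. The 64 KiB buffer size and `file.in` are illustrative choices, not from the question:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    constexpr std::size_t BUF_SIZE = 1 << 16;  // 64 KiB per transaction
    std::ifstream fin("file.in", std::ios::binary);
    std::vector<char> buf(BUF_SIZE);

    long long digits = 0;  // stand-in work: count digit characters
    // The last read() fails after a partial fill, but gcount() still
    // reports the bytes it delivered, so the tail is processed too.
    while (fin.read(buf.data(), buf.size()) || fin.gcount() > 0) {
        std::streamsize got = fin.gcount();  // bytes actually read
        for (std::streamsize i = 0; i < got; ++i)
            if (buf[i] >= '0' && buf[i] <= '9')
                ++digits;
    }
    std::cout << digits << '\n';
}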

0 Answers