
I have several .tsv files that I'm reading in my code. Each line looks like this:

3   Port 10: NDI-MF2 Channel:0  3578848 1   OK  0,4881701   0,5157377   -0,5017654  -0,4938989  195,342 -5,154  -394,990    0,9763672   0   Port 11: NDI-MF2 Channel:0  3578848 1   OK  0,1504364   0,9189614   0,2268636   -0,2853273  -93,299 -107,491    -299,260    0,9993857   0   Port 12: NDI-MF2 Channel:0  3578848 1   OK  0,0572628   0,7722947   0,5232752   -0,3556190  -107,537    -121,891    -289,059    0,6039713   0

As you can see, each line contains data for 3 Ports. The first number is the number of Ports in the line.

Then, for each Port, I collect:

  • The Name of the Model (e.g. Port 10: NDI-MF2 Channel:0).
  • The Frame ( 3578848 )
  • The Face (1)
  • The State (OK)
  • R0 (0,4881701)
  • RX (0,5157377)
  • RY (-0,5017654)
  • RZ (-0,4938989)
  • TX (195,342)
  • TY (-5,154)
  • TZ (-394,990)
  • Error (0,9763672)
  • Markers (0)

(I listed the values of the first Port so you can follow along more easily.)
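
For reference, the two struct types used in the code, TipoInformacion and TipoCoordenadas, are not shown in the question; judging from how their fields are filled below, they presumably look something like this (the field names come from the code, the exact types are my assumption):

// Assumed layout of the two record types, reconstructed from the parsing code.
struct TipoInformacion {
    char *ModelName;   // e.g. "Port 10: NDI-MF2 Channel:0"
    char *Frame;       // e.g. "3578848"
    char *Face;        // e.g. "1"
};

struct TipoCoordenadas {
    char   *state;     // e.g. "OK"
    double  Rx, Ry, Rz;   // rotation components (R0 is skipped)
    double  Tx, Ty, Tz;   // translation components
    double  errorValue;
    int     marker;
    double  Time;      // timestamp passed in by the caller
};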

As I have multiple lines in each .tsv, I wrote this code to read them:

bool ProcesarDatos::LeerSigLineaValores(vector<TipoInformacion> *infoModelos, vector<TipoCoordenadas> *infoCoordenadas, double timestamp) {

    infoModelos->clear();
    infoCoordenadas->clear();

    if (fichero.good()) {
        char Linea[700];
        LeerLinea(Linea, 700);

        string NTools, Model, Frame, Face, State, Rz, Ry, Rx, Tx, Ty, Tz, Error, Markers;
        string cadena(Linea);
        if (cadena.size() == 0) return false;

        istringstream divLinea(cadena);
        getline(divLinea, NTools, '\t');

        HerramientasDetectadas = stoi(NTools);
        if (HerramientasDetectadas != 3) return false;

        // One block of fields per detected Port.
        for (int j = 0; j < HerramientasDetectadas; j++) {
            TipoInformacion nuevoModelo;
            TipoCoordenadas nuevasCoordenadas;

            getline(divLinea, Model, '\t');
            nuevoModelo.ModelName = new char[strlen(Model.c_str()) + 1];
            strcpy(nuevoModelo.ModelName, Model.c_str());
            getline(divLinea, Frame, '\t');
            nuevoModelo.Frame = new char[strlen(Frame.c_str()) + 1];
            strcpy(nuevoModelo.Frame, Frame.c_str());
            getline(divLinea, Face, '\t');
            nuevoModelo.Face = new char[strlen(Face.c_str()) + 1];
            strcpy(nuevoModelo.Face, Face.c_str());

            getline(divLinea, State, '\t');
            nuevasCoordenadas.state = new char[strlen(State.c_str()) + 1];
            strcpy(nuevasCoordenadas.state, State.c_str());
            getline(divLinea, Rx, '\t'); // Here I skip R0 intentionally.
            getline(divLinea, Rx, '\t');
            nuevasCoordenadas.Rx = stringtoDouble(Rx);
            getline(divLinea, Ry, '\t');
            nuevasCoordenadas.Ry = stringtoDouble(Ry);
            getline(divLinea, Rz, '\t');
            nuevasCoordenadas.Rz = stringtoDouble(Rz);
            getline(divLinea, Tx, '\t');
            nuevasCoordenadas.Tx = stringtoDouble(Tx);
            getline(divLinea, Ty, '\t');
            nuevasCoordenadas.Ty = stringtoDouble(Ty);
            getline(divLinea, Tz, '\t');
            nuevasCoordenadas.Tz = stringtoDouble(Tz);
            getline(divLinea, Error, '\t');
            nuevasCoordenadas.errorValue = stringtoDouble(Error);
            getline(divLinea, Markers, '\t');
            nuevasCoordenadas.marker = stoi(Markers);

            nuevasCoordenadas.Time = timestamp;

            infoModelos->push_back(nuevoModelo);
            infoCoordenadas->push_back(nuevasCoordenadas);
        }
        return true;
    }
    else {
        cout << "\t File not good" << endl;
        return false;
    }
}
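
The helper stringtoDouble isn't shown either. Since the values use a comma as the decimal separator (e.g. 0,4881701), it presumably swaps the comma for a dot before converting; a minimal sketch, assuming that behaviour:

#include <algorithm>
#include <string>

// Assumed behaviour: replace the decimal comma with a dot, then convert.
double stringtoDouble(std::string s) {
    std::replace(s.begin(), s.end(), ',', '.');
    return std::stod(s);
}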

Well, this code is working as I wanted, but I have 2 GB of files and each .tsv has between 8,000 and 25,000 lines. It took me 30 minutes to execute the code and generate the Excel file I was looking for.

I don't need to run this code anymore, but how could I increase the speed of the method?

Right now: 35,000 lines = 16,500 ms (~16 s).

  • You could use multiple threads or async calls between file reads. Alternatively, you could combine multiple files into a smaller number of files, reducing the number of file reads needed, while keeping it below your RAM size. – estinamir Mar 18 '19 at 18:08
  • To speed it up: 1) enable your compiler's optimizer. 2) Read larger chunks at a time (a few MB per read or more). – Jesper Juhl Mar 18 '19 at 18:08
  • Use std::string instead of `new char[n]`, since std::string will not even allocate heap memory most of the time. And strlen(State.c_str()) is much slower than State.length() – Michael Veksler Mar 18 '19 at 18:58
  • 2 gigabytes will fit easily into my desktop's RAM. If you have the space, consider reading the entire file into a stringstream first ... see https://stackoverflow.com/a/132394/2785528 and/or https://stackoverflow.com/a/138645/2785528 – 2785528 Mar 18 '19 at 19:49
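
Putting the suggestions from the comments together (one bulk read of the whole file into memory, and std::string members instead of new char[] + strcpy), a minimal sketch of what that could look like; LeerFicheroCompleto and RegistroPuerto are hypothetical names, not part of the original code:

#include <fstream>
#include <sstream>
#include <string>

// Read the whole file into one in-memory string with a single bulk read,
// instead of hitting the disk line by line.
std::string LeerFicheroCompleto(const std::string &ruta) {
    std::ifstream f(ruta, std::ios::binary);
    std::ostringstream buffer;
    buffer << f.rdbuf();
    return buffer.str();
}

// Hypothetical per-Port record with std::string members: no new[]/strcpy,
// and short strings usually stay on the stack (small string optimization).
struct RegistroPuerto {
    std::string modelName, frame, face, state;
    double rx, ry, rz, tx, ty, tz, error;
    int markers;
};

Each line could then be split with an istringstream over that in-memory string exactly as in the loop above, filling a RegistroPuerto directly instead of allocating char buffers.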
