
I'm trying to extract only the long int data from /proc/meminfo. A sample of the file is below. I can do this, but a long-standing mantra in Linux is: do one thing, and do it well. It irks me that I'm not doing this as efficiently as I could be.

I don't want to be reading, collecting, or storing the char data before and after the long ints, and I don't want the program to inspect characters to decide whether they are this or that. I only want the long int data to be read and stored in a variable, and I want it done via file positions that simply skip all the useless char data.

Also, I want the data to be grabbed as a long int and stored as a long int. There are several programs that do everything I've described, but they start out by storing the information as a character string. That string then has to be converted back to a number, negating much of the benefit.

The way I'd like to do this is by moving the file position to just before each long int and then reading it, but I haven't figured out how to do that efficiently. The only way I could get it to work was to always start from the beginning of the file and use larger and larger offsets to reach each successive long int. The result was very slow execution, roughly 30% slower than my code below. Perhaps the program had to restart from the beginning and walk through the entire file to find its position?

I want to jump to 766744, grab it, store it, and then jump (starting from this new current position) to 191680, grab it, store it, and jump to 468276... You get the idea.

Aside from the very first jump (18 characters, to reach 766744), the file position jumps are always 22 characters: each one starts from the end of a long int, goes past the 'kB', down a line, and ends at the next number.

/proc/meminfo:

MemTotal:         766744 kB
MemFree:          191680 kB
MemAvailable:     468276 kB
Buffers:           30180 kB
Cached:           272476 kB
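
(Counting bytes in the sample above: every line is 28 bytes, and each value is right-aligned so it always ends at byte 23 of its line. After a value has been read, the position sits at byte 24; skipping ' kB' plus the newline (4 bytes) and the next line's 18-byte label field gives 4 + 18 = 22. For a shorter value like 30180 that jump lands on padding spaces rather than the first digit, which still works because formatted input skips leading whitespace.)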

Here are two of my best attempts to do this. They work fine, but they are not as efficient as they can be; they scan and check for particular data, and they waste resources doing so:

mem.cpp:

/*

Compile using:
g++ -Wall -O2 mem.cpp -o mem

*/

#include <fstream>
#include <cstdio>    // printf()
#include <unistd.h>  // sleep(), usleep()
#include <limits>

int mem()
{

    unsigned int memTotal, memFree, buffers, cached;

    std::ifstream file("/proc/meminfo");

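    // ignore(18, ' ') discards up to 18 chars, stopping just after the
    // first space (the end of the "MemTotal:" label); operator>> then
    // skips the remaining padding itself.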
    file.ignore(18, ' '); 
    file >> memTotal;
    file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    file.ignore(18, ' '); 
    file >> memFree;
    file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    // Skip 'MemAvailable:' line:
    file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    file.ignore(18, ' '); 
    file >> buffers;
    file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    file.ignore(18, ' '); 
    file >> cached;

    file.close();

    return ((memTotal - memFree) - (buffers + cached)) / 1024;
}

int main()
{

    do{

        // Everyday use:
        printf("mem: %im\n", mem());
        sleep(1);

        // For benchmarking:
        // mem();
        // usleep(55);

    }while(1);

    return 0;

}

Compile using: g++ -Wall -O2 mem.cpp -o mem

mem.c:

/*

Compile using:
g++ -Wall -O2 mem.c -o mem

*/

#include <stdio.h>   // fopen(), fscanf(), fclose(), printf()
#include <unistd.h>  // sleep(), usleep()

unsigned int mem()
{

    unsigned int memTotal, memFree, buffers, cached;

    FILE * const file = fopen( "/proc/meminfo", "r" );

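    /* Each "%*19s" skips a label such as "MemTotal:", "%u" reads a value,
       and "%*2s" skips the trailing "kB"; the MemAvailable value is read
       and discarded with "%*u". */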
    fscanf(file, "%*19s %u %*2s %*19s %u %*2s %*19s %*u %*2s %*19s %u %*2s %*19s %u", &memTotal, &memFree, &buffers, &cached);

    fclose(file);

    return ((memTotal - memFree) - (buffers + cached)) / 1024;
}

int main()
{

    do{

        printf("mem: %im\n", mem());
        sleep(1);

        //For benchmarking:
        //mem();
        //usleep(55);

    }while(1);

    return 0;
}

Compile using: g++ -Wall -O2 mem.c -o mem

* EDIT *

In trying to recreate the code I originally had using file positions, I did get it to work as asked, but the code is actually SLOWER (by 2%) than both versions above:

mem3.c:

/*

  // -O3 seems to be .6% more efficient
  g++ -Wall -O3 mem3.c -o mem3 


  cpu 47.7% @ 55 microseconds

*/

#include <stdio.h>   // fopen(), fseek(), fscanf(), fclose(), printf()
#include <unistd.h>  // sleep(), usleep()

int mem()
{
    unsigned long memTotal, memFree, buffers, cached;

    FILE * file;
    file = fopen("/proc/meminfo", "r");

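    // 18 bytes: past "MemTotal:" and its padding, to the first digit.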
    fseek(file, 18, SEEK_SET);
    fscanf(file, "%lu", &memTotal);

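    // 22 bytes: " kB\n" (4) plus the next line's 18-byte label field.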
    fseek(file, 22, SEEK_CUR);
    fscanf(file, "%lu", &memFree);

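    // 40 bytes: " kB\n" (4), the whole 28-byte MemAvailable line (28),
    // and "Buffers:" (8); %lu skips the remaining padding spaces.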
    fseek(file, 40, SEEK_CUR);
    fscanf(file, "%lu", &buffers);

    fseek(file, 22, SEEK_CUR);
    fscanf(file, "%lu", &cached);

    fclose (file);
    return ((memTotal - memFree) - (buffers + cached)) / 1024;

}

int main()
{

    do{

        printf("mem: %im\n", mem());
        sleep(1);

//      For testing:

//      mem();
//      usleep(55);

    }while(1);

    return 0;
}

No storing of useless data, no extra checking, and yet the code above is slower? Clearly I'm not doing something right, and I'm somehow causing an increased workload.

Thanks for reading. Looking for suggestions.

* EDIT 2 *

I was able to get a decent efficiency gain of 7% by writing a custom conversion function. Notes are in the code.

mem.c:

/*

  // -O3 seems to give .6% increased efficiency
  g++ -Wall -O3 mem.c -o mem

  43.7% CPU usage @ usleep(55)

*/

#include <stdio.h>   // fopen(), fgets(), fclose(), printf()
#include <stdint.h>  // uint8_t
#include <unistd.h>  // usleep()

/* Function courtesy of: https://stackoverflow.com/questions/16826422/c-most-efficient-way-to-convert-string-to-int-faster-than-atoi, with a personal modification that lets it pull an int out from between other characters. */

void naive(const char *p, unsigned int &x)
{
    x = 0;
    do{

        // Nifty little trick with uint8_t... which I saw on stack! :D
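        // (*p - '0') wraps to a large unsigned value for any non-digit,
        // so this single comparison tests for '0'..'9'.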
        if (uint8_t(*p - '0') < 10)
            x = (x*10) + (*p - '0');

    }while(*++p != '\0');
}

unsigned int mem()
{
    unsigned int memTotal, memFree, buffers, cached;

    FILE * file;
    char str[30];  // Big enough for one 28-character line

    file = fopen ("/proc/meminfo" , "r");

    /* Looking into finding a way to gather all the below info at once; likely, the five 'fgets' file calls are slowing things down. (See the sketch after this listing.) */

    fgets(str, 30, file);
    naive(str, memTotal);

    fgets(str, 30, file);
    naive(str, memFree);

    fgets(str, 30, file);

    fgets(str, 30, file);
    naive(str, buffers);

    fgets(str, 30, file);
    naive(str, cached);

    fclose(file);

    return ((memTotal - memFree) - (buffers + cached)) / 1024;
}

int main()
{

    do{
        // Everyday usage:
        //printf("mem: %im\n", mem());
        //sleep(1);

        // For testing:
        mem();
        usleep(55);

    }while(1);

    return 0;
}
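
As a follow-up to the note inside mem() above: a minimal sketch of the single-read idea, replacing the five fgets() calls with one fread() and one parsing pass. mem_single_read is a hypothetical name, and the code assumes the first five lines of /proc/meminfo keep the order and layout shown above and fit in 256 bytes:

#include <stdio.h>   // fopen(), fread(), fclose()
#include <stdint.h>  // uint8_t

unsigned int mem_single_read()
{
    char buf[256];
    FILE *file = fopen("/proc/meminfo", "r");
    if (file == NULL)
        return 0;                          // open failed; real code would report it

    size_t n = fread(buf, 1, sizeof buf - 1, file);
    fclose(file);
    buf[n] = '\0';

    // vals: MemTotal, MemFree, MemAvailable, Buffers, Cached
    unsigned int vals[5] = {0};
    int line = 0;

    for (const char *p = buf; *p != '\0' && line < 5; ++p) {
        if (uint8_t(*p - '0') < 10)        // same digit test as naive()
            vals[line] = vals[line] * 10 + (*p - '0');
        else if (*p == '\n')
            ++line;
    }

    return ((vals[0] - vals[1]) - (vals[3] + vals[4])) / 1024;
}

It should drop into the same main() loop as mem(); whether it actually beats the fgets() version is something only a measurement can tell.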
  • It is possible that you can save a few milliseconds by taking the approach you've outlined in your question. However, I highly doubt that the amount of time saved over the entire lifetime of you using this program will exceed the amount of time it will take you to implement it. Unless you expect to be running your program several million times, it is unlikely that any performance difference will be measurable. – Sam Varshavchik Mar 17 '19 at 15:09
  • And if performance is truly that important to you, the last thing you will be doing is using `std::ifstream`, which does not exactly have a reputation for efficiency. Rather, you'll be using POSIX `open()`, `lseek()`, and `read()` functions, and then parsing the integer values yourself instead of using the C++ library to do so, which also burns an extraordinary amount of electrons by taking into consideration things like the current locale, which you do not really care about. Actually, forget about `lseek()`. A single `read()` to swallow everything, then parse it in memory. – Sam Varshavchik Mar 17 '19 at 15:12
  • It *Does not matter*. Just parse it in the easiest, most readable, manner possible. As @Sam said, any nanoseconds you save here are going to be irrelevant in real life, but you'll spend hours/days saving them. Don't bother. – Jesper Juhl Mar 17 '19 at 15:41
  • Sam Varshavchik and Jesper Juhl: I agree that I'm looking for changes that are quite insignificant, but that's not the point. The point is to make it as efficient as possible, so that I can use this type of code, and use this knowledge, in other places of coding where it really will make a difference. As for POSIX open(), lseek(), and read(), I will check them out... Using C or C++ doesn't matter to me. If C solutions are more efficient, I'll use them. Again, this is only an example of such a program—it's more of the theory and potential usage that I'm after. – bedtime Mar 17 '19 at 16:10
  • How do you measure the efficiency? – Tobias Wollgam Mar 17 '19 at 22:12
  • Good question! I run the command in a loop as simply 'mem()'—without the printf but with usleep(55) in the loop. I use the 'top' command to see how much CPU is used by the program. My last update to the code has brought it down to 44% cpu, from 47%. It involved extracting all the data and then using a personally made function to convert from char * to int. This new find will be brought into many of the other programs I've made. These programs are running on a low-powered Raspberry Pi 3B+ and will soon be running on a 1-core, 1 GHz, 500 MB Raspberry Pi Zero. – bedtime Mar 18 '19 at 18:08
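
For reference, a minimal sketch of what the comments above suggest: a single POSIX open()/read()/close() and manual parsing, with no stdio in the hot path. mem_posix is a hypothetical name, and the same layout assumptions as the fread() sketch apply:

#include <fcntl.h>   // open()
#include <unistd.h>  // read(), close()
#include <stdint.h>  // uint8_t

unsigned int mem_posix()
{
    char buf[256];
    int fd = open("/proc/meminfo", O_RDONLY);
    if (fd < 0)
        return 0;                          // open failed; real code would report it

    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return 0;
    buf[n] = '\0';

    // vals: MemTotal, MemFree, MemAvailable, Buffers, Cached
    unsigned int vals[5] = {0};
    int line = 0;

    for (const char *p = buf; *p != '\0' && line < 5; ++p) {
        if (uint8_t(*p - '0') < 10)
            vals[line] = vals[line] * 10 + (*p - '0');
        else if (*p == '\n')
            ++line;
    }

    return ((vals[0] - vals[1]) - (vals[3] + vals[4])) / 1024;
}

This avoids both the stdio buffer and locale-aware formatted input; each call costs one open(), one read(), and one close() syscall.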
