
Let me start by saying that I'm only about 3 days into C++.

On to the main question: I have a multi-line file, and I'm trying to repeatedly print one specific line of it, whose value can be changed at any time by some other process.

Example file :

line0
line1
somevar: someval
line3
line4

I'm trying to print the middle line (the one that starts with somevar). My first naive attempt is below: I open the file, loop through its contents, print the matching line, and then seek back to the beginning of the file.

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>
#include <fstream>
#include <string>

int main (int argc, char *argv[])
{
    std::string file = "input.txt";
    std::ifstream io {file};

    if (!io){
        std::cerr << "Error opening file" <<std::endl;
        return EXIT_FAILURE;
    }

    std::string line;
    std::size_t pos;
    
    while (getline (io, line))
    {
        pos = line.find_first_of(' ');
        if (line.substr (0, pos) == "somevar:")
        {
            // someval is expected to be an integer
            std::cout << std::stoi( line.substr (pos) ) << std::endl;
            io.seekg (0, std::ios::beg);
        }
    }

    io.close();

    return EXIT_SUCCESS;
}

Result : Whenever the file is updated, the program exits.

It then occurred to me that the I/O I'm performing is buffered, so an update to the file wouldn't necessarily show up in the buffer I already hold (this isn't shell scripting). So I decided to open and close the file on each iteration, which should effectively refresh the buffer every time. I know it's not the best solution, but I wanted to test the theory. Here's the new source :

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>
#include <fstream>
#include <string>

int main (int argc, char *argv[])
{
    std::string proc_file = "input.txt";
    std::ifstream io;

    if (!io){
        std::cerr << "Error opening file" <<std::endl;
        return EXIT_FAILURE;
    }
    std::string line;
    std::size_t pos;
    
    while (io.open(proc_file, std::ios::in), io)
    {
        io.sync();
        getline (io, line); 
        pos = line.find_first_of(' ');
        // The line starting with "somevar:" is always going to be there.
        if (line.substr (0, pos) == "somevar:")
        {
            std::cout << std::stoi( line.substr (pos) ) << std::endl;
            io.close();
        }
    }

    io.close();

    return EXIT_SUCCESS;
}

Result : Same as before.

What would be the ideal way of achieving what I'm trying to do? Also, why does the program exit whenever the file in question is updated? Thanks (:

EDIT: The file I'm trying to read is "/proc/" + std::to_string( getpid() ) + "/io", and the line I want is the bytes-read one (it starts with read_bytes:).

debdutdeb
  • The buffering is likely happening on the writer's side as well -- i.e. the actual contents of the file (in the filesystem's caches and on the hard drive itself) are not guaranteed to be updated until the writing program's buffers are flushed and/or it closes its file-handle. Is there any chance of you being able to modify the writing program to use some other mechanism (such as a TCP socket, a pipe, or a memory-mapped file) instead? Trying to use a file as an inter-process communications mechanism is rather iffy. – Jeremy Friesner Feb 14 '21 at 07:04
  • @JeremyFriesner It's not an IPC setup, but rather a monitoring attempt. The process changing the file is out of my control. The tests that I've done are through a simple text editor (changing the value). – debdutdeb Feb 14 '21 at 07:07
  • Seems like a good case to use [sqlite](https://sqlite.org/). **Your question is operating system specific**. If on Linux, tag your question with `Linux`. In that case read [inode(7)](https://man7.org/linux/man-pages/man7/inode.7.html) and [inotify(7)](https://man7.org/linux/man-pages/man7/inotify.7.html) – Basile Starynkevitch Feb 14 '21 at 07:08
  • @BasileStarynkevitch It's a rather small job .. don't want to use a database ): And yes it's on Linux, sorry about the tag. Added. – debdutdeb Feb 14 '21 at 07:11
  • @BasileStarynkevitch hmm .. will follow up. – debdutdeb Feb 14 '21 at 07:11
  • Notice that lines don't really exist in files. Can the line `somevar: someval` be replaced with `somevar: somemuchmorelongerval`? Why don't you use [JSON](http://json.org/) related libraries? Are you forbidden to use existing libraries? Or existing C++ parser code generators (e.g. [ANTLR](http://antlr.org/) or [GNU bison](https://www.gnu.org/software/bison/)...). I also don't think it is a rather small job. You could use [mmap(2)](https://man7.org/linux/man-pages/man2/mmap.2.html) if the file stays small (less than a few gigabytes) – Basile Starynkevitch Feb 14 '21 at 07:13
  • My feeling is that you'll need several weeks of full-time work. Perhaps discussing with the partner/client providing the changing file could be worthwhile. Otherwise, I recommend using directly [syscalls(2)](https://man7.org/linux/man-pages/man2/syscalls.2.html) and avoiding C++ streams. On Linux, lines are just a convention related to `\n` (newline) characters in files! – Basile Starynkevitch Feb 14 '21 at 07:20
  • See also [this question](https://stackoverflow.com/q/66170556/841108) – Basile Starynkevitch Feb 14 '21 at 07:46
  • @BasileStarynkevitch Ok, I should've cleared the air from the beginning. Forgive me. The file I'm trying to read is **very** small. It's simply `/proc/getpid()/io`, and the line is the bytes read one. So no, no external libraries unfortunately. – debdutdeb Feb 14 '21 at 09:19
  • Edited the question accordingly. – debdutdeb Feb 14 '21 at 09:21

2 Answers

As discovered in the comments, you are not reading a "real" file on disk, but rather /proc/PID/io, a virtual file whose contents are only determined when it is opened, thanks to the VFS. Your statement that it can "change arbitrarily by some other process" is misleading: the file never changes; it simply has different content each time it is opened.

So now we know that no amount of seeking will help. We simply need to open the file afresh each time we want to read it. That can be done fairly simply:

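// needs <algorithm>, <cstdlib>, <cstring>, <fstream>, <iostream>
char content[1000]; // choose a suitable value
const char key[] = "read_bytes:";
while (true)
{
    std::ifstream io(io_filename); // reopen so the contents are generated afresh
    // The file is shorter than the buffer, so read() itself reports failure
    // at end of file; check how many bytes actually arrived instead.
    io.read(content, sizeof(content));
    const std::streamsize n = io.gcount();
    if (n <= 0)
        break;
    char *end = content + n; // search only the bytes that were read
    auto it = std::search(content, end, key, key + strlen(key));
    if (it == end)
        break; // key not present
    std::cout << atoi(it + strlen(key)) << std::endl;
}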

You should do something more careful than atoi(), which won't stop at the end of the array, but I assume your real application will do something else there, so I left that handling out.
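
As a more defensive variant of the same idea, here is a minimal sketch that reopens the file on every call and parses the value with stream extraction rather than atoi(). The helper names find_counter and read_counter are hypothetical, not part of any library, and the sketch assumes the "key: number" line layout shown in the question.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scan `in` line by line for a line that begins with
// `key` (e.g. "read_bytes:") and parse the unsigned integer that follows.
// Returns false if the key is missing or no number follows it.
bool find_counter(std::istream& in, const std::string& key,
                  unsigned long long& value)
{
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) != 0)
            continue;  // this line does not start with the key
        std::istringstream rest(line.substr(key.size()));
        return static_cast<bool>(rest >> value);  // >> skips the leading space
    }
    return false;
}

// Hypothetical helper: open `path` afresh on every call, so a pseudo-file
// such as /proc/self/io is read with its current contents each time.
bool read_counter(const std::string& path, const std::string& key,
                  unsigned long long& value)
{
    std::ifstream io(path);
    return io && find_counter(io, key, value);
}
```

A monitoring loop would then call read_counter("/proc/self/io", "read_bytes:", value) once per iteration, ideally with a short sleep in between so it does not spin.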

John Zwinck
  • While it is true that I'm trying to handle a file at `/proc`, the question was totally based on a general file on disk, so the title wasn't exactly misleading. With the `/proc` info yes the answer seems to be more straightforward. Thanks (: – debdutdeb Feb 14 '21 at 10:24

The file I'm trying to read is some /proc/1234/io

That is the most important information.

Files in proc(5) are small pseudo-files (a bit like pipe(7)-s) which can only be read in sequence.

That pseudo-file is not updated, but entirely regenerated (by the Linux kernel, whose source code you can study) at every open(2).

So just read the whole file into memory quickly, and process that content once you have read it.
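
A minimal C++ sketch of that "read everything first, then parse in memory" approach might look like the following. The helper names slurp and value_after are hypothetical, and the sketch assumes the pseudo-file is small enough to hold in a std::string and uses "key: value" lines.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read everything the stream yields into one string.
std::string slurp(std::istream& in)
{
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Hypothetical helper: find `key` at the start of a line inside `text` and
// return the rest of that line without leading blanks. Returns an empty
// string if the key is absent or has no value.
std::string value_after(const std::string& text, const std::string& key)
{
    std::size_t pos = 0;
    while ((pos = text.find(key, pos)) != std::string::npos) {
        if (pos == 0 || text[pos - 1] == '\n') {
            const std::size_t begin =
                text.find_first_not_of(" \t", pos + key.size());
            const std::size_t end = text.find('\n', pos);
            if (begin == std::string::npos ||
                (end != std::string::npos && begin >= end))
                return "";
            return text.substr(
                begin, end == std::string::npos ? std::string::npos : end - begin);
        }
        pos += key.size();  // match was in the middle of a line; keep looking
    }
    return "";
}
```

The start-of-line check matters for /proc/PID/io in particular, because a key such as write_bytes: also appears inside cancelled_write_bytes:. To use this, construct a fresh std::ifstream for each sample, pass it to slurp, and convert the result of value_after with std::stoull or similar.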

See this answer to a closely related question, and adapt it to C++.

Basile Starynkevitch