1

For example, I have this code:

fp = popen("wc -l < myfile", "r");

But myfile should be the name of whatever file is passed to this program. It could be abc.txt or 123.txt or xy.txt, etc.

Then I want to read the output of running wc -l < myfile. The problem is that I don't know which C function lets me pass the name of myfile into this shell command and still get the output. Can anyone give me some suggestions?

Edit: The file I want to read is very large, and I want to read its data into an array. I cannot use a list to store it, because locating a specific item in a list is too slow. The problem is that if I malloc() a one-dimensional array, there is not enough contiguous memory on the laptop. Therefore, I plan to use a two-dimensional array. So I have to get the number of lines in the file and then decide the size of each dimension of the array via log.

Thanks for all answers. This project is about reading two files. The first file is much larger than the second file. The second file is like:

1   13  0
2   414 1
3   10  0
4   223 1
5   2   0

The third num in each line is called the "ID". For example, num "1" has ID "0", num "2" has ID "1", and num "3" has ID "0". (Ignore the middle num in each line.) And the first file is like:

1   1217907
1   1217908
1   1517737
1   2
2   3
2   4
3   5
3   6

If each num in the first file has the ID "0", I should store both nums of that line in an array of data structures. For example, we can see that num "1" has ID "0" in the second file, so I need to store:

1   1217907
1   1217908
1   1517737
1   2

from my first file into the data structure array. The num "2" has ID "1" but num "3" has ID "0" and num "4" has ID "1", so I need to store 2 3 but not 2 4 from my first file. That's why I need to use arrays to store the two files. If I use two arrays, I can quickly check whether a num's ID is "0" in the array that belongs to the second file, because an array locates a specific item fast: the value of the num can be used directly as the index.

beasone
  • "Not enough contiguous memory space" doesn't really make sense. Modern operating systems use virtual memory space which is much, much larger than physical memory. Even if you don't have enough physical memory, that shouldn't matter. – Charles Duffy May 22 '15 at 21:58
  • Even though the file is 20G? – beasone May 22 '15 at 21:59
  • Most of my classmates failed on allocating enough contiguous memory space. They told me their program crashed on the computer which is used to test their programs. They said that computer didn't have enough memory space because it is old...... – beasone May 22 '15 at 22:02
  • If it's a 32-bit platform, then I can see that. – Charles Duffy May 22 '15 at 22:07
  • This time they should use 64-bit platform to test our program, but in the program we need about 8 arrays to store the data, each array stores different kinds of data. – beasone May 22 '15 at 22:33
  • is your question about passing a parameter to 'system()'? or is it about searching a large file? – user3629249 May 23 '15 at 22:22
  • passing a parameter to 'system()' is one step of searching a large file in my program. – beasone May 24 '15 at 05:18
  • Possible duplicate of [Passing variable from c program to shell script as argument](https://stackoverflow.com/q/18179346/608639) – jww Aug 19 '19 at 13:54

4 Answers

2

I think you need to use snprintf() to build the string first, and then call popen() with that string.

Pseudo-code

char buf[32] = {0};
snprintf(buf, 32, "wc -l < %s", myfile);
fp = popen(buf, "r");

EDIT

To make it work for any length of myfile:

size_t len = strlen(myfile) + strlen("wc -l < ") + 1; /* +1 for the null terminator */
char *buf = malloc(len);
if (buf == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
snprintf(buf, len, "wc -l < %s", myfile);
fp = popen(buf, "r");

...

free(buf);

Note: As mentioned by Ed Heal in the comments, the 32 in the first snippet is for demo purposes only. Choose the buffer length based on the length of the string held by myfile, plus the fixed characters of the command, plus the null terminator.
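
Putting the pieces together, here is a minimal sketch of the whole round trip: size the command with snprintf(), run it through popen(), and read the number back. The helper name count_lines_popen is made up for illustration, and the sketch assumes the path contains no shell metacharacters, since nothing is quoted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: runs "wc -l < path" through popen() and returns
 * the line count, or -1 on any failure.
 * ASSUMPTION: path contains no shell metacharacters; nothing is quoted. */
long count_lines_popen(const char *path)
{
    /* The first snprintf() call only measures the command length. */
    int need = snprintf(NULL, 0, "wc -l < %s", path);
    if (need < 0) return -1;

    char *cmd = malloc((size_t)need + 1);
    if (cmd == NULL) return -1;
    snprintf(cmd, (size_t)need + 1, "wc -l < %s", path);

    FILE *fp = popen(cmd, "r");
    free(cmd);
    if (fp == NULL) return -1;

    long lines = -1;
    if (fscanf(fp, "%ld", &lines) != 1) lines = -1;  /* no number printed */
    if (pclose(fp) != 0) lines = -1;                 /* command failed */
    return lines;
}
```

Letting snprintf() measure the string avoids keeping the length arithmetic in sync with the format string by hand.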

Natasha Dutta
  • How long is `myFile` - Hope it is not too long – Ed Heal May 22 '15 at 21:17
  • @NatashaDutta - Why use a shell etc. to do this? Seems a lot of overhead – Ed Heal May 22 '15 at 21:20
  • @EdHeal you're very right, but then, OP wants to work with `popen()`, so thought of showing the way. Your advice is real, no doubt. :) – Natasha Dutta May 22 '15 at 21:23
  • From a security perspective, this is very bad news; there's absolutely no attempt to protect against shell command injection. Consider a filename with `$(rm -rf /)` inside. – Charles Duffy May 22 '15 at 21:33
  • The reason why I use shell to get the num of lines is that I want to read a very big file, using "wc -l" is much faster than using getline() or getc() in C. – beasone May 22 '15 at 21:35
  • @CharlesDuffy Thanks for the comment. I'm not sure I understood you fully. can you point me towards some link where I can read more on this? – Natasha Dutta May 22 '15 at 21:35
  • Really - the source code is here - https://www.gnu.org/software/cflow/manual/html_node/Source-of-wc-command.html - mine is slightly quicker as I avoid looking for words – Ed Heal May 22 '15 at 21:40
  • Indeed; reading the source to GNU `wc`, I have to agree that the OP's claim that it somehow outperforms your implementation must be faulty. – Charles Duffy May 22 '15 at 21:41
  • Ed, sorry? I'm agreeing with you. – Charles Duffy May 22 '15 at 21:49
2

Forget popen - do it yourself

i.e.

FILE *f = fopen(argv[1], "r");
if (f == NULL) { perror(argv[1]); return 1; }
long lines = 0;
int ch;
while ((ch = fgetc(f)) != EOF) {
   if (ch == '\n') lines++;   /* count newline characters */
}
fclose(f);
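
If per-character fgetc() turns out to be the bottleneck, reading in large blocks and scanning each block with memchr() may help. This is only a sketch under assumptions: the helper name count_lines_block and the 64 KiB block size are my own choices, not something measured against wc.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts '\n' bytes in the file at path, reading in
 * 64 KiB blocks. Returns -1 if the file cannot be opened or read. */
long long count_lines_block(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    char buf[65536];              /* assumed block size */
    long long lines = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        /* memchr() jumps straight to the next newline in the block. */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
    }
    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : lines;
}
```

Like wc -l, this counts newline characters, so a last line without a trailing newline is not counted.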

EDIT - As the poster wants to load the whole file into memory

Add error checking:

FILE *f = fopen(argv[1], "r");
struct stat size;
fstat(fileno(f), &size);

char *buf = malloc(size.st_size);   /* one buffer for the whole file */
fread(buf, size.st_size, 1, f);
fclose(f);
Ed Heal
  • This is the correct solution to the example problem, but not for the general problem. – jxh May 22 '15 at 21:15
  • What is the general problem - I thought it was reading the number lines in a file - why spawn a shell etc. – Ed Heal May 22 '15 at 21:16
  • The general problem is sending an arbitrary file name to a shell command invoked by `popen()`. The command need not be `wc -l <`. – jxh May 22 '15 at 21:18
  • The file I want to read is very large, so I have to use shell command to read it fast to get its line num. – beasone May 22 '15 at 21:21
  • @beasone: So the *For example, ...* is actually *This is what I want to do...* ? – jxh May 22 '15 at 21:22
  • @beasone - What do you think that the shell command does that is quicker? It has to read the file – Ed Heal May 22 '15 at 21:22
  • It is much faster than using your method to read the file, I test it. – beasone May 22 '15 at 21:25
  • .. really forking another process, starting a shell program using pipes?! Or just read a file – Ed Heal May 22 '15 at 21:29
  • I need to read the file into an array, so I have to know how many lines in the file so that I can malloc the enough memory to the array. – beasone May 22 '15 at 21:36
  • @beasone, the usual way to do that is using `realloc()` to double the size of your buffer every time it fills up; then, you can read even from things like pipes that aren't seekable and **can't** be read more than once (as you're trying to do here, once with `wc` and a second time with your program). – Charles Duffy May 22 '15 at 21:39
  • @beasone - You have changed the problem from counting the number of lines to allocating space – Ed Heal May 22 '15 at 21:41
  • @CharlesDuffy I cannot use realloc() to double it. The file is too large, I cannot malloc some much continuous memory space to it in one dimensional array. I plan to get the num of lines of this file, and based on its value to establish a two dimensional array. – beasone May 22 '15 at 21:46
  • So you realloc() the first-order array, and malloc() new rows for the second-order array. Or you use a different data structure that's a better fit -- there are **lots** of good solutions for this, and you'd probably be getting them as suggestions if you'd asked the right question, instead of asking a question that assumed a bad answer ("bad" == precluding your program being used to stream non-seekable input and wasting IO resources) to that first question. – Charles Duffy May 22 '15 at 21:48
  • @CharlesDuffy - Please do not just double it - exponential growth. Gets very large quickly – Ed Heal May 22 '15 at 22:00
  • It does indeed, but since we're using virtual memory rather than physical memory, there's a lot of room to be wasteful; if nobody actually writes to the pages, there's no physmem allocation, so using (worst-case) twice the virtual memory we have physical memory actually needed isn't so bad. – Charles Duffy May 22 '15 at 22:00
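
The "realloc() the first-order array, and malloc() new rows" idea from the comments could look roughly like the sketch below. All names (struct chunked, chunked_push, chunked_get, chunked_free) and the row length CHUNK_ROWS are assumptions made for illustration. No line count is needed up front, and no single allocation has to hold the whole data set.

```c
#include <stdlib.h>

#define CHUNK_ROWS 4096   /* assumed number of values per row */

/* Hypothetical two-level array of long values. */
struct chunked {
    long **rows;     /* first-order array, grown with realloc() */
    size_t nrows;    /* rows allocated so far */
    size_t cap;      /* slots available in rows[] */
    size_t count;    /* values stored */
};

/* Appends v; returns 0 on success, -1 if memory runs out. */
int chunked_push(struct chunked *c, long v)
{
    if (c->count == c->nrows * CHUNK_ROWS) {        /* last row is full */
        if (c->nrows == c->cap) {                   /* row table is full */
            size_t ncap = c->cap ? c->cap * 2 : 16;
            long **t = realloc(c->rows, ncap * sizeof *t);
            if (t == NULL) return -1;
            c->rows = t;
            c->cap = ncap;
        }
        long *row = malloc(CHUNK_ROWS * sizeof *row);
        if (row == NULL) return -1;
        c->rows[c->nrows++] = row;
    }
    c->rows[c->count / CHUNK_ROWS][c->count % CHUNK_ROWS] = v;
    c->count++;
    return 0;
}

/* Constant-time lookup of element i (caller guarantees i < count). */
long chunked_get(const struct chunked *c, size_t i)
{
    return c->rows[i / CHUNK_ROWS][i % CHUNK_ROWS];
}

/* Releases every row and the row table. */
void chunked_free(struct chunked *c)
{
    for (size_t i = 0; i < c->nrows; i++) free(c->rows[i]);
    free(c->rows);
    c->rows = NULL;
    c->nrows = c->cap = c->count = 0;
}
```

Start from a zero-initialized struct (struct chunked c = {0};) and push values as lines are parsed. Only the small table of row pointers is ever reallocated; the rows themselves never move.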
2

If you're not going to do this yourself (without a shell), which you should, at least pass the filename in such a way that the shell will only ever interpret it as data rather than code to avoid potential for security incidents.

setenv("filename", "myfile", 1);         /* put filename in the environment (1 = overwrite) */
fp = popen("wc -l <\"$filename\"", "r"); /* expand it from your shell command */
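
A fuller sketch of the same idea, with the output read back, might look like this. The helper name count_lines_env is hypothetical. The environment variable name filename is the one used above. As noted in the comments, changing the process environment this way is not reentrant.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: passes path to the shell through the environment,
 * so the shell only ever sees it as data. Returns the line count or -1. */
long count_lines_env(const char *path)
{
    if (setenv("filename", path, 1) != 0) return -1;

    /* The command text is a constant; only "$filename" varies. */
    FILE *fp = popen("wc -l < \"$filename\"", "r");

    /* The child has its own copy of the environment by now. */
    unsetenv("filename");
    if (fp == NULL) return -1;

    long lines = -1;
    if (fscanf(fp, "%ld", &lines) != 1) lines = -1;
    if (pclose(fp) != 0) lines = -1;
    return lines;
}
```

Because the expansion happens inside double quotes, a name containing spaces, `;` or `$(...)` is used literally as the redirection target and is not evaluated again.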
Charles Duffy
  • Are we assuming that `wc` does what we think it does? Consider that it might be a shell script that is earlier in the path?! – Ed Heal May 22 '15 at 21:58
  • This is clever and upvoted, but it is not reentrant. – jxh May 22 '15 at 22:00
  • @jxh, indeed; in an ideal world, one would do the `setenv()` inside the subprocess after forking and before exec'ing the shell, thus isolating it from the rest of the process. Shame that `popen()` isn't designed with the use case in mind (contrary to your `execv*` series of calls, which _do_ allow explicit environment variables to be passed in). – Charles Duffy May 22 '15 at 22:03
2

All of the code below is untested. If I find time to test, I'll remove this caveat.

You can create your own wrapper around popen() that lets you build an arbitrary command.

FILE * my_popen (const char *mode, const char *fmt, ...) {
    va_list ap;
    int result = 511;

    for (;;) {
        char buf[result+1];

        va_start(ap, fmt);
        result = vsnprintf(buf, sizeof(buf), fmt, ap);
        va_end(ap);

        if (result < 0) return NULL;
        if ((size_t)result < sizeof(buf)) return popen(buf, mode); /* fits: run it */
    }

    /* NOT REACHED */
    return NULL;
}

Then, you can call it like this:

const char *filename = get_filename_from_input();
FILE *fp = my_popen("r", "%s < %s", "wc -l", filename);
if (fp) {
  /* ... */
  pclose(fp); /* make sure to call pclose() when you are done */
}

Here, we assume that get_filename_from_input() transforms the filename input string into something safe for the shell to consume.


It is rather complex (and error prone) to reliably turn a filename into something the shell will treat safely. It is safer to open the file yourself. After doing so, you can feed the file to a command and then read back the resulting output. The problem is that you cannot use popen() for this, because standard popen() only supports unidirectional communication.

Some variations of popen() exist that support bidirectional communication. Where they are not available, you can build something similar yourself with socketpair():

FILE * my_cmd_open (const char *cmd) {
    int s[2], p, status, e;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0) return NULL;
    switch (p = fork()) {
    case -1: e = errno; close(s[0]); close(s[1]); errno = e; return NULL;
    case 0: close(s[0]); dup2(s[1], 0); dup2(s[1], 1); dup2(s[1], 2);
            switch (fork()) {
            case -1: exit(EXIT_FAILURE);
            case 0: execl("/bin/sh", "-sh", "-c", cmd, (void *)NULL);
                    exit(EXIT_FAILURE);
            default: exit(0);
            }
    default: for (;;) {
                 if (waitpid(p, &status, 0) < 0) {
                     if (errno == EINTR) continue;  /* interrupted: retry */
                 } else if (WIFEXITED(status) && WEXITSTATUS(status) == 0) break;
                 close(s[0]); close(s[1]); errno = EPIPE;
                 return NULL;
             }
    }
    close(s[1]);
    return fdopen(s[0], "r+");
}
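
For the specific wc -l case, a plain one-directional pipe is enough if the file you opened yourself becomes the child's standard input directly, with no shell involved at all. A minimal sketch, assuming a POSIX system with wc on the PATH and stdin/stdout open in the parent; the helper name count_lines_exec is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "wc -l" with the file at path as its stdin,
 * without a shell. Returns the line count or -1 on failure. */
long count_lines_exec(const char *path)
{
    int in = open(path, O_RDONLY);
    if (in < 0) return -1;

    int out[2];
    if (pipe(out) < 0) { close(in); return -1; }

    pid_t pid = fork();
    if (pid < 0) { close(in); close(out[0]); close(out[1]); return -1; }

    if (pid == 0) {                        /* child: becomes wc */
        dup2(in, STDIN_FILENO);            /* file is wc's input */
        dup2(out[1], STDOUT_FILENO);       /* pipe is wc's output */
        close(in); close(out[0]); close(out[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        _exit(127);                        /* exec failed */
    }

    close(in);
    close(out[1]);
    long lines = -1;
    FILE *fp = fdopen(out[0], "r");
    if (fp == NULL) {
        close(out[0]);
    } else {
        if (fscanf(fp, "%ld", &lines) != 1) lines = -1;
        fclose(fp);
    }

    int status;
    pid_t w;
    do { w = waitpid(pid, &status, 0); } while (w < 0 && errno == EINTR);
    if (w < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return lines;
}
```

The filename never reaches a command line here, so there is nothing to quote or escape.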

To efficiently read an entire file into memory, you can use mmap().

void * mmap_filename (const char *filename, size_t *sz) {
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    *sz = st.st_size;
    void *data = mmap(NULL, *sz, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    return data != MAP_FAILED ? data : NULL;
}

Then, you can call it like this:

size_t sz;
void *data = mmap_filename(filename, &sz);
if (data) {
    /* ... */
    munmap(data, sz);
}

The example code above maps the entire file at once. However, the mmap() API allows you to map portions of the file from a particular offset into the file.
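
Mapping only a portion could be sketched as follows. The offset handed to mmap() has to be a multiple of the page size, so the sketch rounds it down and remembers the slack. The names struct window and map_window are made up for illustration, and the caller is assumed to keep offset + len within the file size.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of a mapped byte range of a file. */
struct window {
    void *base;        /* address returned by mmap(), page aligned */
    size_t maplen;     /* length passed to mmap(); pass it to munmap() */
    const char *data;  /* first requested byte */
};

/* Maps len bytes of filename starting at offset.
 * Returns 0 on success, -1 on failure.
 * ASSUMPTION: offset + len does not exceed the file size. */
int map_window(const char *filename, off_t offset, size_t len,
               struct window *w)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || len == 0) return -1;

    off_t aligned = offset - (offset % page);   /* round down to a page */
    size_t slack = (size_t)(offset - aligned);

    int fd = open(filename, O_RDONLY);
    if (fd < 0) return -1;
    void *base = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    close(fd);                                  /* mapping stays valid */
    if (base == MAP_FAILED) return -1;

    w->base = base;
    w->maplen = len + slack;
    w->data = (const char *)base + slack;
    return 0;
}
```

Release a window with munmap(w.base, w.maplen). Walking a very large file window by window keeps the address space in use small, which matters most on a 32-bit platform.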

jxh