1

I have a sample program that takes in an input from the terminal and executes it in a cloned child in a subshell.

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/wait.h>
#include <sched.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>

int clone_function(void *arg) {
  execl("/bin/sh", "sh", "-c", (char *)arg, (char *)NULL);
}

int main() {
  while (1) {
    char data[512] = {'\0'};
    int n = read(0, data, sizeof(data));
    // fgets(data, 512, stdin);
    // int n = strlen(data);
    if ((strcmp(data, "exit\n") != 0) && n > 1) {
      char *line;
      char *lines = strdup(data);
      while ((line = strsep(&lines, "\n")) != NULL && strcmp(line, "") != 0) {
        void *clone_process_stack = malloc(8192);
        void *stack_top = clone_process_stack + 8192;
        int clone_flags = CLONE_VFORK | CLONE_FS;
        clone(clone_function, stack_top, clone_flags | SIGCHLD, (void *)line);
        int status;
        wait(&status);
        free(clone_process_stack);
      }
    } else {
      exit(0);
    }
  }
  return 0;
}

The above code works in an older Linux system (with minimal RAM( but not in a newer one. Not works means that if I type a simple command like "ls" I don't see the output on the console. But with the older system I see it.

Also, if I run the same code on gdb in debugger mode then I see the output printed onto the console in the newer system as well.

In addition, if I use fgets() instead of read() it works as expected in both systems without an issue.

I have been trying to understand the behavior and I couldn't figure it out. I tried doing an strace. The difference I see is that the wait() return has the output of the ls program in the cases it works and nothing for the cases it does not work.

Only thing I can think of is that read(), since its not a library function has undefined behavior across systems. But I can't agree as to how its affecting the output.

Can someone point me out to why I might be observing this behavior?

EDIT

The code is compiled as:

gcc test.c -o test

strace when it's not working as expected is shown below

enter image description here

strace when it's working as expected (only difference is I added a printf("%d\n", n); following the call for read()) enter image description here

Thank you
Shabir

Community
  • 1
  • 1
Shabirmean
  • 2,341
  • 4
  • 21
  • 34
  • Check your function return, also don't forget to check your function return, and try to check your function return. `while true` this generally don't compile, you are not new to SO, please read again [mcve]. "Only thing I can think of is that read(), since its not a library function has undefined behavior across systems." haha – Stargateur Oct 21 '18 at 00:39
  • I don't understand the comment. What I provided was somewhat like a pseudocode. Do you want me to give complete code snippets? – Shabirmean Oct 21 '18 at 00:45
  • 1
    exactly, but it's must be complete, minimal and verifiable. – Stargateur Oct 21 '18 at 00:46
  • @Stargateur Edited... Whats the "hahahah" about? – Shabirmean Oct 21 '18 at 00:51
  • 1
    `read()` is a library function, it's perfectly defined in posix and don't have any undefined behavior. `read` is a the heart of linux, it's the heart of a computer, it's generally the first instruction and the second is write. There is no way, read is the problem, blame your code before try to find a problem in a library that work since 30 years – Stargateur Oct 21 '18 at 00:56
  • 1
    Dont assume strings. `read()` could return 1. What would happen? – wildplasser Oct 21 '18 at 01:03
  • @Stargateur what I meant by library function is that read() is not standard C and is a system call; as read from another SO thread[1]. Also, I have no issues blaming my code :p. I was just suggesting my thoughts based on what I read. I might be wrong but I thought I must put everything down. So you can calm down. [1] https://stackoverflow.com/questions/6220093/difference-between-read-and-fgets-in-c – Shabirmean Oct 21 '18 at 01:04
  • @wildplasser If it returns 1 then nothing is printed. But when I track with strace and issue "ls -l", I see the clone() function being executed with no return – Shabirmean Oct 21 '18 at 01:10
  • " warning: arithmetic on a pointer to void is a GNU extension [-Wpointer-arith]", "warning: implicit conversion loses integer precision: 'ssize_t' (aka 'long') to 'int' [-Wshorten-64-to-32]", on my machine(linux 4.15.15-1-ARCH) this work, even if I see potentiel bug, your [mcve], is not verifiable. – Stargateur Oct 21 '18 at 01:12
  • `if ((strcmp(data, "exit\n") != 0) && n > 1) { p(data, "exit\n") != 0) && n > 1) {blabla` thereyougo ... – wildplasser Oct 21 '18 at 01:20
  • Mine is a server running [Ubuntu **16.04.5 LTS**, Kernel Version: **4.4.0-79-generic**, gcc version 5.4.0 20160609, 32 cores and 128GB RAM] The other older one is [Debian **GNU/Linux 9.5**, Kernel Version: **3.16.0-4-686-pae**, gcc version 6.3.0, 4 cores with 3GB RAM] – Shabirmean Oct 21 '18 at 01:21
  • 1
    I'd use good old `fork`-`exec` pattern instead of cloning a process. both calls were defined long ago and should work on both systems. To be honest, this is the first time I really met `clone`. – Eir Nym Oct 21 '18 at 01:49
  • @EirNym - True enough. The requirement strictly enforces the usage of clone(). And getting around the problem is quite easy. But I'm trying to make sense out of the behavior observed. What's actually happening... – Shabirmean Oct 21 '18 at 01:55
  • what's about `pipe(2)` or dup(2) and set file descriptors for a child process? – Eir Nym Oct 21 '18 at 02:01
  • 2
    on the program that didn't work, your childs segfaulted twice. – Stargateur Oct 21 '18 at 04:48
  • @Stargateur Are you saying this from the SIGSEGV inside wait()? – Shabirmean Oct 21 '18 at 04:52
  • Please don't post images of text terminals. – n. m. could be an AI Oct 21 '18 at 07:38
  • `SIGSEGV` is inside your *child*. `wait` only reports on it. I was only able to reproduce this on coliru, my local machine runs this code as expected. It looks like `execl` crashes, but I have no idea why. – n. m. could be an AI Oct 21 '18 at 08:35

2 Answers2

4

There are multiple problems in your code:

  • a successful read system call can return any non zero number between 1 and the buffer size depending on the type of handle and available input. It does not stop at newlines like fgets(), so you might get line fragments, multiple lines, or multiple lines and a line fragment.
  • furthermore, if read fills the whole buffer, as it might when reading from a regular file, there is no trailing null terminator, so passing the buffer to string functions has undefined behavior.
  • the test if ((strcmp(data, "exit\n") != 0) && n > 1) { is performed in the wrong order: first test if read was successful, and only then test the buffer contents.
  • you do not set the null terminator after the last byte read by read, relying on buffer initialization, which is wasteful and insufficient if read fills the whole buffer. Instead you should make data one byte longer then the read size argument, and set data[n] = '\0'; if n > 0.

Here are ways to fix the code:

  • using fgets(), you can remove the line splitting code: just remove initial and trailing white space, ignore empty and comment lines, clone and execute the commands.
  • using read(), you could just read one byte at a time, collect these into the buffer until you have a complete line, null terminate the buffer and use the same rudimentary parser as above. This approach mimics fgets(), by-passing the buffering performed by the standard streams: it is quite inefficient but avoids reading from handle 0 past the end of the line, thus leaving pending input available for the child process to read.
chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • "There are" infinite "ways to fix the code:", also "you should just read one byte at a time" is really not recommended in any circonstance. – Stargateur Oct 21 '18 at 02:28
  • @chqrlie - Thank you for your response. I do understand the problem and I know I can fix it with **fgets()** (with the two commented lines above). But I'm trying to find some proper explanation to what's happening. With **strace** I observe even if the characters read for "ls\n" is 3, I still don't see the output. When I debug using **gdb**, I see it. If I add a printf() after the read() call then it works. – Shabirmean Oct 21 '18 at 02:47
  • @Stargateur: there are circumstances where buffering input might not be appropriate :) – chqrlie Oct 21 '18 at 02:47
  • @n.m.: `data` is initialized to `0` and the OP only types 3 bytes, so the behavior is strange... I was wondering if `read` may change the destination array beyond the count of bytes it returns? `fgets()` is not supposed to do that, although it would not matter because it does null terminate the destination array. – chqrlie Oct 21 '18 at 03:10
  • @chqrlie yes missed that initialisation. – n. m. could be an AI Oct 21 '18 at 03:17
  • @chqrlie personally, I just doubt that this code is the real code that cause problem. – Stargateur Oct 21 '18 at 03:33
  • This is the code. I copy-pasted your edited source @Stargateur and re-produced it just to be sure. – Shabirmean Oct 21 '18 at 03:44
  • @Shabirmean: did you try fixing the null terminator issue? – chqrlie Oct 21 '18 at 03:47
  • @chqrlie: strsep() will add a NULL to the end anyway, replacing the '\n'. I also debugged to see the contents of "line" inside the loop and its what I want without the newline. – Shabirmean Oct 21 '18 at 03:52
  • @Shabirmean you still didn't do what I said "Check your function return, also don't forget to check your function return, and try to check your function return." `perror()` is your friend – Stargateur Oct 21 '18 at 03:54
  • @Stargateur you mean the return on clone()? – Shabirmean Oct 21 '18 at 03:57
  • @Shabirmean yes, and `strdup()` and `malloc()` and `execl()` and `wait()`, you really need to stop ignore it, error are here to be handle. Here you assume all is good, that a terrible way to coding. – Stargateur Oct 21 '18 at 03:59
  • @Stargateur my dear friend, this is a code of a student I am grading. I know it's not written with the right practices. Also, I have verified that the code gets to where it needs to get. **strace** shows that the clone is actually executed. I know how to get the code working and also do understand that it has issues with how error checking. I am trying to find a logical reason for the behavior I am observing. Also I just put error checking for all of these and the same behavior. I will update the question with screenshots of the **strace** for both cases. – Shabirmean Oct 21 '18 at 04:14
  • @Shabirmean I can't guess that it's not your code, also as I added myself the include lines verify that the behavior is still buged with the **actual code in your question** or update it as it follow your exact case (so don't forget the include !!!). clone is executed but is it successful ? Also asking the behavior of undefined behavior is pointless. and as you ask for why, include how you compile this code. – Stargateur Oct 21 '18 at 04:19
1

It looks like 8192 is simply too small a value for stack size on a modern system. execl needs more than that, so you are hitting a stack overflow. Increase the value to 32768 or so and everything should start working again.

n. m. could be an AI
  • 112,515
  • 14
  • 128
  • 243