
How is it possible that Cygwin seemingly manages to bypass the MS C Runtime library enabling a C program to get its argv like a Linux machine would?

I'll explain what I mean.

On Windows I understand that a C program has the choice of calling GetCommandLine() or of using argv.

And I understand that a Windows C compiler links C programs against the MS C Runtime library, which takes the command line that isn't yet separated into arguments (perhaps the output of GetCommandLine()), parses it, and puts the result into argv. This link covers that: https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170

And from what I understand, on Linux, what's written after the command at the command line, goes straight from the shell to argv. No external library doing the parsing. The shell calls a POSIX function called execv and figures out what the arguments are and passes them to execv which passes them to the program's argv.

I use these programs for some tests

C:\blah>type w.c
#include <stdio.h>
#include <windows.h>

int main(int argc, char *argv[]) {
    printf("%s\n", GetCommandLine());
    return 0;
}


C:\blah>w.exe  "asdf" erw
w.exe   "asdf" erw
C:\blah>


C:\blah>type w2.c
#include <stdio.h>

int main(int argc, char *argv[]) {
        int i = 0;
        while (argv[i]) {
                printf("argv[%d] = %s\n", i, argv[i]);
                i++;
        }
        return 0;
}

C:\blah>w2 abc "def"
argv[0] = w2
argv[1] = abc
argv[2] = def

C:\blah>

And w2.c can be run from linux too

root@ubuntu:~# ./w2 abc "def"
argv[0] = ./w2
argv[1] = abc
argv[2] = def
root@ubuntu:~#

I notice that there are some cases where the MS C Runtime parses differently from Linux. (Linux of course wouldn't be using the MS C Runtime.)

For example, this link https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170 mentions this command line input a\\\b d"e f"g h and expected outputs.

C:\blah>w2 a\\\b d"e f"g h
argv[0] = w2
argv[1] = a\\\b
argv[2] = de fg
argv[3] = h

C:\blah>

Whereas on Linux, one gets

root@ubuntu:~# ./w2 a\\\b d"e f"g h
argv[0] = ./w2
argv[1] = a\b
argv[2] = de fg
argv[3] = h

So now the interesting test was, what would Cygwin do

user@comp /cygdrive/c/blah
$ ./w2 a\\\b d"e f"g h
argv[0] = C:\blah\w2.exe
argv[1] = a\b
argv[2] = de fg
argv[3] = h

Cygwin manages to get the result that a linux machine would give.

But it's running an EXE file that was compiled on Windows, which I'd have thought must be using the MS C Runtime library. And when the EXE file is run from CMD outside Cygwin, it does look like it's using the MS C Runtime library. So how is Cygwin seemingly managing to bypass that and lead the program to give the result that a Linux machine would give?

How is this possible?! What is going on?!

barlop
  • Not sure I understand the question. Command line is just text in shell. The shell can do what ever it wants with it, before passing it to the OS. OS has no way to forcibly get the string typed in the cygwin shell command line, it only gets what Cygwin chooses to give it. – hyde Aug 14 '22 at 16:49
  • cygwin has its own C and own C runtime. – stark Aug 14 '22 at 17:45
  • @stark The w2.exe I ran on Windows cmd, and on Cygwin, are the same one though. – barlop Aug 14 '22 at 17:48
  • @hyde Well, the result of GetCommandLine() isn't necessarily exactly the text in a shell. Try `C:\blah>calc ^gg` then do `wmic process where caption="calc.exe" get commandline | findstr calc` it prints `calc gg` Also, there is the issue of what happens to get "the command line" or the result of GetCommandLine() into argv. Cygwin can't stop it from using the MS C Runtime. So how is Cygwin getting what Linux would get, to be in the argv? – barlop Aug 14 '22 at 17:53
  • Ah. `main` is not called by Windows, it is not the entry point of the executable. The runtime does things, then calls `main`, so runtime is able to completely control the parameters of `main`. – hyde Aug 14 '22 at 18:00
  • @hyde well, when you say "The runtime", do you mean the MS C Runtime? And if it runs the MS C Runtime, why is the output like that of a linux machine and not like that of a Windows machine's CMD without cygwin? – barlop Aug 14 '22 at 18:02
  • By runtime I mean what ever code contained in the .exe, which is run before (and after) `main`. I repeat, the OS does not call `main`. – hyde Aug 14 '22 at 18:06
  • @hyde well I never even mentioned the word main(other than writing int main in code), I never said the OS calls main!! What did I write(that may be incorrect), that made you think that I had that idea? – barlop Aug 14 '22 at 18:21
  • @hyde And am I correct in thinking that the shell passes the command line or some function of it, to the Runtime, which puts the command line into argv? – barlop Aug 14 '22 at 18:46
  • If I understand correctly, you are wondering how `main` parameters can be different from what `GetCommandLine()` returns. – hyde Aug 14 '22 at 19:57
  • @barlop: `argv` is specific to `main`, so when you refer to `argv`, you are indirectly referring to `main`. – Andreas Wenzel Aug 14 '22 at 19:58
  • @barlop: The shell passes the command line to the operating system, probably by calling [`CreateProcess`](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa) (one of arguments of that function is the command line). The operating system then creates a new process, which causes the C run-time library to take control. The run-time library will probably call the Windows API function `GetCommandLine` and will use the returned information to set `argc` and `argv`, before it calls `main`. – Andreas Wenzel Aug 14 '22 at 19:59
  • Consider that the shell (bash?) in Cygwin does its own parsing of the command line before any Windows function is called to launch the application. Since this shell is more compatible to a Linux shell, I'd expect the same outcome, in contrast to the parsing of CMD. – the busybee Aug 14 '22 at 20:06

1 Answer


I talked with somebody who knows Cygwin. They said that Cygwin can detect whether an executable is a Windows executable or a Cygwin executable, and that the ldd command shows the difference: a Cygwin executable is linked to cygwin1.dll. A program like Sysinternals Process Explorer can show which DLLs a running process has loaded, e.g. it shows that bash.exe is linked to cygwin1.dll. But ldd is more useful here because it also works for commands that don't stay open. $ ldd /bin/bash.exe showed some NT-related DLLs and also cygwin1.dll, whereas $ ldd ./w.exe showed just NT-related DLLs, no cygwin1.dll.

And they said that this file winsup/cygwin/winf.cc is very relevant to that. I have it on my system https://gist.github.com/gartha1/4a2871b7f22ef85b5c8c0b08674b6f57 I see it has stuff about argv

Some comments, and C people I talked with, indicated (as far as I understood them) that Linux has compiler-specific C runtime libraries too. When people say "C runtime library" they tend to include POSIX functions like execv, which technically belong to the POSIX standard rather than the C standard. And the runtime's code runs before main starts and after main finishes.

I had been looking at it as: this is the command line I typed, so what is sent to argv, and how? Another perspective is to start from what ends up in argv and step back to ask what value of GetCommandLine() would produce it. A third is to look at the command line typed and see what it puts into GetCommandLine().

The MS C Runtime starts with GetCommandLine() and then splits the result into argv (as far as I know with its own parser, following the rules on the Microsoft page linked in the question; the Win32 API offers CommandLineToArgvW for a similar job). From https://learn.microsoft.com/en-us/windows/win32/api/processenv/nf-processenv-getcommandlinea : "GetCommandLine as an alias which automatically selects the ANSI or Unicode version of this function" and "To convert the command line to an argv style array of strings, pass the result from GetCommandLineA to CommandLineToArgW." (the function is actually named CommandLineToArgvW). On the A and W suffixes, from "What is the difference between the `A` and `W` functions in the Win32 API?": "The A functions use Ansi (not ASCII) strings as input and output, and the W functions use Unicode string instead .."

So what the MS C Runtime sees when it calls GetCommandLine() is very significant. And I think Cygwin's Linux-style shell, e.g. bash, does its own parsing first, which is described in info bash and includes "word splitting" (separating arguments) and quote removal.

calc.exe is useful because it stays open so I can look at the command line with WMIC. That's clearer than using w.exe(from cmd), to determine what the command line is.

To use some simple examples with calc.exe, try calling it with these command lines:

  1. calc "abc"
  2. calc a\a

In the case of calc "abc", what gets into argv in Cygwin and plain cmd is the same. And what gets seen by GetCommandLine() won't really need any adjustment, though cygwin sanitises what it makes available to GetCommandLine() a bit.

Looking with CMD, we see

C:\>w calc abc
w  calc abc

C:\>w calc "abc"
w  calc "abc"


C:\>w2 calc abc
argv[0] = w2
argv[1] = calc
argv[2] = abc

C:\>w2 calc "abc"
argv[0] = w2
argv[1] = calc
argv[2] = abc

So a value from GetCommandLine() of calc "abc" or calc abc are equivalent

I'm using wmiccalc.bat, which runs the line wmic process where caption="calc.exe" get commandline | findstr calc

C:\>calc "abc" <ENTER>

C:\>wmiccalc.bat<ENTER>
calc  "abc"

Now see what happens if I run calc from cygwin, what the command line is

$ calc "abc" &

$ ./wmiccalc.bat
C:\Windows\System32\calc.exe abc

Cygwin is using a slightly sanitised command line. It changes nothing in what the runtime puts into argv, compared with what the plain cmd call of calc.exe ends up putting there.

In both cases it is the MS C Runtime that runs.

What Cygwin did was it took the "abc" and said, well, bash will want abc in argv, so it constructed a command line that (when sent through the MS C Runtime), would/will send abc to argv.

Now let's look at this example

2. calc a\a

This is slightly different from the first example, because here what ends up in argv (via the MS C Runtime) differs between the Cygwin case and the cmd case.

The input given to the MS C Runtime is different, so what it produces is different.

Cygwin sends the command line it needs to send in order to produce the argv that bash wants.

C:\>calc a\a

C:\>wmiccalc.bat
calc  a\a

From Windows, that's the command line

And from that command line, The MS C Runtime will send the following to argv

>w2 a\a
argv[0] = w2
argv[1] = a\a

On Linux, though, if a\a is typed at the shell, the shell (not the executable) treats the unquoted backslash as an escape character and removes it, so a\a never reaches argv.

$ echo a\a
aa

So if I do

$ calc a\a &

$ ./wmiccalc.bat
C:\Windows\System32\calc.exe aa

So cygwin will use a very different command line.. a command line of aa not a\a

$ ./w2 a\a
argv[0] = ......\w2.exe
argv[1] = aa

And that makes sense: from CMD, a command line of a\a gives the argv we'd want for that command line on Windows.

>w2 a\a
argv[0] = w2
argv[1] = a\a

>

Whereas a command line of aa, i.e. the MS C Runtime seeing a GetCommandLine() result of aa, gives the argv we'd want when running it from Linux or bash.

>w2 aa
argv[0] = w2
argv[1] = aa

>

So if you run the executable from plain Windows CMD, not Cygwin, you get what it should show for Windows.

And if you run the executable from Cygwin, Cygwin's shell (e.g. bash) parses what you typed, and Cygwin then constructs the Windows call to the program, giving the MS C Runtime a command line from which it will put the right things into argv, i.e. what a Linux machine would show. So it's not bypassing the MS C Runtime. It's using it cleverly. It's saying "Having got the arguments as parsed by the Linux-style shell, e.g. bash, I know what argv values I want, so I'll put together a command line that takes into account how the MS C Runtime parses things, so as to get those argv values."

By the way

One of the comments corrects one of the things I wrote in my question.. I wrote

And from what I understand, on Linux, what's written after the command at the command line, goes straight from the shell to argv. No external library doing the parsing. The shell calls a POSIX function called execv and figures out what the arguments are and passes them to execv which passes them to the program's argv.

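But actually, there is a C runtime used by compilers on Linux too. The POSIX function execv would be considered part of it; somebody who didn't want to call it the C runtime could call it the C/POSIX runtime. The difference from Windows is that this runtime receives the arguments already split into an array, so it has no command line to parse.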

Also, some comments on the question helped clear up misconceptions and unclear spots in it, e.g.

To the question of "And am I correct in thinking that the shell passes the command line or some function of it, to the Runtime, which puts the command line into argv?"

these comments explained how what the shell wants the arguments to be eventually gets to main (and thus argv): never straight there, and not even from the shell straight to the runtime, but from shell to OS to runtime.

" @barlop: The shell passes the command line to the operating system, probably by calling CreateProcess (one of arguments of that function is the command line). The operating system then creates a new process, which causes the C run-time library to take control. The run-time library will probably call the Windows API function GetCommandLine and will use the returned information to set argc and argv, before it calls main. – Andreas Wenzel"

"Consider that the shell (bash?) in Cygwin does its own parsing of the command line before any Windows function is called to launch the application. Since this shell is more compatible to a Linux shell, I'd expect the same outcome, in contrast to the parsing of CMD. – the busybee"

Anyhow, I think this addresses what is happening.. How the command line typed into cygwin is transformed to a string seen by GetCommandLine() and gets the result using the MS C Runtime library.

I used two simple examples but they would explain it for the case given in the question too.

barlop