6

I know about Cygwin, and I know of its shortcomings. I also know about the slowness of fork, but not why on Earth it's not possible to work around that. I also know Cygwin requires a DLL. I also understand that POSIX defines a whole environment (shell, etc.), but that's not really what I care about here.

My question is whether there is another way to tackle the problem. I see more and more POSIX functionality being implemented by the MinGW projects, but there's no complete solution providing full-blown POSIX functionality (comparable in implementation status to Linux/Mac/BSD).

The question really boils down to: Can the Win32 API (as of MSVC20??) be efficiently used to provide a complete POSIX layer over the Windows API?

Perhaps this will turn out to be a full libc that only taps into the OS library for low-level things like filesystem access, threads, and process control. But I don't know exactly what else POSIX consists of. I doubt a library can turn Win32 into a POSIX-compliant entity.

rubenvb
  • 74,642
  • 33
  • 187
  • 332

3 Answers

6

POSIX <> Win32.

If you're trying to write apps that target POSIX, why are you not using some variant of *N*X? If you prefer to run Windows, you can run Linux/BSD/whatever inside Hyper-V/VMWare/Parallels/VirtualBox on your PC/laptop/etc.

Windows used to have a POSIX-compliant environment that ran alongside the Win32 subsystem, but it was discontinued after NT4 due to lack of demand. Microsoft bought Interix and released Services For Unix (SFU). While it's still available for download, SFU 3.5 is now deprecated and no longer developed or supported.

As to why fork is so slow, you need to understand that fork isn't just "Create a new process", it's "create a new process (itself an expensive operation) which is a duplicate of the calling process along with all memory".

In *N*X, the forked process is mapped to the same memory pages as the parent (i.e. is pretty quick) and is only given new pages as and when the forked process tries to modify any shared pages. This is known as copy-on-write. This is largely achievable because in UNIX, there is no hard barrier between the parent and forked processes.
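
To make the "duplicate of the calling process" part concrete, here is a minimal sketch, assuming a POSIX system and Python 3 (the function name `fork_demo` is made up for illustration). It only shows the visible semantics: the child starts with the parent's data, and its writes never reach the parent. The page-level copy-on-write mechanism itself is not observable from this code.

```python
import os

def fork_demo():
    """Fork, let the child modify its copy of the data, and report both views."""
    data = ["original"]
    read_fd, write_fd = os.pipe()

    pid = os.fork()  # POSIX only; os.fork does not exist on Windows
    if pid == 0:
        # Child: starts with a logical copy of the parent's memory.
        os.close(read_fd)
        data[0] = "changed in child"  # affects only the child's copy
        os.write(write_fd, data[0].encode())
        os.close(write_fd)
        os._exit(0)  # exit at once, skipping the parent's cleanup handlers

    # Parent: read what the child saw, then reap the child.
    os.close(write_fd)
    child_view = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data[0], child_view

if __name__ == "__main__":
    print(fork_demo())
```

The parent's value stays untouched even though the child overwrote "the same" variable.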

In NT, on the other hand, all processes are separated by a barrier enforced by CPU hardware. In NT, the easiest way to spawn a parallel activity which has access to your process' memory and resources is to create a thread. Threads run within the memory space of the creating process and have access to all of the process' memory and resources.
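
By contrast, a thread sees the creating process' memory directly. A minimal sketch in Python 3 (the name `thread_demo` is hypothetical), which should behave the same on Windows and *N*X:

```python
import threading

def thread_demo():
    """Start a thread that modifies data owned by the creating code."""
    data = ["original"]

    def worker():
        # Runs in the same address space, so this write is visible to the caller.
        data[0] = "changed in thread"

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return data[0]

if __name__ == "__main__":
    print(thread_demo())
```

Unlike a forked child, the worker's write is seen by the code that started it.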

You can also share data between processes via various forms of IPC (RPC, named pipes, mailslots, memory-mapped files), but each technique has its own complexities, performance characteristics, etc. Read this for more details.

Because it tries to mimic UNIX, Cygwin's 'fork' operation creates a new child process (in its own isolated memory space) and has to duplicate every page of memory in the parent process within the newly forked child. This can be a very costly operation.

Again, if you want to write POSIX code, do so in *N*X, not NT.

Rich Turner
  • 10,800
  • 1
  • 51
  • 68
  • +1, but I don't think you're right about the relationship between parent and subprocesses being fundamentally different in the UNIX and Windows kernels. There's no inherent reason AFAIK that Windows couldn't provide fork() functionality, it's just that it would complicate process handling and has never been considered particularly useful. – Harry Johnston Nov 19 '11 at 05:59
  • Come to think of it, the Windows kernel used to support fork(), since it was available to the POSIX subsystem. – Harry Johnston Nov 19 '11 at 06:00
  • Windows & UNIX's process models are entirely different, resulting in some impedance mismatch between the two. fork() is a UNIX system call, and is only available in the old POSIX subsystem which was replaced by Interix/SFU (Services For UNIX). fork() is not available in Win32 - use CreateProcess() instead. Read this section of the Interix developer's guide for more details: http://technet.microsoft.com/en-us/library/bb497007.aspx (Especially the section titled "Creating a New Process"). – Rich Turner Dec 14 '11 at 00:11
  • Further: One of the biggest differences between UNIX' fork() and Win32's CreateProcess() is that fork() creates a process that is (essentially) a duplicate of the caller. CreateProcess() creates a new isolated process that loads and runs a specified exe which may be (and usually is) different from the code running in the parent process. – Rich Turner Dec 14 '11 at 00:14
  • Certainly the process models are different, but I don't believe that this is because "there is no hard barrier between the parent and forked processes" in Unix. – Harry Johnston Dec 14 '11 at 02:55
  • When UNIX was first designed, there were no hard barriers between processes because few of the processors available at the time supported hardware isolated processes. Windows NT+, on the other hand, was specifically designed to REQUIRE hardware isolation between processes. Thus, in NT-based OS', when you fork() a child process, the child process must be initialized to be identical to the parent. It can take a considerable amount of time to duplicate a running process into a new child process, which is why CreateProcess() doesn't. – Rich Turner Dec 14 '11 at 16:05
  • By "hardware isolation" I can only assume you mean separate address spaces. But this doesn't prevent the kernel from using copy-on-write to quickly implement fork(). I'm pretty sure that modern Unix variants (such as Linux) give child processes a separate address space - it would be a pretty sad thing if they didn't. – Harry Johnston Dec 14 '11 at 18:53
  • Yep. But note that UNIX doesn't REQUIRE hardware isolated processes which is why UNIX can run on hardware that doesn't support hardware isolated processes. NT *REQUIRES* HW isolated processes which is what prevented it from running on older/lower-spec processors in the past. Modern *N*X OS' & runtimes (including SFU & Cygwin) do implement all manner of schemes to improve perf of fork(), but it is an inherently costly operation: fork() is slow. Period. Avoid it unless you need it. – Rich Turner Dec 20 '11 at 01:14
3

How about this

Most of the Unix API is implemented by the POSIX.DLL dynamically loaded (shared) library. Programs linked with POSIX.DLL run under the Win32 subsystem instead of the POSIX subsystem, so programs can freely intermix Unix and Win32 library calls.

From http://en.wikipedia.org/wiki/UWIN

The UWIN environment may be what you're looking for, but note that it is hosted at research.att.com. While UWIN is distributed under a liberal license, it is not the GNU license. Also, as it is research for AT&T, and only secondarily something that they are distributing for use, there are a lot of issues with documentation.

For more info, see my write-up in the last answer to Regarding 'for' loop in KornShell

Hmm, the main UWIN link in that post is bad; try

http://www2.research.att.com/sw/download/

Also, you can look at

To get a sense of the features vs issues.

I hope this helps.

Community
  • 1
  • 1
shellter
  • 36,525
  • 7
  • 83
  • 90
  • So it's a dll way, which the OP tried to avoid. – saulius2 Feb 25 '23 at 19:02
  • @saulius2: Yep, can't deny it. Wasn't sure if the OP was aware of that library. AND now I wonder, as UWIN has been kaput for years, is this answer worth keeping? Somebody must have the src code some place, but this was from Windows 7-8 days. As time allows, I'll update the links that are still discoverable. (I'm looking at you, archive.org (-;!) Good luck in your project! – shellter Feb 25 '23 at 19:28
  • For me, who is interested in the detailed history of computing, it sure is worth keeping. Thanks. :) – saulius2 Feb 26 '23 at 09:08
2

The question really boils down to: Can the Win32 API (as of MSVC20??) be efficiently used to provide a complete POSIX layer over the Windows API?

Short answer: No.

"Complete POSIX" means fork(), mmap(), signal() and such, and these are [almost] impossible to implement on NT.

To drive the point home: GNU Hurd has problems with fork() as well, because the Hurd kernel is not POSIX.

NT is not POSIX either.

Another difference is persistence:

  • In POSIX-compliant systems it is possible to create system objects and leave them there. Examples of such objects are named pipes and shared memory objects (shms). You can create a named pipe or a shm, and leave it in the filesystem (or in a special filesystem-like place) where other processes will be able to access it. The downside is that a process might die and fail to clean up after itself, leaving unused objects behind (you know about zombie processes? same thing).

  • In NT every object is reference-counted, and is destroyed as soon as its last handle is closed. Files are among the few objects that persist.
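
The POSIX half of this can be sketched as follows, assuming a POSIX system and Python 3 (the function name and the file name `demo.fifo` are made up for illustration). The named pipe is created and nobody holds it open, yet it stays in the filesystem until something removes it explicitly:

```python
import os
import stat
import tempfile

def fifo_persists():
    """Create a named pipe, open no handles to it, and check that it still exists."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demo.fifo")  # hypothetical name
    os.mkfifo(path)  # POSIX only; not available on Windows
    try:
        # No process has the FIFO open, but it remains as a filesystem entry.
        return stat.S_ISFIFO(os.stat(path).st_mode)
    finally:
        os.unlink(path)      # removal has to be explicit
        os.rmdir(directory)

if __name__ == "__main__":
    print(fifo_persists())
```

If the `finally` cleanup never ran (say, the process was killed), the FIFO would simply be left behind, which is the downside described above.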

Symlinks are a filesystem feature and don't exactly depend on the NT kernel, but the current implementation (in Vista and later) is incapable of creating object-type-agnostic symlinks. That is, a symlink is either a file symlink or a directory symlink, and must link to a target of the matching type. If the target has the wrong type, the symlink won't work. You can give it the right type if the target exists when you create the symlink, but POSIX requires that symlinks can be created without their target existing. I can't imagine a use case for a symlink that points first to a file, then to a directory, but POSIX says that this should work, and if it doesn't, you're not completely POSIX-compliant. And if your symlinking API/utility takes an option that specifies the right type when the target doesn't exist, that also breaks POSIX compatibility.
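
Here is a minimal sketch of the POSIX behaviour, assuming a POSIX system and Python 3 (`symlink_demo` and the path names are hypothetical). The link is created before its target exists, and then resolves first to a file and later to a directory:

```python
import os
import tempfile

def symlink_demo():
    """Create a symlink before its target exists, then change the target's type."""
    base = tempfile.mkdtemp()
    target = os.path.join(base, "target")
    link = os.path.join(base, "link")

    os.symlink(target, link)  # the target does not exist yet
    # The link itself exists, but it does not resolve to anything.
    results = [os.path.lexists(link), os.path.exists(link)]

    with open(target, "w") as handle:  # the target becomes a regular file
        handle.write("x")
    results.append(os.path.isfile(link))

    os.remove(target)
    os.mkdir(target)  # the same name is now a directory
    results.append(os.path.isdir(link))

    os.rmdir(target)
    os.unlink(link)
    os.rmdir(base)
    return results

if __name__ == "__main__":
    print(symlink_demo())
```

Python's own `os.symlink` reflects the Windows restriction: it accepts a `target_is_directory` argument that is only meaningful on Windows and is ignored elsewhere, which is exactly the kind of extra option mentioned above.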

It is possible to replicate some POSIX features to some degree (such as "integer descriptors from a single namespace, referencing any I/O object, and being select()able") without sacrificing [much] performance, but that is still a major undertaking, and the POSIX interface is really restrictive (that is, if you could just add one more argument to that function, it would have been possible to Do The Right Thing... but you can't, unless you want to throw POSIX compliance away).
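
To illustrate the "single namespace of select()able descriptors" feature, here is a minimal sketch, assuming a POSIX system and Python 3 (`select_demo` is a made-up name). One select() call waits on a pipe and a socket at the same time:

```python
import os
import select
import socket

def select_demo():
    """Wait on a pipe and a socket with a single select() call."""
    pipe_r, pipe_w = os.pipe()
    sock_a, sock_b = socket.socketpair()
    try:
        os.write(pipe_w, b"p")  # make the pipe readable
        sock_b.send(b"s")       # make the socket readable
        ready, _, _ = select.select([pipe_r, sock_a], [], [], 1.0)
        # Reduce everything to plain integers: both kinds of object
        # live in the same descriptor namespace.
        ready_fds = sorted(
            obj if isinstance(obj, int) else obj.fileno() for obj in ready
        )
        return ready_fds == sorted([pipe_r, sock_a.fileno()])
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
        sock_a.close()
        sock_b.close()

if __name__ == "__main__":
    print(select_demo())
```

On Windows, Python's `select.select` accepts sockets only, because the underlying WinSock select() does not handle other descriptor types. A POSIX layer would have to paper over that difference itself.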

Your best bet is not to rely on POSIX features that are difficult to port to non-POSIX systems, or to abstract them in such a way that lower levels may have separate implementations for different OSes and upper levels do not care about the details.
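
One way such a layering could look, as a rough sketch in Python 3 (all function names here are hypothetical, and a real abstraction would cover far more than an exit code): the upper level calls `run_child()`, and the lower level picks a fork-based or spawn-based implementation depending on what the OS offers.

```python
import os
import subprocess
import sys

def _run_child_fork():
    """Lower level, POSIX flavour: fork a child that exits with status 7."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit at once with a known status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

def _run_child_spawn():
    """Lower level, portable flavour: start a fresh interpreter instead of forking."""
    command = [sys.executable, "-c", "import sys; sys.exit(7)"]
    return subprocess.run(command).returncode

def run_child():
    """Upper level: callers get an exit status and never see which path ran."""
    implementation = _run_child_fork if hasattr(os, "fork") else _run_child_spawn
    return implementation()

if __name__ == "__main__":
    print(run_child())
```

The spawn path does not give the child a copy of the parent's state, so this only works when the upper level never relies on fork's duplication semantics in the first place.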

LRN
  • 1,803
  • 15
  • 14