5

I read C++ Streams vs. C-style IO? (amongst other pages) to try to help me decide which way to implement some file IO in a project I'm working on.

Background: I'm fairly new to C++ and Windows programming; I've traditionally worked in C and on command-line applications. Apologies in advance for the n00b-ness of this question.

The problem: I want to read one text file, process the contents, and output to another (new) text file. I am working in a Win32 environment (and this won't change for the foreseeable future) and am writing the application to be Unicode-aware, through `_T`-style macros. The "processing" could include inserting, appending, or deleting lines of text, each of which will be at most 128 characters.

The question: I would prefer to write something robust, so I/O error handling is a consideration. I think I need to stay away from C-style file I/O, if for no other reason than to simplify the code and type checking, i.e. to approach this from a more OO point of view. What are the advantages of using Win32 API functions over the C++ stream functions (if any)? Can you recommend a good primer for either approach? (My googling has left me with a little information overload.)

Thanks muchly

Stephen
  • I should have mentioned that while the line length will be constrained (and small), the file length will be widely variable, but generally fairly large. – Stephen Jun 02 '11 at 18:41
  • Can you give some estimate of what magnitude "fairly large" means? e.g. tens of megabytes, hundreds of megabytes, gigabytes, tens of gigabytes, ... tens of terabytes? – Ben Voigt Jun 02 '11 at 21:15
  • @Ben Yes, "fairly large" is entirely ambiguous, sorry. Tens of MB, but it must be processed in a real-time environment, so yes, performance is a consideration. – Stephen Jun 06 '11 at 12:24

4 Answers

7

What are the advantages of using Win32 API functions over the C++ stream functions (if any)?

  1. Speed
  2. Ability to use overlapped I/O to handle multiple operations at once without threads (and the complexity of synchronization)
  3. Speed
  4. More specific error codes
  5. Speed
  6. Speed
  7. Low dependency footprint (compared to MSVC++ 7.x, 8.0, 9.0, 10.0 and probably most other vendors)
  8. Speed
Ben Voigt
  • 6
    And the downside is 'no chance of moving the code to anything other than Windows'. It is not clear whether that matters to the OP. – Jonathan Leffler Jun 02 '11 at 18:28
  • 2
    @Jonathan: Since he didn't ask about the downsides of the Win32 API, I'm hoping the portability problems are obvious to @Stephen and he's already determined it's not an overriding factor. – Ben Voigt Jun 02 '11 at 18:30
  • Not to mention lack of type safety. –  Jun 02 '11 at 18:31
  • 3
    And I don't see how windows.h can possibly be described as "Low dependency footprint". –  Jun 02 '11 at 18:32
  • Dependency footprint would be dependent upon static vs dynamic linking. – sean e Jun 02 '11 at 18:33
  • People often forget about portability issues until they are mentioned. There is a target audience for whom the 'world outside of Windows' does not exist. There is another for whom that outside world is important. – Jonathan Leffler Jun 02 '11 at 18:34
  • 4
    @sean: Only in degree. If you use the C or C++ standard libraries, you have to deploy them somehow. @Neil: Code using the C or C++ standard library has an indirect dependency on everything windows.h related, in addition to a dependency on the C and C++ libraries themselves. Also, there's no need to redistribute anything found in `windows.h`, the corresponding DLLs are already on the system. Whereas the C and/or C++ libraries have to be included with the application. This can increase an installer by orders of magnitude. – Ben Voigt Jun 02 '11 at 18:38
  • @Jonathan Portability is not a consideration, other software components outside my control lock us into Windows – Stephen Jun 02 '11 at 18:39
  • @Ben That simply isn't true - I use C++ I/O in all my FOSS projects and I don't have to distribute any extra libraries. –  Jun 02 '11 at 18:41
  • 2
    @Neil: No, your users are responsible for finding and installing the dependencies. That's (1) more of a burden and (2) completely unexpected by the majority of Windows users. – Ben Voigt Jun 02 '11 at 18:42
  • @Ben Bzzzt! Wrong. My FOSS stuff is dependent only on the MS runtimes, just as stuff that used the Windows API directly would be. –  Jun 02 '11 at 18:44
  • @Neil: Is it a recent change that MS runtimes no longer need to be distributed? You've never had to deal with for example msvcrt.dll not being installed on an end user's machine? – sean e Jun 02 '11 at 18:47
  • @Neil: see http://www.microsoft.com/downloads/en/details.aspx?FamilyID=a7b7a05e-6de6-4d3a-a423-37bf0912db84&displaylang=en and http://www.microsoft.com/downloads/en/details.aspx?familyid=9B2DA534-3E03-4391-8A4D-074B9F2BC1BF&displaylang=en – sean e Jun 02 '11 at 18:49
  • 1
    @sean I've never come across a case where msvcrt.dll was not installed, no. And if it isn't, a lot more applications than mine would break. –  Jun 02 '11 at 18:50
  • @Neil: Your apps run because someone else took care of your dependencies. – sean e Jun 02 '11 at 18:51
  • @Sean My apps run on a new install of all versions of Windows since W2K (at least). I don't write them using Visual C++. –  Jun 02 '11 at 18:54
  • @Neil: Thanks for the clarification. You earlier wrote "My FOSS stuff is dependent only on the MS runtimes" - what compiler is using the MS runtimes? Generally MS runtime means the CRT or Standard C++ libraries that come with Visual C++ which aren't used/distributed by other compilers. – sean e Jun 02 '11 at 19:01
  • @sean The C++ standard libraries are mostly header-based templates, so there is no library. My FOSS stuff that uses the C++ Standard Library is only dependent at runtime on msvcrt.dll (and of course the windows system DLLs), which has been part of the standard Windows installation for years. –  Jun 02 '11 at 19:05
  • 1
    @Neil: Visual C++ does have a dll for the standard library. Visual C++ does not use msvcrt.dll. It has versioned dlls msvcrXX.dll, msvcpXX.dll (where XX is 80, 90, 100). This is why both Ben and I argue that Win32 has fewer dependencies than C++ standard streams - strictly from a VC++ viewpoint. VC++ dependencies: http://msdn.microsoft.com/en-us/library/8kche8ah.aspx – sean e Jun 02 '11 at 19:12
  • @Sean Oh, and as to which compiler - GCC. And Ben never mentioned VC++. –  Jun 02 '11 at 19:12
  • @Neil: True. I assumed he was experienced with streams as implemented in VC++ from his noting speed improvements by using Win32 API instead. – sean e Jun 02 '11 at 19:21
  • @Sean: Both gcc and MSVC implementations of iostreams are miserably slow. @Neil: You must have had to jump through some hoops to get gcc using the Windows version of msvcrt.dll as its runtime, IIRC the only compiler that does that by default was VC6 (and maybe VC5?). Last I checked, gcc had its own runtime (glibc on linux, newlib on Windows). Ditto any of the other C++ compilers (Borland, Comeau, Intel) -- most all have a runtime requiring redistribution (even in DLL form or making your application executable larger). – Ben Voigt Jun 02 '11 at 20:00
  • @Ben No, no hoops. The top-level dependencies for http://code.google.com/p/csvfix (for example) are KERNEL32.DLL, MSVCRT.DLL and ODBC32.DLL, as reported by Dependency Walker. Have you ever actually used MinGW GCC? –  Jun 02 '11 at 20:05
  • 1
    @Neil: In any case, coding against the Win32 API has the minimum dependency footprint (zero redistribution). You've apparently matched it, but not gone lower. And I don't think it's erroneous to say "low dependency footprint" (I never said "lower", at which point the "than what" would become important) – Ben Voigt Jun 02 '11 at 20:05
  • @Neil: No, I've used cygwin gcc quite extensively, along with linux gcc, and many different versions of msvc, but not mingw (or `gcc -mno-cygwin`). Wrote an application last week to log data from a serial port, adding timestamps to each record. VC++ 2010, 4kB exe file, only direct dependencies are kernel32, user32, shell32. – Ben Voigt Jun 02 '11 at 20:09
  • @Ben Cygwin GCC is crap and has god knows how many dependencies - I wouldn't touch it (or cygwin) with a bargepole. I'm talking about MinGW GCC - best distribution available at http://tdm-gcc.tdragon.net, which is only dependent on msvcrt.dll, and then only if you use the C runtime (as opposed to pure Win32 stuff). –  Jun 02 '11 at 20:15
  • on msvcrt.dll: http://stackoverflow.com/questions/1073509/should-i-redistribute-msvcrt-dll-with-my-application and http://msdn.microsoft.com/en-us/library/abx4dbyh%28VS.80%29.aspx – sean e Jun 02 '11 at 22:30
  • 1
    @sean msvcrt.dll has been shipped with all recent (i.e. last 10 years or so) versions of windows - you can either like it or lump it - I don't care. This conversation is now closed from my side. –  Jun 02 '11 at 22:45
  • 1
    @Sean: The problem here is that those questions use "MSVCRT.DLL" as a generic term for an entire family of DLLs. @Neil isn't using it that way, he's talking about the single file named exactly `MSVCRT.DLL` which ships with Windows. – Ben Voigt Jun 02 '11 at 23:10
  • 2
    The point behind the comment is that apps should not be linking against msvcrt.dll (specifically msvcrt.dll rather than the family of crt dlls). Per the documentation "it is a system component owned and built by Windows. It is intended for future use only by system-level components." It works now. May not in the future. That's all. – sean e Jun 03 '11 at 00:40
  • @sean: Do you really think MS is going to remove that dll while there are still zillions of applications around that depend on it? It used to be a system component in XP and so removing it would mean to break XP compatibility. I guess MS is giving this statement only to make developers think: *"if we have to ship one of these in either case we can as well use the one that fits to our MSVC version"* instead of *"Let's try to hack it so that it links against the old one that is installed in Windows already so we can avoid having to ship a CRT redistributable"*. – x4u Jun 04 '11 at 00:02
  • 1
    @x4u: I'm simply relaying the current guidance. – sean e Jun 04 '11 at 01:32
  • 1
    A few more noteworthy features of the Win32 I/O implementation: [ReadFileScatter](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365469.aspx) to read multiple chunks in a single call (although that is covered by *"Speed"* already), and having a lot more control over file creation (e.g. passing `FILE_FLAG_DELETE_ON_CLOSE` to create a file that's automatically disposed of once all handles to it are closed; ideal for temporary files that get cleaned up even in case of a crash). – IInspectable Jan 25 '17 at 15:23
5

Use C++ stream I/O. Writing to text files is hardly going to stress the I/O library, and you gain enormous benefits in clarity of code, type safety, and the fact that you hardly have to write anything to get the job done. As a side effect, your code will probably be more portable and more understandable, so if you have to ask about it here, you will get more good answers.
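
As a rough illustration of the stream approach for the read/process/write task in the question, here is a minimal sketch. The helper name `filterLines` and its callback signature are hypothetical, and it uses narrow `char` streams for brevity; a Unicode-aware build would need wide streams or an explicit encoding step, which is not shown here.

```cpp
#include <fstream>
#include <functional>
#include <string>

// Hypothetical helper: copies inPath to outPath line by line, passing each
// line through `transform`. If `transform` returns false the line is dropped.
// Returns false on any open, read, or write failure.
bool filterLines(const std::string& inPath, const std::string& outPath,
                 const std::function<bool(std::string&)>& transform) {
    std::ifstream in(inPath);
    if (!in) return false;                  // could not open the input file
    std::ofstream out(outPath, std::ios::trunc);
    if (!out) return false;                 // could not create the output file

    std::string line;
    while (std::getline(in, line)) {
        if (transform(line)) {
            out << line << '\n';            // '\n' does not force a flush
        }
        if (!out) return false;             // write failed (e.g. disk full)
    }
    // getline stops at end of file or on an error; only EOF counts as success.
    if (!in.eof()) return false;
    out.flush();
    return static_cast<bool>(out);
}
```

A caller would pass a lambda that edits the line in place and returns `true` to keep it or `false` to delete it. Inserting extra lines would need a slightly richer callback than this sketch provides.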

  • 1
    "Hardly going to stress the I/O library"... sometimes true, sometimes not, really depends on how much I/O there is. For example, I have to daily use an application which uses C++ iostreams for text-mode output, it takes > 4 hours to write 2GB of data out. My Win32-based code can read and parse that data file in about 2 minutes. – Ben Voigt Jun 02 '11 at 18:41
  • 2
    @Ben Has it occurred to you that the person who wrote that application did not know what he was doing? –  Jun 02 '11 at 18:46
  • @BenVoigt : Sounds like the `std::endl` fiasco in effect. ;-] – ildjarn Jun 02 '11 at 18:59
  • 4 hours to write 2 GB == OUCH. I would have re-written the damn thing in Ruby my first week there. – John Dibling Jun 02 '11 at 19:05
  • @Neil: It's occurred to me that the developer benefits from advertising export capability, while making it painful to export the data and use another analysis package. I guess I should use ProcMon and check whether there are a lot of tiny writes (which `endl` flushing would cause). But really, the overhead of iostreams explains the issue, my program for reading the data was very slow with iostreams, then I got a 30x speedup by using Win32 instead. – Ben Voigt Jun 02 '11 at 19:06
  • @John: I so would rewrite it, but the format it's exporting from isn't documented (that I know of). [adInstruments' LabChart](http://www.adinstruments.com/products/software/research/LabChart-Software/) is the offender. – Ben Voigt Jun 02 '11 at 19:07
  • @ildjarn: Ok, I checked with Process Monitor, and all the writes are 4k. Also, it's CPU-bound, and the `std::endl` fiasco I believe causes a process to be I/O bound with very low CPU usage. – Ben Voigt Jun 03 '11 at 21:06
  • A profiler reveals that they are doing something pretty stupid: buffering stuff into a `ostringstream` and then from there into `ofstream`. Sigh. At least I found out that it exports very quickly into a documented binary format, although that format doesn't include all the data I need, it does speed up my data export process immensely. – Ben Voigt Jun 06 '11 at 19:37
  • @Ben So, are you going to delete your accepted answer? Only joking! –  Jun 06 '11 at 19:45
  • @Neil: That particular craziness on the part of the developer explains about a 3x slowdown. It doesn't explain why the export is 100x slower than it ought to be. (I didn't calculate an exact fraction based on the profiler data, but I did look at the approximate relative cost of each function. The `codecvt` unicode->sbcs nonsense accounts for less than 65% of CPU time.) That may be a lesson on static vs dynamic linking of the runtime libraries: if you use the redist DLL, another developer can profile time spent in the standard library. – Ben Voigt Jun 06 '11 at 22:21
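
The `std::endl` effect mentioned in the comments above can be made visible with a small sketch: `std::endl` writes a newline and then flushes the stream, while `'\n'` only writes the newline. The `CountingBuf` class and `countFlushes` function are hypothetical names used only for this illustration; the buffer is an in-memory one that counts flush requests, so no claim is made here about actual disk timings.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical instrumented buffer: counts how often the stream asks it to flush.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;

protected:
    int sync() override {
        ++flushes;                          // called by std::ostream::flush()
        return std::stringbuf::sync();
    }
};

// Writes `lines` short records, ending each with std::endl or with '\n'.
// Returns the number of flush requests the stream made.
int countFlushes(int lines, bool useEndl) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (useEndl) {
            out << "record " << i << std::endl;   // newline plus flush
        } else {
            out << "record " << i << '\n';        // newline only
        }
    }
    return buf.flushes;
}
```

With a file stream, each of those flush requests can turn into a separate small write to the operating system, which is the pattern the Process Monitor check in the comments was looking for.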
2

To take a broader look, direct use of Win32 is good if you need a tiny application with no additional dependencies.

For anything that C++ iostreams does better, you probably want to look at Boost::Spirit. Seems like it has all the type-safety of iostreams, with much better performance.

You really have two problems here: file I/O and text processing. Win32 does the first exceptionally well and provides no help with the second. Boost::Spirit does the second very well. C++ iostreams are marginal at both tasks; avoid them unless portability is the most important feature.

Ben Voigt
  • 2
    @Ben I don't think you understand either Spirit or the C++ iostreams. –  Jun 02 '11 at 21:52
  • @Neil: It's possible I'm confusing Spirit with one of the other Boost libraries, such as Qi. I do know that I've seen benchmarks of one of the Boost libraries (not `lexical_cast`!) getting a higher throughput on text parsing than e.g. `strtod`. – Ben Voigt Jun 02 '11 at 21:55
  • And [here](http://stackoverflow.com/questions/5678932/fastest-way-to-read-numerical-values-from-text-file-in-c-double-in-this-case/5679966#5679966)'s the reference in question. Seems I was correct to say Boost::Spirit, as Qi is a subset of that library. – Ben Voigt Jun 02 '11 at 21:59
  • @Ben Eh? lexical_cast isn't part of either iostreams or Spirit, and it doesn't do parsing. You seem to be conflating converting with parsing. –  Jun 02 '11 at 22:01
  • @Neil: I said it is part of `Boost`, am I wrong? And `string s = "-1.45e67"; double d = lexical_cast<double>(s);` is text parsing, isn't it? But my mention of `lexical_cast` was only to say that not all Boost-provided facilities have high performance. – Ben Voigt Jun 02 '11 at 22:04
  • @Ben Performance is going to outweigh clarity, etc, in this project so I accepted this answer simply due to the lack of comment-war on it. – Stephen Jun 06 '11 at 12:31
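
For reference, the string-to-double conversion debated in the comments above can also be written with `std::strtod`, the baseline named in the benchmark remark. This is only a sketch of checked use of that standard function; the helper name `parseDouble` is hypothetical, and nothing here measures or compares performance.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses the whole of `text` as a double.
// Returns false if nothing was parsed, if characters are left over,
// or if the value is out of range for a double.
bool parseDouble(const std::string& text, double& value) {
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    const double parsed = std::strtod(begin, &end);
    if (end == begin) return false;         // no conversion was performed
    if (*end != '\0') return false;         // trailing characters remain
    if (errno == ERANGE) return false;      // overflow or underflow
    value = parsed;
    return true;
}
```

Called with the string from the comment, `"-1.45e67"`, it stores the value and returns `true`; input such as `"1.5x"` is rejected rather than silently truncated.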
0

Just to provide a rough benchmark - this code, which must be about the most inefficient possible:

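#include <iostream>
using namespace std;

// Sizes in bytes; 1 GB = 2^30 still fits in a 32-bit unsigned int.
const unsigned int MB = 1024 * 1024;
const unsigned int GB = MB * 1024;

int main() {
    char c = 'x';
    // Deliberately naive: one formatted insertion per byte written.
    for ( unsigned int i = 0; i < GB; i++ ) {
        cout << c;
    }
}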

took about 4 minutes to write a gigabyte of data to a text file when invoked as:

myprog > file.txt

on my hardly state-of-the-art laptop.

  • I assume this is with mingw? If you have time, would you submit an answer to [my question comparing number->string conversion performance of iostream, sprintf, and tuned code](http://stackoverflow.com/questions/4351371/c-performance-challenge-integer-to-stdstring-conversion)? I'd love to see how mingw fares. I'd suggest that the most important cases are the iostreams and sprintf implementations in the question and the accepted answer. – Ben Voigt Jun 04 '11 at 01:51
  • BTW the disk I/O time to write a gigabyte is no more than 12 seconds (assuming your disk fragmentation level isn't off the charts)... so in this case iostreams are slowing everything down by a factor of 20... and there's not even any formatting going on yet. – Ben Voigt Jun 06 '11 at 22:24
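
To put numbers on a comparison like the one in this answer and its comments, a timing harness along the following lines could be used. It is a sketch under stated assumptions: it writes through `std::ofstream` rather than a redirected `cout`, the function names are hypothetical, and the 64 KiB block size is an arbitrary choice. Results will vary with compiler, runtime library, and disk, so no expected timings are given.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Writes `total` bytes of 'x' to `path`, one operator<< call per byte.
// Returns elapsed seconds, or -1.0 if the file could not be written.
double writeCharByChar(const std::string& path, std::size_t total) {
    const auto start = std::chrono::steady_clock::now();
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    for (std::size_t i = 0; i < total && out; ++i) {
        out << 'x';
    }
    out.close();                            // sets failbit if the close fails
    if (!out) return -1.0;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Writes the same data with unformatted write() calls of up to 64 KiB each
// (an assumed block size, chosen only for illustration).
double writeInBlocks(const std::string& path, std::size_t total) {
    const auto start = std::chrono::steady_clock::now();
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const std::string block(64 * 1024, 'x');
    for (std::size_t left = total; left > 0 && out; ) {
        const std::size_t n = left < block.size() ? left : block.size();
        out.write(block.data(), static_cast<std::streamsize>(n));
        left -= n;
    }
    out.close();
    if (!out) return -1.0;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Running both with the same `total` (for example `1024u * 1024u * 1024u` to match the gigabyte above) and comparing the returned times separates per-call stream overhead from raw disk time on a given machine.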