
I do some numerical calculations in Java, C# and C++. Some of them save a lot of data (to a text file). What is the fastest way to do this?

C++:

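#include <fstream>

// save() is a hypothetical wrapper: plik is the file name, U the 251 x 81 results grid
void save(const char* plik, const double (&U)[251][81]) {
    std::ofstream file;
    file.open(plik);
    for (int i = 0; i < 251; i++) {
        for (int j = 0; j < 81; j++)
            file << (i - 100) * 0.01 << " " << (j - 40) * 0.01 << " " << U[i][j] << std::endl;
        file << std::endl;
    }
}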

I assume this is very fast (am I right? :) )

Java:

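import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// dz (the grid step) is not declared here; it is assumed to be a field of the enclosing class
void SaveOutput(double[][] U, String fileName) throws IOException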
{
    PrintWriter tx = new PrintWriter(new FileWriter(fileName));
    for(int i=0;i<251;i++)
    {
        for(int j=0;j<81;j++)
        {
            tx.println(String.format("%e %e %e ",(i - 100) * dz, (j - 40) * dz, U[i][j]));
        }
        tx.println();
    }
    tx.close();
}

The C# example is similar.

And here is what bothers me: I create a String object for each line (a lot of garbage). In this example it is not much, but sometimes I have 10,000,000 lines. This leads me to the following questions:

  1. Can the C++ example be faster?
  2. Should I use StringBuilder in Java, or is that also bad given the number of lines?
  3. Is there any other way or library?
  4. What about C#?

Thank you

Diego Sevilla
Lukasz Madon

8 Answers


Profile it. Run the code, time it, see how long it takes. If the amount of time it takes is acceptable, use it. If not, figure out which part takes the most time and optimize that.

  • Make it right.
  • Make it fast.

In that order. (Some people add "make it run/build" before those two.)

That said, I've actually run metrics on this sort of thing before. The short of it: you're waiting for the disk, and the disk is ungodly slow. It doesn't matter whether you're writing in C, C++, or Java; they're all waiting for the hard disk.
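As a rough C++ sketch of the "time it" step (the helpers `timeMs` and `formatGridInMemory` are hypothetical, made up for illustration), you can time a format-only run that never touches the disk and compare it with a timing of your real save routine. If the in-memory run is only a small fraction of the total, the time is going to the disk, as described above.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename F>
double timeMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical baseline: formats the question's 251 x 81 layout into memory
// only (no disk I/O), with 0.0 standing in for U[i][j].
// Returns the number of characters produced.
std::size_t formatGridInMemory() {
    std::ostringstream out;
    for (int i = 0; i < 251; i++) {
        for (int j = 0; j < 81; j++)
            out << (i - 100) * 0.01 << ' ' << (j - 40) * 0.01 << ' ' << 0.0 << '\n';
        out << '\n';
    }
    return out.str().size();
}
```

Usage: `timeMs([] { formatGridInMemory(); })` for the baseline, and `timeMs([&] { save("out.txt", U); })` around your own writer, where `save` stands for whatever routine you are measuring.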

Here's a previous post that I did on various I/O methods in C. Not exactly what you're looking for, but might be informative.

Thanatos
  • Profiling is hard for I/O-bound programs. – Diego Sevilla Nov 04 '10 at 23:13
  • It is always acceptable because I can run it overnight. I'm looking for a hint or pattern, like when I started programming in Java and used something like tx.println(val1 + " " + val2 + " " + ...), which was a bit stupid and did make my calculations slower. – Lukasz Madon Nov 04 '10 at 23:22
  • @lukas Did it really? Do you have benchmarks of a + b + c being slower than a StringBuilder (for some sane-length a, b, and c) on a modern javac build? –  Nov 04 '10 at 23:31
  • The real problem with profiling an I/O-bound application is that most of the time goes into the write operation to disk. The processor is idle most of the time, so what is there to profile? – Diego Sevilla Nov 04 '10 at 23:34
  • It wasn't just 3 values but more (I don't remember, maybe 20), and what it does is create a new StringBuilder object on every loop iteration. Link: http://chaoticjava.com/posts/stringbuilder-vs-string/ BTW, did you know that 997 (my number of points) is the largest 3-digit prime number? :p – Lukasz Madon Nov 04 '10 at 23:40
  • @Diego but there are some problems with Strings if someone, for example, doesn't know that Strings are immutable. http://www.yoda.arachsys.com/csharp/stringbuilder.html – Lukasz Madon Nov 04 '10 at 23:42

One word: Profile.

Do note that inserting std::endl into a buffered (file) stream causes it to flush, which will probably degrade performance (from the language's point of view, the buffer is written "out", although this does not necessarily mean a physical disk access). To simply print a newline, use '\n'; it's never worse.
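A C++ sketch of the question's loop with that change applied: '\n' instead of std::endl, so the stream flushes only when its buffer fills or the file is closed. Writing to a `std::ostream&` lets the same code target an `std::ofstream` or, for checking the output, an `std::ostringstream`. The name `writeGrid` is hypothetical and U is assumed to be the question's 251 x 81 array.

```cpp
#include <sstream>  // std::ostream, plus std::ostringstream for checking the output

// Writes "x y U[i][j]" per line and a blank line after each i row, using '\n'
// so that no flush is forced per line (std::endl would flush every time).
void writeGrid(std::ostream& out, const double (&U)[251][81]) {
    for (int i = 0; i < 251; i++) {
        for (int j = 0; j < 81; j++)
            out << (i - 100) * 0.01 << ' ' << (j - 40) * 0.01 << ' ' << U[i][j] << '\n';
        out << '\n';
    }
}
```

For the real file: `std::ofstream file(plik); writeGrid(file, U);`. The remaining buffer is written once, when `file` is closed or destroyed.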

eq-

First, and foremost: use a buffered writer!

This may mean enabling buffering on the channel in some languages, or using a BufferedWriter in Java or its equivalent in others. Failing to do so may lead to far worse performance, as the output stream may be "over-flushed"; the sample code above violates this (FileWriter knows nothing of buffering)!

In many cases one can consider CPU and main-memory access "cheap" and I/O "expensive". In trivial cases like this one, improving access to the I/O itself (e.g. buffering and not [over-]flushing) will yield the most tangible gains. Modern VMs and JITs do what they do quite well, and allocating and de-allocating short-lived objects is likely the least of the "worries" here.
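In Java, the change this answer suggests is to wrap the writer, e.g. `new PrintWriter(new BufferedWriter(new FileWriter(fileName)))`. C++ file streams are already buffered, so the closest C++ analogue is asking for a larger buffer. The sketch below does that with `pubsetbuf`; the helper name `openWithLargeBuffer` and the 1 MiB size are arbitrary assumptions. Note that the standard leaves the effect of a non-null buffer on a `std::filebuf` implementation-defined, so this is a request to the library, not a guarantee.

```cpp
#include <fstream>
#include <vector>

// Hypothetical helper: hands the ofstream's file buffer a caller-owned buffer
// of `bytes` bytes, then opens `path`. The pubsetbuf call must come before
// open() and before any output, and `buffer` must outlive the stream's use
// of it (declare it before the ofstream, or close the stream first).
bool openWithLargeBuffer(std::ofstream& out, const char* path,
                         std::vector<char>& buffer,
                         std::vector<char>::size_type bytes = 1 << 20) {
    buffer.resize(bytes);
    out.rdbuf()->pubsetbuf(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    out.open(path);
    return out.is_open();
}
```

Whether a bigger buffer helps noticeably is exactly the kind of thing to measure rather than assume.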


Use the java.nio classes to create channels instead. Channels are new to Java and are much faster than the old streams. You should also buffer the writes. I can't remember whether channels buffer by default; I'd need to read up on that.

Finally, it's OK that you are creating a lot of strings: you throw them away instantly. I doubt they will make your writes to disk slow, since disk I/O is much slower than the CPU.

Here is what I was thinking:

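import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// dz and U as in the question; the enclosing method must declare throws IOException
FileChannel fileChannel = new FileOutputStream("test.txt").getChannel();
for (int i = 0; i < 251; i++) {
  for (int j = 0; j < 81; j++) {
    fileChannel.write(ByteBuffer.wrap((String.format("%e %e %e ", (i - 100) * dz, (j - 40) * dz, U[i][j]) + "\n").getBytes()));
  }
}
fileChannel.close();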
Amir Raminfar
  • oops forgot to add toBytes() :) – Amir Raminfar Nov 04 '10 at 23:29
  • AFAIK there really isn't a reason to use NIO unless one needs selectors (non-blocking I/O), mmap, or other "extra" features. The "old" Java I/O API is just as fast for a blocking read/write operation, and multi-threaded I/O isn't (always) slower than selector/non-blocking I/O. –  Nov 04 '10 at 23:50
  • That's not always true. Depending on the operating system, it's possible for NIO to be faster. http://stackoverflow.com/questions/1605332/java-nio-filechannel-versus-fileoutputstream-performance-usefulness – Amir Raminfar Nov 05 '10 at 01:18

Note first that this I/O-bound program is not going to improve much from small details (for example, whether you use C++ streams or printf).

For the C/C++ part, some say that good ol' printf is faster. It may be faster, but not by orders of magnitude, so I wouldn't bother.

As for the Java version, I think it is already quite optimized.

Can't tell for C#, my doctor doesn't allow me :)

Diego Sevilla

I expect it would be faster to use fprintf in C or C++.

  • @JimR, that was my thinking. The simple answer, though, is to try it. –  Nov 05 '10 at 00:00
  • I have tried it. I built a report generator and the PM insisted that we use "proper C++ features"; C++ streams were significantly slower in that case. This was somewhere around 2002, so things may have changed, but... – JimR Nov 05 '10 at 00:48
  • Every `<<` is an overload, where the type is checked by the compiler. Every `%d` is parsed at runtime, and any error is Undefined Behavior. I'd expect `fprintf` to compile faster, but `operator<<(ostream&, int)` to be faster. Also, it's simple enough to be inlined. `fprintf` probably is 2KB of code. – MSalters Nov 05 '10 at 11:05
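To try it, as the comments suggest, here is a stdio version of the question's loop that can be timed against the iostream one. It is a sketch: `writeGridPrintf` is a hypothetical name, U is assumed to be the question's 251 x 81 array, and `%e` mirrors the Java format rather than the iostream default, so the two outputs differ textually.

```cpp
#include <cstdio>

// Writes "x y U[i][j]" per line with fprintf and a blank line after each i row.
// A file opened with fopen is fully buffered by default, so '\n' does not
// trigger a flush. Returns false if any write reports an error.
bool writeGridPrintf(std::FILE* f, const double (&U)[251][81]) {
    for (int i = 0; i < 251; i++) {
        for (int j = 0; j < 81; j++)
            if (std::fprintf(f, "%e %e %e\n", (i - 100) * 0.01, (j - 40) * 0.01, U[i][j]) < 0)
                return false;
        if (std::fputc('\n', f) == EOF)
            return false;
    }
    return true;
}
```

Usage: `std::FILE* f = std::fopen(plik, "w"); writeGridPrintf(f, U); std::fclose(f);`, checking `f` for null first.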

Lukas,

First off, I know mainly C#, so everything here pertains to .NET.

With the number of lines you are dealing with, I wouldn't create Strings or use a StringBuilder. A StringBuilder only helps with the creation of Strings from a number of smaller segments.

I think your best bet would be to use the Stream versions of the file system objects. That way, you aren't storing strings at all, and so your memory usage should be fairly small.

Also, if you are really short on memory, you could always create an unmanaged string and P/Invoke into it.

Erick

Erick T

As for Java, you don't have to create all those strings. Get rid of String.format and write the bytes directly.

Use NIO and profile mercilessly.
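Carried over to C++, "write the bytes directly" could look like the sketch below: each line is formatted with `snprintf` into one reusable byte buffer instead of creating a string object per line, and the buffer goes to the file in a single `fwrite` when it is nearly full. This is also the pattern a Java `ByteBuffer` plus `FileChannel` typically follows (fill, write, clear). The name `writeGridChunked`, the 64 KiB chunk size and the 251 x 81 shape of U are assumptions.

```cpp
#include <cstdio>
#include <vector>

// Formats every line straight into `chunk` (no per-line string objects) and
// hands the bytes to the file with one fwrite whenever less than one line's
// worth of space is left. Returns false if formatting or writing fails.
bool writeGridChunked(std::FILE* f, const double (&U)[251][81]) {
    std::vector<char> chunk(64 * 1024);
    std::size_t used = 0;
    const std::size_t maxLine = 128;  // one "%e %e %e\n" line is under 50 bytes
    bool ok = true;
    auto flush = [&] {
        if (std::fwrite(chunk.data(), 1, used, f) != used) ok = false;
        used = 0;
    };
    for (int i = 0; i < 251; i++) {
        for (int j = 0; j < 81; j++) {
            if (chunk.size() - used < maxLine) flush();
            int n = std::snprintf(chunk.data() + used, chunk.size() - used, "%e %e %e\n",
                                  (i - 100) * 0.01, (j - 40) * 0.01, U[i][j]);
            if (n < 0) return false;
            used += static_cast<std::size_t>(n);
        }
        if (chunk.size() - used < maxLine) flush();
        chunk[used++] = '\n';  // blank line after each i row, as in the question
    }
    flush();
    return ok;
}
```

The chunk is reused for the whole run, so memory use stays constant even for 10,000,000 lines.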

OscarRyz