FILE *fp;
fp = fopen(pch, "wb");
while(1)
fwrite(p, sizeof(char), strlen(p) / sizeof(char), fp);

I used this code to write to a txt file, but when the output file is very big (its size may grow to 5 GB) it becomes very slow and I need to wait a long time. Can anyone tell me a better way to write to a txt file?

EDIT: p is a const char * variable. I may need to wait an hour on my computer. I just checked that the size of the txt file grew to 20 GB.

while (!done)    
    {
        const char *p = icmDisassemble(processor[i], currentPC);
        fwrite(p, sizeof(char), strlen(p) / sizeof(char), fp);
        fwrite("\n", sizeof(char), 1, fp);
        done = (icmSimulate(processor[i], 1) != ICM_SR_SCHED);
    }
asked by bios, edited by mazhar islam
  • `need to wait a long time` => Seconds? Minutes? Hours? SSD? HDD? USB? Network? What is `p`? – Thomas Ayoub Jun 12 '15 at 12:26
  • Note that `sizeof(char)` is redundant (it's 1 by definition) - so change `fwrite(p, sizeof(char), strlen(p) / sizeof(char), fp);` to `fwrite(p, 1, strlen(p), fp);` – Paul R Jun 12 '15 at 12:27
  • Which platform? Where are you writing? – LPs Jun 12 '15 at 12:28
  • Which file system is your system using? – Peque Jun 12 '15 at 12:29
  • What is `p`? In other words, how many bytes do you write per `fwrite()` call? – unwind Jun 12 '15 at 12:29
  • Not sure whether it is similar enough to be a duplicate, but this seems to be very relevant: http://stackoverflow.com/q/4350785/3488231 – user12205 Jun 12 '15 at 12:43
  • Could you try to store your values in a temporary structure and write the whole data each time it reaches a specific size, e.g. 10 MB? – LPs Jun 12 '15 at 12:43
  • Is p = icmD..() malloc()ing memory? If so, this may lead to constantly slower memory allocation – Peter Miehle Jun 12 '15 at 12:46
  • Put the I/O in a separate thread so that you can proceed with disassembly while queuing up async I/O. – Paul R Jun 12 '15 at 12:46
  • Comment out the two `fwrite`s and see if it's faster. Maybe the bottleneck is not within `fwrite`. – Jabberwocky Jun 12 '15 at 12:47
  • Thanks for your answer, I am sure `const char *p = icmDisassemble(processor[i], currentPC);` is not the bottleneck – bios Jun 12 '15 at 12:53
  • You need to say how long this is really taking to avoid a wild goose chase. Writing gigabytes (or tens of gigabytes) of data takes a "long" time. How long is your process taking? And as others have said, what file system are you writing to (local, remote, USB, etc.)? – lurker Jun 12 '15 at 12:58
  • Thanks LPs. I am trying to avoid writing such a big amount of data to a txt file, but this question is still interesting, isn't it? – bios Jun 12 '15 at 12:59
  • @user2842153 - "I am sure `const char *p = icmDisassemble(processor[i], currentPC);` is not the bottleneck"? So you actually profiled your code and can post the results? If you didn't profile your code - you **can't** be sure. – Andrew Henle Jun 12 '15 at 13:03
  • @user2842153 Probably, due to my bad English, I hadn't explained my point well. What I meant is that you can store all data in RAM until the amount reaches a certain value, e.g. 10 MB. Then you can write the whole 10 MB to disk through fwrite. – LPs Jun 12 '15 at 13:12
  • @LPs: don't worry, it's not you. OP generates one string at a time and writes that immediately. Your solution would be about the first thing I'd try - it does not need to be as large as 10 MB; even one disk page (typically 4 kB) should already do much better (but of course larger = better here). – Jongware Jun 12 '15 at 13:17
  • @Jongware - OP is already using `fwrite()`, which is most probably buffered at 8 kB, although actual `write()` calls are likely somewhat smaller because of the implementation. – Andrew Henle Jun 12 '15 at 13:38
  • @AndrewHenle: that's a good point. Is there any chance that tailored buffering outperforms the built-in one? If not, then almost all of the above commenters are right and it's *not* `fwrite` that throttles output speed. – Jongware Jun 12 '15 at 13:43
  • What operating system, what `libc`, what file system? Please edit your question to improve it. – Basile Starynkevitch Jun 12 '15 at 18:36

1 Answer


I have tested a null implementation of your code to see what fwrite's performance is, and I believe the bottleneck definitely isn't fwrite.

#include <stdio.h>
#include <string.h>

char *icmDisassemble(int x, int y) {
        return "The rain in Spain falls mostly on the plain";
        // return "D'oh";
        // return "Quel ramo del lago di Como, che volge a mezzogiorno, tra due catene non interrotte di monti, tutto a seni e a golfi, a seconda dello sporgere e del rientrare di quelli, vien, quasi a un tratto, a ristringersi, e a prender corso e figura di fiume, tra un promontorio a destra, e un’ampia costiera dall’altra parte; e il ponte, che ivi congiunge le due rive, par che renda ancor più sensibile all’occhio questa trasformazione, e segni il punto in cui il lago cessa, e l’Adda rincomincia, per ripigliar poi nome di lago dove le rive, allontanandosi di nuovo, lascian l’acqua distendersi e rallentarsi in nuovi golfi e in nuovi seni.";
}

#define ICM_SR_SCHED 0

int icmSimulate(int x, int y) {
        return ICM_SR_SCHED;
}

int main() {
    int done = 0;
    int i = 0;
    int processor[1] = { 0 };
    int currentPC = 0;

    FILE *fp;

    fp = fopen("test.dat", "w");

    while (!done)
    {
        const char *p = icmDisassemble(processor[i], currentPC);
        fwrite(p, sizeof(char), strlen(p) / sizeof(char), fp);
        fwrite("\n", sizeof(char), 1, fp);
        done = (icmSimulate(processor[i], 1) != ICM_SR_SCHED);
    }
}

Test results

touch test.dat; ( ./testx & ); for i in $( seq 1 7 ); do \
    date | tr "\n" "\t"; du -sh test.dat; \
    sleep 10; done; \
killall -KILL testx; rm -f test.dat

Write speed is between 50 and 60 MB/s, or about 2 GB per minute, on a desktop SATA disk (not an SSD). This is about fifty per cent slower than dd - the same order of magnitude:

time dd if=/dev/zero of=test.dat bs=1M count=5300
5300+0 records in
5300+0 records out
5557452800 bytes (5.6 GB) copied, 61.5375 s, 90.3 MB/s

real    1m2.105s
user    0m0.000s
sys     0m9.544s

My hardware clocks at around 100 MB/s sustained, so 90.3 MB/s is a believable figure (I am using that system now, and possibly slowing it down a bit).

Changing the string length does not significantly change the times:

// "D'oh"

Fri Jun 12 19:36:50 CEST 2015   1.5M    test.dat
Fri Jun 12 19:37:00 CEST 2015   751M    test.dat
Fri Jun 12 19:37:10 CEST 2015   1.5G    test.dat
Fri Jun 12 19:37:20 CEST 2015   2.2G    test.dat
Fri Jun 12 19:37:31 CEST 2015   2.9G    test.dat
Fri Jun 12 19:37:41 CEST 2015   3.6G    test.dat
Fri Jun 12 19:37:51 CEST 2015   4.4G    test.dat

// First lengthy sentence of *I Promessi Sposi*

Fri Jun 12 19:39:42 CEST 2015   8.4M    test.dat
Fri Jun 12 19:39:52 CEST 2015   1.2G    test.dat
Fri Jun 12 19:40:02 CEST 2015   2.1G    test.dat
Fri Jun 12 19:40:14 CEST 2015   3.1G    test.dat
Fri Jun 12 19:40:25 CEST 2015   4.0G    test.dat
Fri Jun 12 19:40:35 CEST 2015   4.8G    test.dat
Fri Jun 12 19:40:45 CEST 2015   5.7G    test.dat

// "The rain in Spain"

Fri Jun 12 19:41:21 CEST 2015   7.3M    test.dat
Fri Jun 12 19:41:31 CEST 2015   1.2G    test.dat
Fri Jun 12 19:41:43 CEST 2015   2.1G    test.dat
Fri Jun 12 19:41:53 CEST 2015   3.0G    test.dat
Fri Jun 12 19:42:03 CEST 2015   3.9G    test.dat
Fri Jun 12 19:42:13 CEST 2015   4.6G    test.dat
Fri Jun 12 19:42:23 CEST 2015   5.3G    test.dat

So where is the bottleneck?

I see only a few options.

  • it is the disk. Try running my code on your computer for a couple of minutes; it should write around ten gigabytes' worth of rubbish. Significantly lower figures might indicate something wrong in the disk setup, the file system, the physical media, or the interface hardware.

  • it is icmDisassemble. You say it is not, but let's suppose that it returns a zero-length string very often. By returning "", I get much worse performance: 1.5 GB/minute instead of 4-5.

In the latter case you can try counting the line lengths you get:

tr -c "\n" "." < YourLargeOutputFile | sort | uniq -c

These are the results for the strings in a random file, showing that most lines are only four bytes long (as expected):

  10931 ....
   4319 .....
   1680 ......
    629 .......
    288 ........
    142 .........
     54 ..........
     21 ...........
     18 ............
      6 .............
      3 ..............
      4 ...............
      3 ................
      1 .................

If you see a very large number of zero-length lines, skipping them could be one fix:

  const char *p = icmDisassemble(processor[i], currentPC);
  // Ignore zero-length output.
  if (p[0]) {
    fwrite(p, sizeof(char), strlen(p) / sizeof(char), fp);
    fwrite("\n", sizeof(char), 1, fp);
  }

Another possibility is to use a larger buffer. With a 64 kB buffer, which should be more than enough, I again get normal performance even when writing zero-length strings as well as the newlines:

Fri Jun 12 20:03:15 CEST 2015   6.5M    test.dat
Fri Jun 12 20:03:25 CEST 2015   1.3G    test.dat
Fri Jun 12 20:03:35 CEST 2015   2.1G    test.dat
Fri Jun 12 20:03:45 CEST 2015   3.0G    test.dat
Fri Jun 12 20:03:56 CEST 2015   3.7G    test.dat
Fri Jun 12 20:04:06 CEST 2015   4.3G    test.dat
Fri Jun 12 20:04:17 CEST 2015   5.2G    test.dat

This is the modified code (note that the buffer is not zero-terminated - each "\n" overwrites the terminating zero):

#define ICM_BUF_LEN 0x10000   /* 64 kB; assumes no single string comes close to this length */
char *buffer = malloc(ICM_BUF_LEN);   /* needs <stdlib.h> */
size_t bufptr = 0;

while (!done)
{
    const char *p = icmDisassemble(processor[i], currentPC);
    if ((strlen(p) + bufptr + 1) >= ICM_BUF_LEN) {
            fwrite(buffer, 1, bufptr, fp);
            bufptr = 0;
    }
    strcpy(buffer + bufptr, p);
    bufptr += strlen(p);
    buffer[bufptr++] = '\n';
    done = (icmSimulate(processor[i], 1) != ICM_SR_SCHED);
}
fwrite(buffer, 1, bufptr, fp);
free(buffer); buffer = NULL;

Saving strlen calls (storing the first strlen result in a variable and using memcpy) does not appreciably change the results. A buffer twice as large, on my system, also fails to deliver any benefit.

  • it is icmDisassemble. Not always, mind you, but sometimes. Perhaps in very rare cases it finds some awkward data and chokes, or loses a very long time recovering, double-checking, or invoking expensive functions. How do we check this? You can time the function - given the order of magnitude of the slowness, we just need to be able to measure milliseconds; there are several snippets around to do that.

    int times[1000];
    for (j = 0; j < 1000; j++) { times[j] = 0; }

    while (!done)
    {
        int ms1 = getTimeMilliseconds();
        const char *p = icmDisassemble(processor[i], currentPC);
        int ms2 = getTimeMilliseconds() - ms1;
        if (ms2 > 999) {
            ms2 = 999;
        }
        times[ms2]++;
        fwrite(p, 1, strlen(p), fp);
        fwrite("\n", 1, 1, fp);
        done = (icmSimulate(processor[i], 1) != ICM_SR_SCHED);
    }


After a run, or if the clock exceeds a decent run time, you dump the array to stdout, ignoring zero entries, and get something like this:

times
----
0         182493 <-- times obviously not zero, but still < 1 ms
1         9837
2         28
3         5
6         1
135       1 <---- two suspicious glitches (program preempted by kernel?)
337       1 <--
999       5 <-- on five occasions the function has stalled

If this turned out to be the case, you could add a check immediately after the icmDisassemble call, inspecting the elapsed time and dumping diagnostic information the first time it exceeds a reasonable limit.

Also, comparing wall time and CPU time could yield valuable information - for example, revealing that something else is preempting your program, or that it spends most of its time waiting for something.

answered by LSerni, edited by Jongware