0

The program works with a .wav (WAVE) file.

The code below is the part of the program that finds the "data" subchunk. It writes all the necessary chunks to the output file and then looks for "data" (it copies the next 4 bytes into char comp_dataID[4]; and compares them with const char dataID[4] = "data";):

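while (1) {  /* find "data" */
    if (fread(comp_dataID, 4, 1, input) != 1) {
        /* EOF or read error: stop instead of looping forever */
        fprintf(stderr, ">>>   \"data\" not found!\n");
        break;
    }

    if ( memcmp(comp_dataID, dataID, 4) == 0 ) {
        printf(">>>   \"data\" found!\n");
        fwrite(comp_dataID, 1, 4, output);
        break;
    }
    else {
        fseek(input, -3, SEEK_CUR);  /* slide the 4-byte window forward by 1 byte */
    }
}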

There can be many extra subchunks before "data", so I want to optimize the program:

  1. If the 4 bytes just read contain "...." then copy the next 4 bytes. (This skips 3 unnecessary operations.)
  2. If "...d" then fseek(input, -1, SEEK_CUR); /* set pointer before "d" */ and then copy the next 4 bytes.
  3. If "..da" then fseek(input, -2, SEEK_CUR); /* set pointer before "d" */ and then copy the next 4 bytes.
  4. If ".dat" then fseek(input, -3, SEEK_CUR); /* set pointer before "d" */ and then copy the next 4 bytes.

The problem is that I don't understand how to compare "...d" with "data", i.e. how to find out whether char comp_dataID[4]; contains "...d", "..da" or ".dat".


The question: Is there any function that does this (that returns the number of characters that were matched: 0 in the case of "....", 1 in the case of "...d" and so on)?

...or should I use a for() loop to find "d", then "a" and then "t", and, according to the results, set the pointer before "d" in order to copy the next 4 bytes ("data")?

PS After this char[4], the next 4 bytes are the size of all the samples (it is used in the program).

yulian

2 Answers

2

Before you start trying to optimise, are you sure that it's a problem? Have you actually run your code in a profiler and determined that the few extra clock cycles per loop iteration are the biggest thing slowing down your program, and not the disk I/O or something happening elsewhere?

memcmp in the average case will probably be not much slower than rolling your own function to compare and calculate the offset, and will likely be a minimal contribution compared to the effects of disk I/O and whatever processing you actually end up doing.

*edit* Removed broken example.


yulian
Sysyphus
  • No, I didn't run my code in a... profiler. (I don't know what that is.) I'm not a very skilled programmer, but I'm convinced that if there are very large "extra" subchunks, this kind of optimization will help to save some of the machine's resources (am I right?). – yulian May 28 '13 at 10:03
  • Info on profilers: http://stackoverflow.com/questions/1794816/recommendations-for-c-profilers – Nobilis May 28 '13 at 10:08
  • What type does `strstr()` return? – yulian May 28 '13 at 10:32
  • As the linked documentation says, char*. – Sysyphus May 28 '13 at 10:35
  • Well, this: `set_input = strstr( comp_dataID, dataID );` will return a pointer to the beginning of the `data` string. But how can I move the `input` (pointer) to the beginning of that `"data"`? **PS** (By "move" I mean, for example: `fread()` advances the pointer, and the next `fread()` continues from the place where it stopped.) – yulian May 28 '13 at 10:44
  • `pos-otherpos = offset`. So store your previous position, calculate how far you moved, and fseek the appropriate amount. – Sysyphus May 28 '13 at 11:07
  • @Sysyphus If you give me an example (like the other answer), I'll probably accept your answer, because your version of the solution is more universal. **PS** It is difficult for me to understand how to find the `offset` (pos-otherpos). – yulian May 28 '13 at 14:18
  • Ignore the recommendation for strstr; I forgot you were dealing with binary data, so strstr won't work. You can use memmem if you're only targeting Linux, but all this is beside the point. memcmp likely stops evaluating as soon as the strings don't match, so in the general case it's one comparison. You'd need three extra comparisons to know what offset you need, in order to save making three comparisons. Net gain: nothing. Step one, make your implementation work. Once it's working, if it's too slow, use a profiler. If the profiler says that's the slow bit, then and only then worry about it. – Sysyphus May 28 '13 at 14:32
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/30759/discussion-between-sysyphus-and-julian) – Sysyphus May 28 '13 at 14:37
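
The `pos - otherpos` idea from the comments can be sketched without `memmem`, which is not part of standard C. The helper below is a hypothetical stand-in (the name `find_bytes` is an assumption, not a library function): it scans a block that has already been read into memory and returns the offset of the first match, which is exactly the distance between the match and the start of the block.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical portable stand-in for memmem(): returns the byte offset of
 * the first occurrence of pat (pat_len bytes) inside buf (buf_len bytes),
 * or -1 if there is none. Works on binary data, unlike strstr().
 */
static long find_bytes(const unsigned char *buf, size_t buf_len,
                       const void *pat, size_t pat_len)
{
    size_t i;

    if (pat_len == 0 || pat_len > buf_len)
        return -1;

    for (i = 0; i + pat_len <= buf_len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;  /* offset = match position - start of buffer */
    }
    return -1;
}
```

If the file position was saved with `ftell()` before the block was read, then `fseek(input, start + offset, SEEK_SET)` should place the stream on the "d" of "data". A match that straddles two blocks is not handled by this sketch.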
0

This function will do it:

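/* Returns how many bytes to step back so that the file position lands on
   the earliest byte of a[0..3] that equals b[0] (0 if there is none).
   Only the first byte of b is compared. */
int match(const char *a, const char *b) {

    int matches = 0;

    if( a[3] == b[0] )
        matches = 1;
    if( a[2] == b[0] )
        matches = 2;
    if( a[1] == b[0] )
        matches = 3;
    if( a[0] == b[0] )
        matches = 4;

    return matches;
}

int main()
{
    ...
    step = match( buf, dataID );         // number of bytes to step back
    fseek(input, -step, SEEK_CUR);       // sets `pointer` to the first 'd' in buf, where "data" may begin

    return 0;
}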
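
The function above only looks for the first byte of the pattern. A variant that returns the count asked for in the question (0 for "....", 1 for "...d", 2 for "..da", 3 for ".dat") could look like the sketch below. The name `suffix_prefix_overlap` is hypothetical, and the sketch assumes the full 4-byte match has already been ruled out with `memcmp()`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a standard library function): returns the length
 * of the longest proper suffix of buf[0..n-1] that equals a prefix of pat.
 * pat must hold at least n - 1 bytes. A full n-byte match is not reported;
 * test for it with memcmp() before calling this.
 */
static size_t suffix_prefix_overlap(const char *buf, const char *pat, size_t n)
{
    size_t k;

    if (n == 0)
        return 0;

    /* Try the longest candidate first so the first hit is the answer. */
    for (k = n - 1; k > 0; k--) {
        if (memcmp(buf + (n - k), pat, k) == 0)
            return k;
    }
    return 0;
}
```

After a failed comparison, `fseek(input, -(long)suffix_prefix_overlap(comp_dataID, dataID, 4), SEEK_CUR);` steps back just far enough that the next 4-byte read starts at the possible "d". With an overlap of 0 no seek is needed.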
yulian
Nobilis
  • The problem is that the string that I compare with the `const char` can begin like this: `???d` or `??da` or `?dat`... – yulian May 28 '13 at 10:25
  • This will give you how long until two strings diverge which is a different problem. He's after how long until two converge. – Sysyphus May 28 '13 at 10:25
  • @Julian Regarding the edit whatever works for you :) I was looking into avoiding hard-coding any values, perhaps I still missed an aspect of your problem. – Nobilis May 28 '13 at 11:51
  • @Nobilis If you find (Ctrl + F) here: https://ccrma.stanford.edu/courses/422/projects/WaveFormat/ the following: **if PCM, then doesn't exist**, you'll be able to see "Extra Chunks"... I wanted to jump over them (I'm not interested in them) to the beginning of "data". Anyway, the problem has been solved. Thanks! ;) – yulian May 28 '13 at 12:00
  • Aha, oh that should be okay then. I have to admit that I know nothing about working with sound files so their composition is completely elusive to me. – Nobilis May 28 '13 at 12:05
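
For jumping over the extra chunks mentioned in the comments, a different technique from the byte-by-byte scan is to use the size field that follows every chunk ID. The sketch below is not the asker's code; it assumes the standard RIFF layout (4-byte ID, 4-byte little-endian size, data padded to an even length) and that the stream is already positioned at a chunk header, which in a canonical WAVE file is right after the 12-byte "RIFF"/size/"WAVE" header. The name `seek_to_data` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch: skip whole chunks by their declared size.
 * Assumes `in` is positioned at the start of a chunk header.
 * On success returns 0, stores the "data" chunk size in *size and leaves
 * `in` at the first sample byte. Returns -1 if no "data" chunk is found.
 */
static int seek_to_data(FILE *in, unsigned long *size)
{
    unsigned char hdr[8];

    while (fread(hdr, 1, 8, in) == 8) {
        /* The chunk size is a 32-bit little-endian value after the ID. */
        unsigned long n = (unsigned long)hdr[4]
                        | ((unsigned long)hdr[5] << 8)
                        | ((unsigned long)hdr[6] << 16)
                        | ((unsigned long)hdr[7] << 24);

        if (memcmp(hdr, "data", 4) == 0) {
            *size = n;
            return 0;
        }

        /* Chunk data is padded to an even number of bytes. */
        if (fseek(in, (long)(n + (n & 1)), SEEK_CUR) != 0)
            return -1;
    }
    return -1;
}
```

Because whole chunks are skipped, the bytes "data" appearing inside another chunk's payload cannot produce a false match, which the sliding-window search does not guarantee.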