0

I have a string which is generated by linux uuid generation code (libc):

1b4e28ba-2fa1-11d2-883f-b9a761bde3fb

I need to replace some of the characters in this string:

- with _
2 with f
4 with x

I am generating the 200 UUIDs using a loop.

So for every uuid I need to replace using a custom function, so that function must be maximum optimised to do so, how can I attain that?

Xymostech
  • 9,710
  • 3
  • 34
  • 44
Kajal
  • 223
  • 4
  • 15
  • 2
    200? That's not a large number, it would take some real work to not make this almost instant on a modern computer. Also, what do you have so far? – Collin Apr 20 '13 at 03:20
  • 1
    http://stackoverflow.com/a/779960 – Robert Harvey Apr 20 '13 at 03:21
  • @Collin I just have a uuid gen program which generates UUID in the above format , can i use regular expression to do so , any example – Kajal Apr 20 '13 at 03:24
  • @RobertHarvey : any optimized way like with out this much traversal – Kajal Apr 20 '13 at 03:24
  • @johnchen902 Replace every 2 with f if exists Replace every 4 with x if exists Replace every - with '_' – Kajal Apr 20 '13 at 03:26
  • 1
    You can attain that by writing a program which does that. It's best to start early. If you have any problems, please post here. – user93353 Apr 20 '13 at 03:31
  • Umm.... why don't you just loop over the string, `switch` on each char, and replace it accordingly? If the "-" are always in the same place, you can just hard code which bytes to override. Really can't get much more optimized than that... – Nicu Stiurca Apr 20 '13 at 03:32
  • 1
    @SchighSchagh , is it a good way "switch on each char, and replace it accordingly" for 200 uuids? – Kajal Apr 20 '13 at 03:33
  • 1
    @RobertHarvey OP is replacing individual characters not substrings per se so this question is not a duplicate. – Matt Phillips Apr 20 '13 at 03:33
  • I did some testing on the different algorithms described here, see my post. – Matt Phillips Apr 20 '13 at 05:13

3 Answers3

4

I suppose you're using char[] str

char *c;
for(c = str; *c != '\0'; ++c){
    if( *c == '-' ) *c = '_';
    else if( *c == '2' ) *c = 'f';
    else if( *c == '4' ) *c = 'x';
}

switch version

char *c;
for(c = str; *c != '\0'; ++c){
    switch(*c){
        case '-': *c = '_'; break;
        case '2': *c = 'f'; break;
        case '4': *c = 'x'; break;
    }
}
johnchen902
  • 9,531
  • 1
  • 27
  • 69
4

Is something as trivial as this what you want?

void my_replace(char* str)
{
    while (*str) {
        switch (*str) {
        case '-':
            *str = '_';
            break;
        case '2':
            *str = 'f';
            break;
        case '4':
            *str = 'x';
            break;
        default:
            break;
        }
        ++str;
    }
}

It is really fast and simple. I really can't see how you can make it faster.

EDIT: I am aware of some optimizations in certain string operations, but I do not see how they can be applicable here. For example, in the case of memcpy, one may copy 4 or more bytes at a time, depending on the processor. In case of comparing strings that is properly aligned, comparing as integers may be possible and more efficient. I just can't see an applicable technique.

Yongwei Wu
  • 5,292
  • 37
  • 49
  • High performance libraries such as LAPACK and BLAS use finely tuned, processor-specific code (often in Assembly) to achieve speeds many times faster than what one can do in a high-level language, even for operations as simple as matrix multiplication. I would expect the same is true of the C standard library as well. – Matt Phillips Apr 20 '13 at 04:08
  • @MattPhillips Your expectation is invalid. Have you tested the optimisations that your compiler implements, lately? – autistic Apr 20 '13 at 04:23
  • 1
    Have you? What makes you certain that the C lib is not optimized? – Matt Phillips Apr 20 '13 at 04:46
  • As an example of what @Matt is saying, glibc employs SSE to optimise `memcpy`, `strnlen`, and other functions. In fact, several libc implementations do employ optimisations like this. – jweyrich Apr 20 '13 at 05:22
1

C library functions may be optimized and significantly faster than a hand-coded iteration.

char* uuid; // = ...
//    size_t uuid_len; // = ... length of uuid


char* ptr = strpbrk(uuid, "-24");
while (ptr)
{
   switch(*ptr)
   {
      case '-':
          *ptr = '_';
          break;
      case '2':
          *ptr = 'f';
          break;
      case '4':
          *ptr = 'x';
          break;
   }
//       if (ptr-uuid == uuid_len) break;

   ptr = strpbrk(ptr+1, "-24");
}

Edit: Took out the range checking, on the basis of the example here that doesn't seem necessary.

Edit: So I decided to test out the 3 algorithms here to see which is faster. I had a loop of 100000 strings, on a vintage 2006 Mac Pro, compiling with gcc, -O3. I took the average of 1000 runs and did 5 cycles.

AND THE WINNER IS...

@johnchen by a hair with an average time of 7.85ms.

@YongweiWu just behind with an average time with 7.89ms. The difference looks significant; going in and doing a proper statistical test is not going to happen tonight unfortunately. :)

...and strpbrk a distant third at 32ms. (Glad I qualified all my optimization claims with 'might', 'may' etc....)

Edit: There is a big difference with Clang--j @ WY's algorithms take 10ms under Clang (looks tied between them), mine is unchanged.

Matt Phillips
  • 9,465
  • 8
  • 44
  • 75
  • @user93353 No, look at the last line. After a character find `strpbrk` restarts at the next character, it does not start over at the beginning. – Matt Phillips Apr 20 '13 at 03:51