how to replace a string in an efficient way in C

Question

I have a string which is generated by linux uuid generation code (libc):

1b4e28ba-2fa1-11d2-883f-b9a761bde3fb

I need to replace some of the characters in this string:

- with _
2 with f
4 with x

I am generating the 200 UUIDs using a loop.

So for every uuid I need to replace using a custom function, so that function must be maximum optimised to do so, how can I attain that?

200? That's not a large number, it would take some real work to not make this almost instant on a modern computer. Also, what do you have so far? — Collin, Apr 20 '13 at 03:20
@Collin I just have a uuid gen program which generates UUID in the above format , can i use regular expression to do so , any example — Kajal, Apr 20 '13 at 03:24
@RobertHarvey : any optimized way like with out this much traversal — Kajal, Apr 20 '13 at 03:24
@johnchen902 Replace every 2 with f if exists Replace every 4 with x if exists Replace every - with '_' — Kajal, Apr 20 '13 at 03:26
You can attain that by writing a program which does that. It's best to start early. If you have any problems, please post here. — user93353, Apr 20 '13 at 03:31
Umm.... why don't you just loop over the string, `switch` on each char, and replace it accordingly? If the "-" are always in the same place, you can just hard code which bytes to override. Really can't get much more optimized than that... — Nicu Stiurca, Apr 20 '13 at 03:32
@SchighSchagh , is it a good way "switch on each char, and replace it accordingly" for 200 uuids? — Kajal, Apr 20 '13 at 03:33
@RobertHarvey OP is replacing individual characters not substrings per se so this question is not a duplicate. — Matt Phillips, Apr 20 '13 at 03:33
I did some testing on the different algorithms described here, see my post. — Matt Phillips, Apr 20 '13 at 05:13

johnchen902 · Answer 1 · 2013-04-20T05:20:57.613

4

I suppose you're using char[] str

char *c;
for(c = str; *c != '\0'; ++c){
    if( *c == '-' ) *c = '_';
    else if( *c == '2' ) *c = 'f';
    else if( *c == '4' ) *c = 'x';
}

switch version

char *c;
for(c = str; *c != '\0'; ++c){
    switch(*c){
        case '-': *c = '_'; break;
        case '2': *c = 'f'; break;
        case '4': *c = 'x'; break;
    }
}

edited Apr 20 '13 at 05:20

answered Apr 20 '13 at 03:34

johnchen902

9,531
1
27
69

what about replacing with regular expression – Kajal Apr 20 '13 at 03:35
2

@Kajal replacing via regular expression is slower – johnchen902 Apr 20 '13 at 03:36
1

@Kajal reg ex is overkill. I think writing it with a `switch` as I said in comment above instead of `if/else if` would be more readable, but it likely compiles to the same or very similar code. – Nicu Stiurca Apr 20 '13 at 03:37
You may be interested in the result of my testing. – Matt Phillips Apr 20 '13 at 05:12
@MattPhillips did you use `-O3` in your compiler argument? – johnchen902 Apr 20 '13 at 05:19
@johnchen902 Good call, I did not--and also forgetfully ran it in release mode--and when I did, you win! – Matt Phillips Apr 20 '13 at 05:36

Yongwei Wu · Answer 2 · 2013-04-21T12:41:47.290

4

Is something as trivial as this what you want?

void my_replace(char* str)
{
    while (*str) {
        switch (*str) {
        case '-':
            *str = '_';
            break;
        case '2':
            *str = 'f';
            break;
        case '4':
            *str = 'x';
            break;
        default:
            break;
        }
        ++str;
    }
}

It is really fast and simple. I really can't see how you can make it faster.

EDIT: I am aware of some optimizations in certain string operations, but I do not see how they can be applicable here. For example, in the case of memcpy, one may copy 4 or more bytes at a time, depending on the processor. In case of comparing strings that is properly aligned, comparing as integers may be possible and more efficient. I just can't see an applicable technique.

edited Apr 21 '13 at 12:41

answered Apr 20 '13 at 04:05

Yongwei Wu

5,292
37
49

High performance libraries such as LAPACK and BLAS use finely tuned, processor-specific code (often in Assembly) to achieve speeds many times faster than what one can do in a high-level language, even for operations as simple as matrix multiplication. I would expect the same is true of the C standard library as well. – Matt Phillips Apr 20 '13 at 04:08
@MattPhillips Your expectation is invalid. Have you tested the optimisations that your compiler implements, lately? – autistic Apr 20 '13 at 04:23
1

Have you? What makes you certain that the C lib is not optimized? – Matt Phillips Apr 20 '13 at 04:46
As an example of what @Matt is saying, glibc employs SSE to optimise `memcpy`, `strnlen`, and other functions. In fact, several libc implementations do employ optimisations like this. – jweyrich Apr 20 '13 at 05:22

Matt Phillips · Answer 3 · 2013-04-20T05:41:42.427

C library functions may be optimized and significantly faster than a hand-coded iteration.

char* uuid; // = ...
//    size_t uuid_len; // = ... length of uuid


char* ptr = strpbrk(uuid, "-24");
while (ptr)
{
   switch(*ptr)
   {
      case '-':
          *ptr = '_';
          break;
      case '2':
          *ptr = 'f';
          break;
      case '4':
          *ptr = 'x';
          break;
   }
//       if (ptr-uuid == uuid_len) break;

   ptr = strpbrk(ptr+1, "-24");
}

Edit: Took out the range checking, on the basis of the example here that doesn't seem necessary.

Edit: So I decided to test out the 3 algorithms here to see which is faster. I had a loop of 100000 strings, on a vintage 2006 Mac Pro, compiling with gcc, -O3. I took the average of 1000 runs and did 5 cycles.

AND THE WINNER IS...

@johnchen by a hair with an average time of 7.85ms.

@YongweiWu just behind with an average time with 7.89ms. The difference looks significant; going in and doing a proper statistical test is not going to happen tonight unfortunately. :)

...and strpbrk a distant third at 32ms. (Glad I qualified all my optimization claims with 'might', 'may' etc....)

Edit: There is a big difference with Clang--j @ WY's algorithms take 10ms under Clang (looks tied between them), mine is unchanged.

@user93353 No, look at the last line. After a character find `strpbrk` restarts at the next character, it does not start over at the beginning. — Matt Phillips, Apr 20 '13 at 03:51

how to replace a string in an efficient way in C

3 Answers3

AND THE WINNER IS...