0

I was asked this question in tech test.
They asked how to change ' ' to '_' in string.
I think they didn't want common answer. like this (I can assure this)

void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar 
{
    for(size_t i = 0 ; i < strLength ; i++)
    {
        if(originalStr[i] == originalChar)
        {
            originalStr[i] = newChar ;
        }
    }
}

So I answered like this. Use WORD. ( Actually I didn't write code, They want just explaining how to do)
I think comparing Each 8 byte(64bit OS) of string with mask 8 byte.
if They eqaul, replace 8byte in a time.

When Cpu read data with size less than WORD , Cpu should do operation clearing rest bits.
It's slow. So I tried to use WORD in comparing chars.

void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar // 
    {
        size_t mask = 0;
        size_t replaced = 0;
        for(size_t i = 0 ; i < sizeof(size_t) ; i++)
        {
            mask |= originalChar << i;
            replaced |= newChar << i;
        }
        
        for(size_t i = 0 ; i < strLength ; i++)
        {
            
            // if 8 byte data equal with 8 byte data filled with originalChar
            // replace 8 byte data with 8 byte data filled with newChar 
            if(i % sizeof(size_t) == 0 && 
               strLength  - i > sizeof(size_t) && 
               *(size_t*)(originalStr + i) == mask)
            {
                *(size_t*)(originalStr + i) = replaced;
                i += sizeof(size_t);
                continue;
            }

            if(originalStr[i] == originalChar)
            {
                originalStr[i] = newChar ;
            }
        }
    }

Is There any faster way??

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
SungJinKang
  • 409
  • 3
  • 9
  • are you absolutely sure they wanted your code to be "faster" and not "more readable" or "more correct"? the "common answer" is `std::replace`, and both your programs have some errors in them, at least regarding `sizeof` use – Ap31 May 22 '21 at 07:34
  • A similar question was raised, with good answers using both replace and regex - https://stackoverflow.com/questions/5252612/replace-space-with-an-underscore/5253245 – Uri Raz May 22 '21 at 07:37
  • Generally it's not fast if you first need to scan 8 bytes to find "TTTTFTTT", i.e. 7 trues and 1 false, and then you need to process the same area once more. AFAIK, the only fast technique is to use SIMD (intrinsics) that can compare 8 or 16 characters at the same time individually. Simulated SIMD, i.e. SWAR can be faster on some architectures. And furthermore -- compilers can _often_ auto vectorise a well written intent with SIMD. – Aki Suihkonen May 22 '21 at 07:37
  • 3
    The speed is not the problem. That you don't know how `sizeof` and null-terminated strings work is. – molbdnilo May 22 '21 at 07:40
  • 2
    Did you *test* your proposed solution? Also: https://stackoverflow.com/q/35578516/3185968 shows that any decent compiler doesn't need to be told to use larger blocks than a single character at a time, and the compiler will do *much* **better** than your naive attempt by using wide vector registers where available. – EOF May 22 '21 at 07:42
  • @SungJinKang Never do any assignments in an interview. The interview is not an exam. It is a conversation of equal sides. – Vlad from Moscow May 22 '21 at 07:42
  • @Ap31 I'm sorry. I fixed it – SungJinKang May 22 '21 at 07:49
  • @AkiSuihkonen I think SIMD also have 7 ture 1 false problem – SungJinKang May 22 '21 at 07:51
  • SIMD is very much able to calculate each comparison individually. The obstacle for SIMD is the necessity to know the length in advance so that you are guaranteed not to read past your designated buffer. – Aki Suihkonen May 22 '21 at 07:52
  • @EOF I think this is close to answer.. Can i do this without SIMD. I just wanna how to implement it myself – SungJinKang May 22 '21 at 08:00
  • @AkiSuihkonen Thanks EOF says SIMD can compare them and blend them with result – SungJinKang May 22 '21 at 08:02
  • 1
    @SungJinKang You *can* to this without SIMD, but reasonable architectures today have SIMD that is a fair bit wider than integer general purpose registers, so you can usually ensure your code will be memory or even load/store bandwidth bound by using SIMD (so no other code could possibly be faster). On the other hand, on unreasonable architectures (no SIMD), it may not even be worth aligning the starting point in the memory to access with greater than `char` size. – EOF May 22 '21 at 08:47

2 Answers2

1

Do not try to optimize a code when you do not know what is the bottleneck of the code. Try to write a clear readable code.

This function declaration and definition

void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar 
{
    for(size_t i = 0 ; i < strLength ; i++)
    {
        if(originalStr[i] == originalChar)
        {
            originalStr[i] = newChar ;
        }
    }
}

does not make a sense because it duplicates the behavior of the standard algorithm std::replace.

Moreover for such a simple basic general-purpose function you are using too long identifier names.

If you need to write a similar function specially for C-strings then it can look for example the following way as it is shown in the demonstrative program below

#include <iostream>
#include <cstring>

char * replaceChar( char s[], char from, char to )
{
    for ( char *p = s; ( p = strchr( p, from ) ) != nullptr; ++p )
    {
        *p = to;
    }
    
    return s;
}

int main() 
{
    char s[] = "Hello C strings!";
    
    std::cout << replaceChar( s, ' ', '_' ) << '\n';
    
    return 0;
}

The program output is

Hello_C_strings!

As for your second function then it is unreadable. Using the continue statement in a body of for loop makes it difficult to follow its logic.

As a character array is not necessary aligned by the value of size_t then the function is not as fast as you think.

If you need a very optimized function then you should write it directly in assembler.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • both solution show same performance??? In my benchmark, Solution using strchr looks much faster without any optimization option. Why???.... I think they do same thing – SungJinKang May 22 '21 at 08:28
  • 1
    @SungJinKang In some platforms the function strchr is implemented in fact as one assembler instruction. – Vlad from Moscow May 22 '21 at 08:39
  • @SungJinKang: `show same performance` who's claim, and established how? Micro-benchmarking is a black art and a moving target. `looks much faster without any optimization option` looks a waste of time and brain cycles. – greybeard May 22 '21 at 21:45
0

The first thing in the road to being fast is being correct. The problem with the original proposal is that sizeof(s) should be a cached value of strlen(s). Then the obvious problem is that this approach scans the string twice -- first to find the terminating character and then the character to be replaced.

This should be addressed by a data structure with known length, or data structure, with enough guaranteed excess data so that multiple bytes can be processed at once without Undefined Behaviour.

Once this is solved (the OP has been edited to fix this) the problem with the proposed approach of scanning 8 bytes worth of data for ALL the bytes being the same is that a generic case does have 8 successive characters, but maybe only 7. In all those cases one would need to scan the same area twice (on top of scanning the string terminating character).

If the string length is not known, the best thing is to use a low level method:

while (*ptr != 0) {
   if (*ptr == search_char) {
       *ptr = replace_char;
   }
   ++ptr;
}

If the string length is known, it's best to use a library method std::replace, or it's low level counterpart

for (auto i = 0; i < size; ++i) {
    if (str[i] == search_char) {
        str[i] = replace_char;
    }
}

Any decent compiler is able to autovectorize this, although the compiler might generate a larger variety of kernels than intended (one kernel for small sizes, one for intermediate and one to process in chunks of 32 or 64 bytes).

Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57