I was reading a C++ paper on if consteval (§3.2) and saw a code sample showing a constexpr strlen implementation:

constexpr size_t strlen(char const* s) {
    if constexpr (std::is_constant_evaluated()) {
        for (const char *p = s; ; ++p) {
            if (*p == '\0') {
                return static_cast<std::size_t>(p - s);
            }
        }    
    } else {
        __asm__("SSE 4.2 insanity");        
    }
}

I'm here to ask about the __asm__ statement in the else branch.

I know it's humour and not meant to be taken seriously, but I still decided to do some research in case someone had already explained it. When I googled the quoted message, I got fewer than 10 results, all about this piece of code. I then looked up SSE 4.2 and found that it's a CPU instruction set, so I really have no clue why it appears in a C++ paper. Does anyone have an explanation? Thanks to those who'll read my post.

Chi_Iroh
  • 14
    It's a placeholder to mean "some crazy SSE 4.2 stuff" :) It's not a real instruction. – cigien Jun 03 '23 at 11:09
  • 9
    This is called "humor", maybe you've heard of it... and it seems some members of the C++ committee know about it too ;) – Pepijn Kramer Jun 03 '23 at 11:11
  • 4
    [Fun](https://www.strchr.com/strcmp_and_strlen_using_sse_4.2) – Nelfeal Jun 03 '23 at 11:12
  • @cigien That's what I said to myself, but I also thought there was something technical going on, because I don't really understand what SSE is doing here instead of a runtime implementation of `strlen`. ~~Unless it means CPUs may have a builtin `strlen` implementation?~~ Oh never mind, @Nelfeal's link answers my question. I'll close the question, thank you all :) – Chi_Iroh Jun 03 '23 at 11:21
  • 2
    An asm implementation using SSE *is* a runtime implementation. – Nelfeal Jun 03 '23 at 11:22
  • @Nelfeal I did mean a sort of copy-paste of a for-loop. – Chi_Iroh Jun 03 '23 at 11:23
  • 4
    @Chi_Iroh Not a builtin strlen. The idea here is that at runtime you can use a faster implementation using SSE instructions that aren't available in a constexpr context. The real implementation would be a bit longer; it's omitted here because showing it is not the point of the paper. – Cubic Jun 03 '23 at 11:24
  • Thank you very much. I posted an answer so that I can close the post in 2 days, when I'll be able to accept it. – Chi_Iroh Jun 03 '23 at 11:29
  • 2
    Funny, but note that SSE4.2 is not particularly useful for implementing `strlen`; even the older SSE2 was better at that, and the gap widens with AVX2. – harold Jun 03 '23 at 11:32
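
For illustration only, a runtime-only sketch of the SSE2 `pcmpeqb` approach that harold's comment alludes to might look like the following. The function name `strlen_sse2_sketch` is hypothetical, the `std::strlen` fallback for targets without SSE2 is an assumption added so the snippet compiles everywhere, and `__builtin_ctz` is a GCC/Clang builtin. This is not code from the paper (which shows none) and it is not tuned. It also relies on the common practical assumption that an aligned 16-byte load may read a few bytes past the terminator without faulting, which the C++ standard itself does not guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical runtime-only strlen; the intrinsics below are not constexpr.
inline std::size_t strlen_sse2_sketch(const char* s) {
#if defined(__SSE2__)
    const char* p = s;
    // Scalar prologue: advance byte by byte until p is 16-byte aligned.
    while (reinterpret_cast<std::uintptr_t>(p) % 16 != 0) {
        if (*p == '\0') {
            return static_cast<std::size_t>(p - s);
        }
        ++p;
    }
    const __m128i zero = _mm_setzero_si128();
    for (;; p += 16) {
        // An aligned 16-byte load cannot straddle a page boundary,
        // but it may read bytes located after the terminator.
        const __m128i chunk =
            _mm_load_si128(reinterpret_cast<const __m128i*>(p));
        // pcmpeqb + pmovmskb: one mask bit per byte that equals zero.
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0) {
            // The lowest set bit is the index of the first '\0' in the chunk.
            const unsigned idx = static_cast<unsigned>(
                __builtin_ctz(static_cast<unsigned>(mask)));
            return static_cast<std::size_t>(p - s) + idx;
        }
    }
#else
    // Assumed fallback for targets without SSE2.
    return std::strlen(s);
#endif
}
```

None of this can run during constant evaluation, which is why such a function needs a separate compile-time path.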

1 Answer

cigien is correct:

It's a placeholder to mean "some crazy SSE 4.2 stuff" :) It's not a real instruction

Although to be fair, I can't take credit for this particular joke; it comes from David Stone's constexpr function parameters paper.

The point here isn't what the optimal way to implement strlen with SSE instructions actually is, but rather that there is a way to do so if you hand-write your assembly, and that it is likely going to be better than the manual loop. Whatever that way is, it is definitely not constexpr-friendly, so the specific instruction list isn't really relevant to the question: it can't work at compile time, so it needs to be switched out.

Barry
  • 1
    *the point here isn't what is actually the optimal way to implement strlen with SSE instructions,* - that's also clear from the fact that SSE4.2 [isn't an efficient way to implement easy stuff like `strlen` or `memcmp`](https://stackoverflow.com/questions/46762813/how-much-faster-are-sse4-2-string-instructions-than-sse2-for-memcmp); SSE2 `pcmpeqb` is faster on real CPUs, and AVX2 can go twice as fast. SSE4.2 has some uses for more complicated operations like `strstr`, though. – Peter Cordes Jun 03 '23 at 16:03
  • 1
    SSE4.2 string instructions are complicated (https://www.strchr.com/strcmp_and_strlen_using_sse_4.2), so they make a good example of something that's hard for a compiler to constant-propagate through even if the code had used intrinsics like `_mm_cmpistri` instead of `asm("")`. (Intel's intrinsics API isn't `constexpr` compatible anyway, but that could be changed in theory.) – Peter Cordes Jun 03 '23 at 16:06
  • 2
    @PeterCordes Dude, it's... a joke. – Barry Jun 03 '23 at 16:22
  • 1
    I know the "crazy" part is a joke, but I just wanted to point out that "some crazy SSE2 stuff" would have been equally pithy and more realistic for `strlen`. But SSE4.2 was newish (2009) around the same time C++11 `constexpr` was being discussed, so it makes sense that the new hotness would end up in an example here. In the early days, it wasn't well known that the new instructions were slower for simple stuff like strlen on existing CPUs. And there was the possibility that the instructions would get faster in future CPUs that could throw more transistors at them. (Which didn't happen.) – Peter Cordes Jun 03 '23 at 16:56
  • It seems that my answer below is still not acceptable. Can someone help me complete it? I don't know what to add, because I'm explaining all my thinking to help other people who'll have the same question as me. – Chi_Iroh Jun 04 '23 at 00:38
  • 1
    @Chi_Iroh: Your answer spends most of its time justifying why you posted the question, not answering it. You don't need to do that, the question isn't getting more downvotes than upvotes. Barry's answer already answers the question. Maybe I missed/forgot something, but I don't think the answer you posted adds anything else that's directly helpful to readers who want to know the answer to the question. It does indirectly mention some answer-related things, which is most of the reason I chose not to downvote it. – Peter Cordes Jun 04 '23 at 05:16
  • 1
    If you want to mark the question as solved, use the checkbox under one of the answers. Pick one that actually contains answers to all the things you wanted to know. If you don't find Barry's complete enough and want to write more, edit your own to actually do that, perhaps quoting comments you found helpful. (You've already accepted Barry's; I didn't notice whether that was before or after you posted that last comment about your own answer.) At this point it would be fine to just leave your answer alone or delete it. Welcome to Stack Overflow :) – Peter Cordes Jun 04 '23 at 05:16
  • OK thanks for the explanation, it was deleted via review so I guess all is fine now. – Chi_Iroh Jun 04 '23 at 13:29