According to "Schema Validation with Intel® Streaming SIMD Extensions 4 (Intel® SSE4)" (Intel, 2008), [they] added instructions to assist in character searches and comparisons on two operands of 16 bytes at a time. I wrote some basic strlen() and strcmp() functions in C, but they seem slower than the glibc versions.
I would like to experiment with inline assembly to see how my project behaves when reading and writing XML.
I've read (on here) that using SIMD for things like strlen() is rife with potential problems (such as memory alignment), so I'm a little concerned about using it in production code.
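
To make the alignment concern concrete, here is a minimal sketch of the kind of routine I have in mind. It uses compiler intrinsics rather than inline assembly, and the function names (`sse42_strlen`, `sse42_strlen_impl`) are hypothetical, not taken from the Intel paper. It assumes GCC or Clang on x86 and falls back to the library strlen() elsewhere. The idea is to scan byte by byte until the pointer is 16-byte aligned and only then issue aligned 16-byte loads, since an aligned load can never straddle a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h> /* SSE4.2 string intrinsics */

/* Hypothetical helper: compiled for SSE4.2 even without -msse4.2. */
__attribute__((target("sse4.2")))
static size_t sse42_strlen_impl(const char *s)
{
    const char *p = s;

    /* Scalar head: advance until p is 16-byte aligned. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        /* Aligned load: stays inside one 16-byte block (and one page). */
        __m128i chunk = _mm_load_si128((const __m128i *)p);

        /* With an all-zero first operand, PCMPISTRI in "equal each" mode
           yields the index of the first NUL byte in chunk, or 16 if none. */
        int idx = _mm_cmpistri(zero, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16)
            return (size_t)(p - s) + (size_t)idx;
        p += 16;
    }
}
#endif

/* Hypothetical entry point: picks the SSE4.2 path only if the CPU has it. */
size_t sse42_strlen(const char *s)
{
    assert(s != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.2"))
        return sse42_strlen_impl(s);
#endif
    return strlen(s); /* portable fallback */
}
```

Calling `sse42_strlen("abc")` should give 3. Note that the aligned loads can still read a few bytes past the terminating NUL inside the same 16-byte block. That does not fault on the hardware, but it is outside what the C standard guarantees, and tools such as Valgrind or AddressSanitizer may report it, which is part of why I am unsure about production use.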