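C++ doesn't have a standard, portable way to include inline assembly. The standard does reserve the asm keyword, but whether a compiler supports it, and what it means, is implementation-defined. Inline assembly is, almost by definition, a non-standard, non-portable thing.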
However, if you've profiled your application and discovered that it needs tuning in a particular area that isn't served well enough by optimized C++ and/or intrinsics, I'd recommend putting the assembly into its own file(s), conditionally assembled by the appropriate tool for each platform the code is intended to run on. You would also want a plain C++ implementation as a fallback for platforms whose assembly language you don't support.
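To make that layout concrete, here is a minimal sketch of the C++ side. Every name in it is hypothetical: HAVE_ASM_SUM_U32 stands for a macro your build system would define only when it has assembled and linked a matching routine for the target, and sum_u32_asm stands for that routine, living in its own assembly file. Treat it as one possible arrangement, not the only way to do it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: defined by the build system only on platforms where the
// separate assembly file has been assembled and linked in.
#if defined(HAVE_ASM_SUM_U32)
// Hypothetical hand-written routine, implemented in its own assembly file.
extern "C" std::uint32_t sum_u32_asm(const std::uint32_t* data, std::size_t n);
#endif

// Plain C++ fallback, used on every platform without an assembly version.
inline std::uint32_t sum_u32_portable(const std::uint32_t* data, std::size_t n) {
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];  // unsigned arithmetic wraps on overflow
    }
    return total;
}

// Single entry point for callers; the implementation is chosen at build time.
inline std::uint32_t sum_u32(const std::uint32_t* data, std::size_t n) {
#if defined(HAVE_ASM_SUM_U32)
    return sum_u32_asm(data, n);
#else
    return sum_u32_portable(data, n);
#endif
}
```

Callers only ever see sum_u32, so the rest of the code base stays ordinary C++. Keeping the fallback around also gives you a reference implementation to test the assembly version against.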
As an aside, I've used GNU's inline asm variants in the past, and I have to say they tend to make your code look really ugly and opaque to another programmer. If you're writing bare-metal code that simply has to have maximum bandwidth, well, okay, but if you want something that's maintainable in the long term... maybe favor maintainability over performance.
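For a taste of what I mean, here is a small sketch of GNU extended asm wrapping a single x86 add instruction. The function name add_u32 is made up for illustration, and the guard assumes a GCC-compatible compiler using the default AT&T syntax; everything else falls through to plain C++.

```cpp
#include <cstdint>

// Hypothetical helper: adds two 32-bit values, wrapping on overflow.
inline std::uint32_t add_u32(std::uint32_t a, std::uint32_t b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // GNU extended asm, AT&T operand order (source first, destination second).
    // "+r"(a): a lives in a register and is both read and written.
    // "r"(b):  b is a read-only register input.
    // "cc":    the instruction modifies the condition flags.
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    // Portable fallback for every other compiler or architecture.
    return a + b;
#endif
}
```

That is a lot of constraint strings and clobber lists to replace one line of C++, and any compiler would have generated the same instruction anyway.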