The difference between a macro and an inlined function is that a macro is dealt with before the compiler sees it.
On my compiler (clang++) without optimisation flags the square function won't be inlined. The code it generates looks like this
4009f0: 55 push %rbp
4009f1: 48 89 e5 mov %rsp,%rbp
4009f4: 89 7d fc mov %edi,-0x4(%rbp)
4009f7: 8b 7d fc mov -0x4(%rbp),%edi
4009fa: 0f af 7d fc imul -0x4(%rbp),%edi
4009fe: 89 f8 mov %edi,%eax
400a00: 5d pop %rbp
400a01: c3 retq
the imul is the assembly instruction doing the work, the rest is moving data around.
code that calls it looks like
400969: e8 82 00 00 00 callq 4009f0 <_Z6squarei>
iI add the -O3 flag to Inline it and that imul shows up in the main function where the function is called from in C++ code
0000000000400a10 <main>:
400a10: 41 56 push %r14
400a12: 53 push %rbx
400a13: 50 push %rax
400a14: 48 8b 7e 08 mov 0x8(%rsi),%rdi
400a18: 31 f6 xor %esi,%esi
400a1a: ba 0a 00 00 00 mov $0xa,%edx
400a1f: e8 9c fe ff ff callq 4008c0 <strtol@plt>
400a24: 48 89 c3 mov %rax,%rbx
400a27: 0f af db imul %ebx,%ebx
It's a reasonable thing to do to get a basic handle on assembly language for your machine and use gcc -S on your source, or objdump -D on your binary (as I did here) to see exactly what is going on.
Using the macro instead of the inlined function gets something very similar
0000000000400a10 <main>:
400a10: 41 56 push %r14
400a12: 53 push %rbx
400a13: 50 push %rax
400a14: 48 8b 7e 08 mov 0x8(%rsi),%rdi
400a18: 31 f6 xor %esi,%esi
400a1a: ba 0a 00 00 00 mov $0xa,%edx
400a1f: e8 9c fe ff ff callq 4008c0 <strtol@plt>
400a24: 48 89 c3 mov %rax,%rbx
400a27: 0f af db imul %ebx,%ebx
Note one of the many dangers here with macros: what does this do ?
x = 5; std::cout << SQUARE(++x) << std::endl;
36? nope, 42. It becomes
std::cout << ++x * ++x << std::endl;
which becomes 6 * 7
Don't be put off by people telling you not to care about optimisation. Using C or C++ as your language is an optimisation in itself. Just try to work out if you're wasting time with it and be sensible.