Avoid readable text in binaries or disassembled code

Question

Is there any widely used procedure for hiding readable strings? After debugging my code I found a lot of plain text. I can use some simple encryption (Caesar cipher etc.), but this solution will totally slow down my code. Any ideas?

"but this solution will totally slow down my code." Have you benchmarked and confirmed this? Also, note that if, at any time, your program has to use the decrypted strings, then someone reverse-engineering it can read them also. — Jonathon Reinhart, Jun 01 '13 at 19:15
What level of encryption do you want? ROT13 is pretty easy and fast, and the result is not human readable, but anyone can decrypt if they really wanted to. — jh314, Jun 01 '13 at 19:18
If something as simple as a Caesar cipher is going to slow down your code unacceptably, then nothing will be acceptable since simple ciphers like Caesar ciphers are about as fast as it gets. — Carey Gregory, Jun 01 '13 at 22:05
possible duplicate: http://stackoverflow.com/questions/1356896/how-to-hide-a-string-in-binary-code — kotlomoy, Jun 01 '13 at 22:09
If you carry all your strings encrypted but of the same length as the original (Caesar or even better), and you don't look inside them, I don't see how this will slow your code down anywhere except at the point of input and output of strings, where it hardly matters. If you have to look inside the strings, you may pay a decrypting penalty; unless you measure this you're unlikely to know whether this is a *high* overall cost. — Ira Baxter, Jun 01 '13 at 22:16
Your best chance is _steganography_ - hide the strings within strings, and add huge amounts of clutter. For example, come up with a table to locate your strings within the collected works of William Shakespeare and Julius Caesar, then add those in full, and use index tables. Yes, slows you down even more. — FrankH., Jun 04 '13 at 05:46
Why the ansi and iso flags? And what language do you want C or assembly and if assembly what processor? — mmmmmm, Jun 10 '13 at 12:41

score 4 · Accepted Answer · answered Jun 01 '13 at 22:05

No, there is no widely used method for hiding referenced strings.

At some point an accessed string has to be decrypted, which reveals the key and method, so your encryption becomes mere obfuscation. If somebody wants to read all your referenced strings, they could easily write a script to convert them all back to readable form.

I can't think of any reason to obfuscate strings like that. They are only visible to someone who analyses your executable. Those people would also be capable of reverse engineering your deobfuscation and applying it to all strings.

If secrecy of strings is vital to the security of your application, you have to rethink that.

Sidenote: There is no way that deciphering strings in C will slow down your application, unless your application is full of strings and you do something very inefficient in the deciphering. Have you tested this?

Even though somebody can analyze the encryption, it still can make sense, because it slows down the reverse engineering process. — Devolus, Jun 02 '13 at 18:45
Not by a second. I'd simply take a memory dump of your _running_ application, it'll have a lot of the "decrypted" strings in it anyway ... — FrankH., Jun 04 '13 at 05:39
Parts of the string must be at different addresses. Therefore the dump is useless at this point. Just find the calculation and start reversing. — blackfigure00, Jun 19 '13 at 11:19

score 0 · Answer 2 · answered Mar 22 '23 at 14:04

This won't stop serious reverse-engineering (e.g. with a debugger to see strings in memory after your program decrypts them), but can hide some strings from a casual strings -a a.out.

If you were going to do something like this, the most CPU-efficient way is usually to XOR each byte with some constant. Or each 4-byte chunk with some 32-bit constant. (Either way, "decryption" can be done in chunks of register width, like 64-bit or with SIMD 128-bit, either in-place or copy-and-xor just as fast as memcpy.) Decryption/encryption are the same function, since x^x == 0 and XOR is associative/commutative.
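The byte-wise version of this can be sketched as follows. This is only an illustration, not a recommendation: the function name xor_bytes and the key OBF_KEY are hypothetical, and 0xAA is an arbitrary constant chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key for illustration; any non-zero byte works. */
#define OBF_KEY 0xAAu

/* XOR every byte of buf with key, in place.
 * Calling it a second time with the same key restores the original,
 * because (b ^ key) ^ key == b. */
static void xor_bytes(unsigned char *buf, size_t n, unsigned char key)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] ^= key;
}
```

The same call serves as both "encrypt" and "decrypt". A compiler with auto-vectorization enabled will often turn a loop like this into wide XORs by itself, though that depends on the compiler and flags.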

That's what GNU C memfrob does, being designed for your use-case of lightly obscuring data in memory by XORing each byte with the constant 42. A constant with its high bit set, like 0xaa, would turn ASCII into non-ASCII bytes.

See also How to hide a string in binary code? for a C++ approach using Boost preprocessor macros that encrypt strings at compile time. It uses a loop counter to vary the XOR constant per-byte.

A Caesar cipher is usually only defined over alphabetic characters, wrapping modulo the alphabet size, 26. For the general binary case, unsigned char addition with wrapping modulo 1<<CHAR_BIT (usually 256) would also work fine, although doing 8 bytes in a uint64_t in parallel would require SIMD or SWAR to avoid carry-out from one byte affecting its neighbours.
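One common SWAR formulation of that per-byte addition is sketched below, assuming 8-bit bytes. The names swar_add_bytes and broadcast_byte are hypothetical helpers for this example. The idea is to add the low 7 bits of every byte normally (the sum cannot overflow out of the byte) and then fix up the top bit of each byte with XOR, which has no carry.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one byte into all 8 byte lanes of a uint64_t. */
static uint64_t broadcast_byte(uint8_t k)
{
    return k * UINT64_C(0x0101010101010101);
}

/* Add x and y as 8 independent bytes, each wrapping modulo 256,
 * without letting a carry leak into the neighbouring byte. */
static uint64_t swar_add_bytes(uint64_t x, uint64_t y)
{
    const uint64_t H = UINT64_C(0x8080808080808080); /* top bit of each byte */
    /* Sum of the low 7 bits; any carry lands in bit 7 of the same byte. */
    uint64_t low = (x & ~H) + (y & ~H);
    /* Combine that carry with the original top bits, carry-free. */
    return low ^ ((x ^ y) & H);
}
```

To undo a shift by k, add the broadcast of (uint8_t)(256 - k) with the same function, since per-byte arithmetic is modulo 256.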

In portable ISO C, use memcpy(&tmp, ptr, sizeof(tmp)) to do an aliasing-safe unaligned load of a uint64_t tmp from char[] data. (See also Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?). Modern compilers will compile that to a single load instruction, at least when targeting ISAs which don't require alignment for loads. Could be much worse when targeting MIPS or maybe RISC-V.
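As a rough sketch of how the memcpy load fits into chunked XOR, the function below handles 8 bytes per iteration and finishes the tail one byte at a time. The name xor_bytes_wide is hypothetical, and whether each memcpy really becomes a single load or store depends on the compiler and target, as noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR n bytes at buf with key, 8 bytes per step where possible.
 * buf may have any alignment: memcpy performs the load and store,
 * so there is no aliasing or alignment undefined behaviour. */
static void xor_bytes_wide(unsigned char *buf, size_t n, unsigned char key)
{
    assert(buf != NULL || n == 0);
    /* Replicate the key byte into every byte of a 64-bit word. */
    const uint64_t wide = key * UINT64_C(0x0101010101010101);
    size_t i = 0;

    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t tmp;
        memcpy(&tmp, buf + i, sizeof tmp);   /* aliasing-safe unaligned load */
        tmp ^= wide;
        memcpy(buf + i, &tmp, sizeof tmp);   /* aliasing-safe unaligned store */
    }
    for (; i < n; i++)                       /* leftover tail bytes */
        buf[i] ^= key;
}
```

Because every byte of the word is XORed with the same value, the result does not depend on the machine's byte order.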

Especially once you use something efficient like a trivial XOR constant, Ira Baxter's comment is very true that this will have negligible effect on performance since most programs don't spend a lot of their time reading string literal data. And if you do, decrypt once and keep the strings in memory.
