OP wants to do mass renaming: generate a list of names, then rename many of them across a big source code base.
A refactoring tool that is good at this would be one choice, if he can find one.
A strange but perhaps effective alternative: a C++ source code obfuscation tool.
Our company offers one of these that does the following (yes, this will seem wrong for the task!):
- strips comments
- damages formatting
- replaces identifiers consistently with scrambled names (seed of the answer!)
- builds an identifier map (a list of "identifier -> scrambled_identifier" entries) for all identifiers as a result
This process is applied to files without preprocessing.
So, in effect, it is a mass renaming tool. Renaming to bad names is its purpose, but
it can be abused to rename to good names.
In fact, it accepts an identifier map as input (possibly empty, certainly on the first run,
and usually taken from earlier obfuscation runs). It renames identifiers it finds in that map
according to the map, and gives identifiers it doesn't find new scrambled names.
If you give it a full map, you have full control over the names it renames to.
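To make the map-driven behavior concrete, here is a toy Python model of it. This is only a sketch of the rule described above, not the tool's implementation; the function name `assign_names` and the `z0000`-style scrambled names are made up for illustration.

```python
import itertools


def assign_names(identifiers, supplied_map):
    """Toy model: mapped identifiers keep their mapped target,
    unmapped identifiers get a fresh scrambled name."""
    counter = itertools.count()
    # Names that must not be handed out as fresh scrambled names.
    taken = set(supplied_map.values()) | set(identifiers)
    result = {}
    for ident in identifiers:
        if ident in supplied_map:
            # Found in the supplied map: rename according to the map.
            result[ident] = supplied_map[ident]
            continue
        # Not found: invent a scrambled name (hypothetical naming scheme).
        candidate = f"z{next(counter):04d}"
        while candidate in taken:
            candidate = f"z{next(counter):04d}"
        taken.add(candidate)
        result[ident] = candidate
    return result
```

With an empty map every identifier gets a scrambled name; with a full map nothing is left for the model to invent, which is the property the renaming trick relies on.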
So, to use it for mass renaming, the following process should work:
- Run the obfuscator and get the identifier map. Throw the resulting
source text away.
- Revise the identifier map to be "identifier -> identifier". This is a 30 second task with a decent editor like Emacs. If one uses this revised map unchanged, the obfuscator renames every symbol to itself, i.e., nothing gets renamed. Replacing "identifier -> foo" with just "identifier" is treated as "identifier -> identifier" by the tool.
- Sort, then review the identifier list. Choose new names for some of the identifiers. Revise the list accordingly: "bad_identifier_1 -> better_identifier_1"
- Re-run the obfuscator, using the revised map. Your bad_identifiers will get replaced.
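If an editor is not handy, the map-revision steps could be scripted. The Python sketch below assumes the map is a plain text file with one "identifier -> target" entry per line; the actual file format of the tool may differ, and the helper names (`parse_map`, `revise_map`, `format_map`) are hypothetical.

```python
def parse_map(text):
    """Parse lines of the assumed form 'identifier -> target' into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, target = line.partition("->")
        name = name.strip()
        target = target.strip()
        # A bare identifier counts as 'identifier -> identifier'.
        mapping[name] = target if sep and target else name
    return mapping


def revise_map(mapping, renames):
    """Reset every entry to the identity, then apply the chosen renames."""
    revised = {name: name for name in mapping}
    for old, new in renames.items():
        if old not in revised:
            raise KeyError(f"{old!r} is not in the identifier map")
        revised[old] = new
    return revised


def format_map(mapping):
    """Write the map back out, sorted so it is easy to review."""
    return "\n".join(f"{name} -> {target}"
                     for name, target in sorted(mapping.items()))
```

For example, `format_map(revise_map(parse_map(first_run_map), {"cnt": "item_count"}))` would produce the revised map to feed into the second obfuscator run, where `first_run_map` is the text of the map from the first run.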
Oops, what about comments and formatting :-?
Well, there's a command line switch that in essence says "don't throw the comments away".
As far as formatting is concerned, the obfuscator remarkably includes a source code formatter.
Just run it a second time as a formatter. Voila, renamed-code with pretty format.
Caveats:
- the formatter cannot handle some badly-placed preprocessor conditionals; most C++ code doesn't have these, and those that do occur can usually be fixed with a one-line edit.
- the obfuscator does not distinguish scopes. Given I -> J, it will rename all I instances to J.
- the obfuscator won't detect stupid renamings. If you rename I -> J and also rename K -> J, and that renaming damages your program, the obfuscator won't tell you. (That renaming may work; it depends on your code and where I and K are used.) This is easily avoided: don't produce a map in which the same renamed-to name appears more than once. This means you should not rename identifiers which appear in system include files; you can rename identifiers that appear in your application's include files.
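Since the tool won't flag duplicate targets, a small check on the revised map before the second run can catch them. This is a minimal Python sketch that assumes the map has already been loaded into a dict of identifier to new name; `find_collisions` is a hypothetical helper, not part of the obfuscator.

```python
from collections import defaultdict


def find_collisions(mapping):
    """Return {target: [sources]} for every target name that more than
    one identifier is renamed to."""
    sources_by_target = defaultdict(list)
    for source, target in mapping.items():
        sources_by_target[target].append(source)
    return {target: sorted(sources)
            for target, sources in sources_by_target.items()
            if len(sources) > 1}
```

Because the revised map contains an identity entry for every untouched identifier, this also reports the case where I is renamed to a name J that already exists in the code (both I and J then map to J).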
If there was enough interest, minor changes on our part could preserve formatting and comments directly.
The nice thing about this klunky process is that you can experiment with getting the set of renames right; you only need to keep the final "obfuscated/formatted" result. You can of course rename sets of things in groups by running this process once per stage. I highly recommend recompiling after each cycle :-}
You can use this process to rename one identifier at a time, but I think a regular editor would serve you pretty well for that.
If OP just wanted the list of names, he could obviously stop after the first obfuscation pass and run away with the identifier map.
No, it isn't a regexp-replace-string hack; it uses a full C++11 lexer so it is not confused by contents of string literals or comments. The formatter part actually uses a full C++(11) parser.
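To illustrate why lexing matters, the Python sketch below contrasts a naive regexp replace with a token-aware rename. The tokenizer here is a deliberately tiny stand-in that only knows about string literals, character literals, comments, and identifiers; it is nowhere near a full C++11 lexer (no raw strings, no preprocessor handling) and is not how the obfuscator is implemented.

```python
import re

# Toy token pattern: literals and comments are matched first so that
# identifiers inside them are never seen as separate tokens.
TOKEN = re.compile(r"""
    "(?:\\.|[^"\\])*"      # string literal
  | '(?:\\.|[^'\\])*'      # character literal
  | //[^\n]*               # line comment
  | /\*.*?\*/              # block comment
  | \b[A-Za-z_]\w*         # identifier or keyword
""", re.VERBOSE | re.DOTALL)


def rename_naive(source, old, new):
    """Plain regexp replace: also rewrites strings and comments."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


def rename_tokens(source, mapping):
    """Rename identifiers only, leaving literals and comments untouched."""
    def replace(match):
        text = match.group(0)
        if text[0] in "\"'/":
            return text  # literal or comment: keep as-is
        return mapping.get(text, text)
    return TOKEN.sub(replace, source)
```

Given `int count = 0; // count of items` followed by `puts("count");`, `rename_naive` changes all three occurrences of `count`, while `rename_tokens` changes only the variable.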