
I'm working on a huge code base written many years ago. We're trying to implement multi-threading and I'm in charge of cleaning up global variables (sigh!)

My strategy is to move all global variables into a class. Each thread will then use its own instance of that class, and the former globals will be accessed through that instance with the -> operator.
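
To make the plan concrete, here is a minimal sketch of what I have in mind. All names (`ThreadContext`, `request_count`, `last_error`, `handle_request`) are made up for illustration and do not come from the real code base:

```cpp
#include <cassert>
#include <thread>

// Hypothetical holder for variables that used to be globals.
struct ThreadContext {
    int request_count = 0;  // was: int request_count;
    int last_error = 0;     // was: int last_error;
};

// Code that used to touch the globals directly now goes through the context.
void handle_request(ThreadContext* ctx, int error_code) {
    ctx->request_count++;
    ctx->last_error = error_code;
}

// Runs two threads, each with its own context, and returns the combined count.
int run_two_workers() {
    ThreadContext a, b;
    std::thread t1([&a] { for (int i = 0; i < 1000; ++i) handle_request(&a, 0); });
    std::thread t2([&b] { for (int i = 0; i < 500; ++i) handle_request(&b, 1); });
    t1.join();
    t2.join();
    // Each thread only ever wrote to its own instance.
    assert(a.last_error == 0 && b.last_error == 1);
    return a.request_count + b.request_count;
}
```

Since no instance is shared between threads, the former globals need no locking in this sketch. That only holds for variables that were never meant to be shared in the first place.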

As a first pass, I've compiled a list of global variables using nm by collecting the object names in the B and D groups. The list is not complete, and in the case of static variables, I don't get file and line number info.

The second stage is even messier: I have to replace all globals in the code base with the classinstance->global_name pattern. I'm using cscope's "Change text string" for this. The problem is that the names of some globals are also used for local variables inside functions, so cscope replaces those as well.
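
A small illustration of the shadowing problem, with made-up names (`Context`, `counter`, `bump`, `sum_to`). It assumes `counter` used to be a global that has now moved into the context object:

```cpp
#include <cassert>

// Hypothetical holder for former globals.
struct Context {
    int counter = 0;  // was: int counter; at file scope
};

// This function really used the global, so the rewrite to ctx->counter is correct.
void bump(Context* ctx) {
    ctx->counter += 1;
}

// This function has a local named `counter` that used to hide the global.
// A purely textual replace would turn every `counter` here into ctx->counter,
// which either fails to compile or silently changes behaviour.
int sum_to(int n) {
    int counter = 0;  // local, must stay untouched
    for (int i = 1; i <= n; ++i) counter += i;
    return counter;
}

// Checks that the local in sum_to() and the context field are independent.
void demo() {
    Context ctx;
    bump(&ctx);
    assert(sum_to(4) == 10);   // 1 + 2 + 3 + 4
    assert(ctx.counter == 1);  // unaffected by the local in sum_to()
}
```

A scope-aware tool is needed to tell the two uses apart; plain text matching cannot.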

Is there any other way to go about it? Any strategies or help, please!

Kartik Anand
  • No access to clang. Our company has a very complex build system; even hacking around `gcc` is difficult – Kartik Anand Jun 26 '15 at 06:12
  • Which IDEs do you have available? I remember e.g. Eclipse CDT having a bunch of refactorings, not sure if any of them fits your purpose, though. – FourtyTwo Jun 26 '15 at 06:14
  • I normally use Vim. But I do have Eclipse CDT on my machine. – Kartik Anand Jun 26 '15 at 06:15
  • Your task doesn't make much sense. If you have a huge code base full of global variables, you would rather have to go through each individual file that uses them and look at the fundamental program design. Does it make sense? Can it be easily fixed or must it be rewritten? Just dogmatically stuffing all global variables into some big, artificial global class will not improve the program design. – Lundin Jun 26 '15 at 06:17
  • That is the first part of the project: getting the product to run with no race conditions. After that we would be separating them. The code base is so huge, with so many imperfections, that it may be impossible for just one person to do it file by file. – Kartik Anand Jun 26 '15 at 06:18
  • You should google for C/C++ cross-referencing tools, i.e. tools able to find all the references to each variable in a project. Be aware that cross-referencing a huge C/C++ project which may be compiled with different initial macro settings may be difficult. – Marian Jun 26 '15 at 07:23
  • I think what Lundin means is that once all those variables are stuffed into a single object, then you will still have all the old race conditions. On the other hand, if you try to make one instance of the new class per thread, you will break the program because the original design assumed that the globals were, well, global. Your "first step" thus achieves nothing. – Adrian Ratnapala Jun 26 '15 at 09:30
  • The thing is, the globals were only introduced because programmers didn't want to pass variables around. They were not actually "globals". – Kartik Anand Jun 26 '15 at 09:32
  • Well, that's even worse then. What do you expect to happen when several threads write to a structure used for parameter passing, and read from it concurrently? Best thing to do would be to kick that code into the trash can, and start from scratch, to be honest. – Damon Jun 26 '15 at 09:38
  • I would love to do that :P I only work there, and the code base is 11 years old. No coding practices whatsoever. – Kartik Anand Jun 26 '15 at 09:38
  • @Damon, I think he means that there will be one structure per thread and will be used for passing stuff around inside the thread. But in that case, if they are using GCC, then as a first step they could just declare all their offending variables as `__thread`. After that they can incrementally get rid of all those variables. Especially the ones which become performance bottlenecks. – Adrian Ratnapala Jun 26 '15 at 10:26
  • That's what we have done: `__thread` – Kartik Anand Jun 26 '15 at 10:27
  • Does using `__thread` cause any performance bottlenecks? – Kartik Anand Jun 26 '15 at 10:28
  • I assume that looking up a thread-local variable is slower than looking up a field of a struct that you already have a pointer to in your registers. I don't know if this is a big deal or not. But if you have already done the `__thread` quick-fix, then I don't see the point of your monster-object hack. Better to just remove the globals one by one. – Adrian Ratnapala Jun 26 '15 at 10:30
  • `__thread` has noticeable overhead under Windows (both with MSVC and MinGW) since it is either implemented with a Win32 API call or by getting a pointer to the TLS table at `FS:[0x2C]` (adding two indirections overall, per access). I remember reading some time ago that recent Linux has a close to zero overhead TLS. However I do not know the implementation details. Note that `__thread` won't run constructors/destructors, for that you need `thread_local` which is yet another beast. – Damon Jun 26 '15 at 11:30
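
The thread-local quick-fix discussed in these comments can be sketched as follows. This uses standard C++11 `thread_local` rather than GCC's `__thread` so that a type with a constructor can be shown too; the variable names (`error_code`, `scratch`) are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical former globals, now one copy per thread.
// As far as I know, GCC's `__thread` rejects variables that need dynamic
// initialization, so the std::string below requires `thread_local`.
thread_local int error_code = 0;
thread_local std::string scratch = "init";

// Modifies the thread-local copies on a second thread and returns the
// value the calling thread still sees afterwards.
int value_seen_by_caller() {
    error_code = 7;
    std::thread worker([] {
        error_code = 42;       // changes only the worker's copy
        scratch += "-worker";  // likewise
    });
    worker.join();
    assert(scratch == "init");  // caller's copy was never touched
    return error_code;
}
```

The existing code keeps using the plain variable names, so no mass renaming is needed for this step. Whether the per-access cost matters depends on the platform, as noted above.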

2 Answers


Halfway there: if a function uses a local name that hides the global name, the object file won't have an undefined symbol for that name. nm can show you those undefined symbols, and then you know in which files you must replace at least some instances of that name.

However, you still have a problem in the rare cases where a file uses the global name in one function and hides it in another. I'm not sure if this can be resolved with -ffunction-sections, but I think so: nm can show the section and thus you'll see the undefined symbols used in foo() appear in section .text.foo.

MSalters
  • Thanks for the answer! How can I make `nm` show the sections? – Kartik Anand Jun 26 '15 at 09:05
  • Hmm, seems I have some problems getting that out of nm as well. I'm starting to understand why MSVC++ is more efficient in eliminating unused code: apparently -ffunction-sections will put functions in separate sections but it won't put the associated symbols in multiple symbol tables. That means that when `foo()` uses `bar()`, `bar()` will be dragged in even if `foo()` isn't used because the dependency exists at file level. I'm not impressed. – MSalters Jun 26 '15 at 12:29

Just some suggestions, from my experience:

  • Use Eclipse: the C++ indexer is very good, and when dealing with a large project I find it very useful for tracking variables. Shift+Ctrl+G (I have forgotten how to reach it from the menus!) lets you search for all references, and Ctrl+Alt+H (Open Call Hierarchy) shows the caller-callee trees.

  • Use Eclipse: it has good refactoring tools that can rename a variable without touching same-name-different-scope variables. (It often fails when templates are involved. Still, I find it good, better than its Visual Studio 2008 counterpart.)

  • Use Eclipse: I know, it takes some time to get started with it, but once you get it, it's very powerful. It can easily deal with an existing makefile-based project (File -> New -> Project -> Makefile Project with Existing Code).

  • I would consider using accessors rather than class members: it's possible that some of the variables will be shared among threads and will need some locking in order to be used properly. So I would prefer: classinstance->get_global_name()
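
To illustrate the accessor idea from the last point, here is a minimal sketch. The class name `SharedState` and the method `add_to_global_name` are my own inventions for this example; only `get_global_name()` follows the pattern suggested above:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper for a former global that really is shared between threads.
class SharedState {
public:
    // Reads the value under the lock.
    int get_global_name() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return global_name_;
    }
    // Read-modify-write must happen under one lock, so it gets its own method
    // instead of being composed from a getter and a setter.
    void add_to_global_name(int delta) {
        std::lock_guard<std::mutex> lock(mutex_);
        global_name_ += delta;
    }
private:
    mutable std::mutex mutex_;  // mutable so the const getter can lock it
    int global_name_ = 0;
};

// Two threads increment the same shared value; returns the final count.
int count_from_two_threads() {
    SharedState state;
    assert(state.get_global_name() == 0);
    auto work = [&state] {
        for (int i = 0; i < 10000; ++i) state.add_to_global_name(1);
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return state.get_global_name();
}
```

With a plain public member, the locking would have to be added at every call site later; with accessors it stays in one place.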

As a final note, I don't know whether using the Eclipse indexer from the command line would be helpful for your task. You can find some examples by googling for it.

This question/answer can give you some more hints: any C/C++ refactoring tool based on libclang? (even simplest "toy example"). In particular, I quote: "...C++ is a bitch of a language to transform"

Sigi
  • I have to change about 2000 global variables. Command line is the only way I can do that :/ – Kartik Anand Jun 26 '15 at 09:40
  • I doubt you will be able to do it in a totally automated way; my quote is not random :) In fact the Eclipse indexer is good and it still fails. However, you can go with the automated strategy that you have established and then handle the rest by hand. – Sigi Jun 26 '15 at 09:48