Code metric required: ratio of LOCs in .h files to LOCs in .cpp files in optimal code

Question

Can I estimate the number of C++ LOCs in optimal code (a desktop application), given the number of LOCs in the header files?

Background: I'm preparing an effort estimate and a plan for porting a piece of C++ software to C#.

My first idea was to make a rough estimate based on LOCs and to track progress as LOCs ported vs. LOCs remaining. Assuming a porting speed of 200 LOC/day, I arrived at 1.5 person-years. If I present this figure to the customer, I certainly won't get the contract.
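The LOC-based estimate and the "ported vs. remaining" tracking boil down to simple arithmetic. A minimal C++ sketch, assuming a constant porting rate and a fixed number of working days per person-year; the function names are hypothetical, and the 220 working days used below is an illustrative assumption, not a figure from the question:

```cpp
#include <cassert>

// Hypothetical helper: converts a LOC count into person-years, assuming a
// constant porting rate (LOC per person-day) and a fixed number of working
// days per person-year. Both rates are assumptions, not measured values.
double estimate_person_years(double total_loc, double loc_per_day,
                             double working_days_per_year) {
    assert(loc_per_day > 0 && working_days_per_year > 0);
    return total_loc / loc_per_day / working_days_per_year;
}

// Progress as the share of LOC already ported out of all LOC
// (ported + remaining).
double progress_fraction(double loc_ported, double loc_remaining) {
    const double total = loc_ported + loc_remaining;
    assert(total > 0);
    return loc_ported / total;
}
```

For example, `estimate_person_years(66000, 200, 220)` returns 1.5, so under these assumptions 1.5 person-years corresponds to roughly 66,000 LOC, and `progress_fraction(25000, 75000)` returns 0.25.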

After taking a closer look at the code, I found that it is very inefficient: it contains a lot of copy-and-pasted code, implements its own container classes, etc. So the C++ LOC count does not seem to reflect the effort needed to implement the same functionality. My current assumption is that the header files reflect the functionality better.

LOC has always been the worst metric to rely on. Static code analysis offers better heuristics. – πάντα ῥεῖ, Aug 09 '15 at 14:52
Without context and on uncleaned code, yes, you can't rely on LOCs. But with cleaned LOCs (comments, unused and redundant code removed, code written according to a style guide, etc.) and known context (programming language, application type, industry), LOCs are still a good metric. – Valentin H, Aug 09 '15 at 15:16
The problem with "uses its own container classes" is that they're generally buggy. Not necessarily, but the type of programmer who is smart enough to write them correctly is also smart enough _not_ to write them in the first place. That means you'll spend a disproportionate amount of time dealing with those lines of code, either rewriting the uses of that container or mimicking its broken behavior. – MSalters, Aug 10 '15 at 15:15
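The "cleaned" LOC count mentioned in the comments can be approximated by ignoring blank lines and comments. A rough C++ sketch; the name `count_code_lines` is hypothetical, and the function does not recognize comment markers inside string literals, raw strings or line continuations, nor does it detect unused or redundant code:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Counts lines that still contain code after removing // and /* */ comments
// and whitespace-only lines. Simplification: comment markers inside string
// or character literals are not recognized, so such lines may be miscounted.
int count_code_lines(const std::string& source) {
    std::istringstream in(source);
    std::string line;
    bool in_block_comment = false;  // state carries over between lines
    int count = 0;
    while (std::getline(in, line)) {
        bool has_code = false;
        for (std::size_t i = 0; i < line.size(); ++i) {
            if (in_block_comment) {
                if (line.compare(i, 2, "*/") == 0) {
                    in_block_comment = false;
                    ++i;  // skip the '/' of "*/"
                }
            } else if (line.compare(i, 2, "//") == 0) {
                break;  // the rest of the line is a comment
            } else if (line.compare(i, 2, "/*") == 0) {
                in_block_comment = true;
                ++i;  // skip the '*' of "/*"
            } else if (!std::isspace(static_cast<unsigned char>(line[i]))) {
                has_code = true;
            }
        }
        if (has_code) ++count;
    }
    return count;
}
```

Dedicated tools such as cloc do this far more thoroughly; the sketch only shows the idea behind separating code lines from comment and blank lines.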

zneak · Answer 1 · score 2 · 2015-08-09T15:01:46.090

No. The size of a header file is a really bad proxy for the size of the associated implementation file. A header only shows the entry points to an API, and it can hide as much or as little as the API requires.

In other words, a header that declares a single function only tells you that there is a single public function in that implementation file. The implementation file could contain only that one function, or it could contain hundreds. Neither is better, and there's nothing wrong with either development approach. It just means that you can't use headers to estimate effort.
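To illustrate the point, here is a hypothetical example in which the header would expose a single function (`count_words`), while the implementation file relies on several internal helpers that never appear in any header. Header and source are merged into one listing so that it compiles on its own:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// --- What the header would contain: one public entry point. ---
// (Hypothetical API, used only for illustration.)
int count_words(const std::string& text);

// --- What the .cpp file might contain: helpers the header never shows. ---
namespace {  // internal linkage, equivalent to `static` free functions

bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

std::size_t skip_separators(const std::string& s, std::size_t i) {
    while (i < s.size() && !is_word_char(s[i])) ++i;
    return i;
}

std::size_t skip_word(const std::string& s, std::size_t i) {
    while (i < s.size() && is_word_char(s[i])) ++i;
    return i;
}

}  // namespace

int count_words(const std::string& text) {
    int words = 0;
    std::size_t i = skip_separators(text, 0);
    while (i < text.size()) {
        ++words;
        i = skip_word(text, i);
        i = skip_separators(text, i);
    }
    return words;
}
```

`count_words("hello, big world")` returns 3. A header-based count sees one declaration, while the implementation consists of four functions.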

With a 100k-SLOC program, it would be a stretch to use SLOC as a measure, because you'll spend more time testing than developing. If you have access to documentation of the application's features, consider using function points instead. From what I hear, they're one of the less broken heuristics around.
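As a rough illustration of function points, here is a sketch of the classic IFPUG-style calculation. It assumes every counted item is of average complexity and applies the standard average weights (external inputs 4, external outputs 5, external inquiries 4, internal logical files 10, external interface files 7), followed by the value adjustment factor 0.65 + 0.01 × (sum of the 14 general system characteristics, each rated 0 to 5). Real counts classify each item individually, so treat this as a simplified sketch; the struct and function names are hypothetical:

```cpp
#include <cassert>

// Counts of the five IFPUG function types. Simplification: every item is
// assumed to be of "average" complexity.
struct FunctionCounts {
    int external_inputs;           // EI
    int external_outputs;          // EO
    int external_inquiries;        // EQ
    int internal_logical_files;    // ILF
    int external_interface_files;  // EIF
};

// Unadjusted function points using the IFPUG average-complexity weights.
int unadjusted_function_points(const FunctionCounts& c) {
    return 4 * c.external_inputs + 5 * c.external_outputs +
           4 * c.external_inquiries + 10 * c.internal_logical_files +
           7 * c.external_interface_files;
}

// Applies the value adjustment factor: 0.65 + 0.01 * (sum of the 14
// general system characteristics, each rated 0..5).
double adjusted_function_points(int ufp, const int (&gsc)[14]) {
    int total = 0;
    for (int rating : gsc) {
        assert(rating >= 0 && rating <= 5);
        total += rating;
    }
    return ufp * (0.65 + 0.01 * total);
}
```

With hypothetical counts of 10 inputs, 5 outputs, 3 inquiries, 2 internal files and 1 external interface file, the unadjusted total is 104; with all 14 characteristics rated 5, the adjusted value is 104 × 1.35 = 140.4.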

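As far as development goes, don't forget that you can call C++ code from C# (e.g. through P/Invoke), and that C++/CLI can mix native C++ with .NET code that C# consumes (C++/CX plays a similar role for Windows Runtime components). This can ease some of the porting pain if you can incrementally rewrite more or less independent components.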
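For incremental porting, one common option is to keep not-yet-ported C++ code behind a flat C interface and call it from C# via P/Invoke. A minimal sketch; the macro `LEGACY_API`, the namespace `legacy` and the functions `checksum`/`legacy_checksum` are hypothetical stand-ins for existing code:

```cpp
#include <cassert>
#include <cstdint>

// Export macro: extern "C" disables name mangling so C# can find the symbol.
#if defined(_WIN32)
#define LEGACY_API extern "C" __declspec(dllexport)
#else
#define LEGACY_API extern "C" __attribute__((visibility("default")))
#endif

namespace legacy {
// Stand-in for some existing C++ implementation that stays unported for now.
std::int32_t checksum(const unsigned char* data, std::int32_t length) {
    std::int32_t sum = 0;
    for (std::int32_t i = 0; i < length; ++i) {
        sum = (sum * 31 + data[i]) % 65521;
    }
    return sum;
}
}  // namespace legacy

// Flat C ABI: only plain integers and pointers cross the boundary, and
// invalid input is reported as -1 instead of throwing.
LEGACY_API std::int32_t legacy_checksum(const unsigned char* data,
                                        std::int32_t length) {
    if (data == nullptr || length < 0) return -1;
    return legacy::checksum(data, length);
}
```

On the C# side, the matching declaration would look like `[DllImport("LegacyCore", CallingConvention = CallingConvention.Cdecl)] static extern int legacy_checksum(byte[] data, int length);`, where `LegacyCore` is a hypothetical DLL name. Keeping the boundary to plain types avoids marshalling C++ classes and prevents C++ exceptions from crossing into managed code.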

edited Aug 09 '15 at 15:01

answered Aug 09 '15 at 14:55

The header file should reflect the data model; the code in the .cpp files implements the workflows. There are many existing recommendations, e.g. for the optimal number of LOCs per function. So, given a data model, it should be feasible to estimate the optimal size of the software. – Valentin H Aug 09 '15 at 15:06
@ValentinHeinitz It doesn't reflect the required implementation complexity though, which makes it a bad measure. – πάντα ῥεῖ Aug 09 '15 at 15:17
@ValentinHeinitz, there are estimates for the suggested size of a function, but there is no limit to this function's call graph, and there is no limit to the number of functions that a header does not export. You can have 3 functions in the header, but if you have 45 functions (with 42 `static`) in the .cpp, you're getting a completely wrong picture, even if they all have the "recommended number of lines". – zneak Aug 09 '15 at 15:21
@zneak Oh, I see. The code is structured in a very classic way: all declarations are in .h files and all implementations are in .cpp files. Even if the code used the pimpl idiom (which it doesn't), I have access to all header files, private or public. In total there are 1400 functions, 1500 member variables and 400 enums. – Valentin H Aug 09 '15 at 15:36
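The data-model approach discussed in these comments could at least produce a range instead of a single number. A hedged sketch: the per-function averages are hypothetical inputs rather than known constants, and, as zneak points out, helper functions that appear in no header are not counted at all:

```cpp
#include <cassert>

// Lower and upper bound of an implementation-size estimate, in LOC.
struct LocRange {
    double low;
    double high;
};

// Hypothetical sketch: multiplies the number of declared functions by an
// assumed low and high average body size. Unexported helpers in .cpp files
// are invisible to this estimate.
LocRange estimate_from_declarations(int declared_functions,
                                    double low_loc_per_function,
                                    double high_loc_per_function) {
    assert(declared_functions >= 0);
    assert(0 < low_loc_per_function &&
           low_loc_per_function <= high_loc_per_function);
    return {declared_functions * low_loc_per_function,
            declared_functions * high_loc_per_function};
}
```

With the 1400 functions mentioned above and purely illustrative bounds of 10 and 30 LOC per function, `estimate_from_declarations(1400, 10, 30)` gives 14,000 to 42,000 LOC; the width of that range shows how much the result depends on the assumed averages.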

score 1 · Answer 2 · answered Aug 09 '15 at 17:22

The header file may not be a reliable indicator.

Header files usually contain function declarations -- the interface or instructions on how to call a function.

Functions in source files can be zero statements or hundreds of LOC. One cannot tell the number of statements or lines in a function by looking at a function declaration.

Many LOC counters include both header files and source files.

Neel Basu · Accepted Answer · 2016-02-09T07:01:11.303

Not with the same objective, but out of curiosity I once checked the LOCs of one of my projects with cloc while it was at an intermediate (pre-alpha) stage. It was not well documented, and some parts of it were slightly dirty or not well planned.

Language                 files   blank comment    code
------------------------------------------------------
C++                        100    2545    3252   11680
C/C++ Header               108    2847   12721    9077
C                            4    1080     971    6298
CMake                       33     241     161    1037
Bourne Shell                 4      16       0     709
Python                       8      90      72     423
CSS                          1      63      21     422
PHP                          5      23      21     295
Javascript                   5      42      23     265
JSON                         4       0       0     183
XML                          1      11     171      72
make                         1      13       0      15
Bourne Again Shell           2      10       0      14

As you can see, the ratio of header LOC to source LOC is 9077 / 11680 ≈ 0.777. A single average like this is not a good metric for anything on its own. But combined with other metrics, e.g. comment lines, it may allow drawing some fuzzy lines that indicate different properties and stages of development. More studies of well-known code bases would be required to come up with good heuristics.
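The ratio, and a comment-density figure of the kind the answer suggests combining it with, can be recomputed directly from the cloc numbers. A small sketch; `ClocRow` is a hypothetical struct mirroring cloc's files/blank/comment/code columns:

```cpp
#include <cassert>

// One row of cloc output.
struct ClocRow {
    int files;
    int blank;
    int comment;
    int code;
};

// Ratio of header code lines to source code lines (blanks and comments
// excluded, as in cloc's "code" column).
double header_to_source_ratio(const ClocRow& headers, const ClocRow& sources) {
    assert(sources.code > 0);
    return static_cast<double>(headers.code) / sources.code;
}

// Comment lines per code line for one row.
double comment_density(const ClocRow& row) {
    assert(row.code > 0);
    return static_cast<double>(row.comment) / row.code;
}
```

Using the C++ row (11680 code lines) and the C/C++ Header row (9077 code lines), `header_to_source_ratio` gives about 0.777. `comment_density` on the header row gives about 1.40, i.e. in this project the headers contain more comment lines than code lines, while the .cpp files contain about 0.28 comment lines per code line.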

But in the end, whatever measures you take, they can only lead to an assumption, which may be wrong.

Thanks for your insights! Looks like you are the only one who read the question to the end :-) Of course an average is rarely accurate. I prefer Steve McConnell's approach to presenting code metrics in his book "Software Estimation": there are always tables, not single values. Depending on the type of project (desktop, embedded, web, automotive, commercial, etc.) and on its size, a particular metric always takes a different value. – Valentin H, Feb 10 '16 at 09:24
It would make good research if you studied some open-source projects, their metrics and their evolution, and published the results. It would be an interesting paper. – Neel Basu, Feb 12 '16 at 14:22
