
I have many different .h files in different formats. They include function declarations, variable definitions and more. My goal is to extract all function names together with their comments and map each function to its comment. I am looking for a working approach to accomplish this using Python. I tried the following approaches:

pycparser

https://github.com/eliben/pycparser

I read the following blog article regarding my problem, but I couldn't manage to get it working: https://eli.thegreenplace.net/2015/on-parsing-c-type-declarations-and-fake-headers

Using the code below I get the following error:

pycparser.plyparser.ParseError: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/i386/_types.h:100:27: before: __darwin_va_list

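import sys
from pycparser import parse_file

sys.path.extend(['.', '..'])

if __name__ == "__main__":
    filename = "<pathToHeader>/file.h"

    # Preprocess with gcc first. pycparser's fake libc headers are normally passed
    # as an include path, i.e. r'-I<pathToPycparser>/utils/fake_libc_include';
    # without "-I", the real macOS SDK headers get pulled in instead.
    ast = parse_file(filename, use_cpp=True,
            cpp_path='gcc',
            cpp_args=['-E', r'<pathToPycparser>/utils/fake_libc_include'])
    ast.show()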

pyclibrary

https://pyclibrary.readthedocs.io/en/latest/

This actually runs without an error, but it does not find any functions in the example file.

from pyclibrary import CParser
parser = CParser(["<pathToHeader>/file.h"])
print(parser)

pygccxml

https://pygccxml.readthedocs.io/en/master/index.html

Running the code below, I get numerous errors, e.g. unknown type name 'NSString', and the script stops after 20 errors because too many errors were emitted.

from pygccxml import utils
from pygccxml import declarations
from pygccxml import parser

generator_path, generator_name = utils.find_xml_generator()

xml_generator_config = parser.xml_generator_configuration_t(
    xml_generator_path=generator_path,
    xml_generator=generator_name)

filename = "<pathToHeader>/file.h"

decls = parser.parse([filename], xml_generator_config)

global_namespace = declarations.get_global_namespace(decls)

ns = global_namespace.namespace("ns")

Doxygen

https://www.doxygen.nl/index.html

Even though Doxygen is not a parser but rather a code-documentation tool, I had the best results with it. It creates XML output with all function names (even correctly parsed) as well as the comments. The comments are tagged as comments and even annotated with their line numbers. However, the functions are not annotated with line numbers, so matching each comment to the correct function is practically impossible.
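Doxygen's XML schema does define a `<location>` child for each `<memberdef>`, with `file` and `line` attributes, so it may be worth checking whether the generated files contain it for these methods. The sketch below assumes that layout; the function name `member_locations` and the sample XML in the test are made up for illustration, and the exact `<name>` format Doxygen uses for Objective-C methods should be verified against real output.

```python
import xml.etree.ElementTree as ET


def member_locations(xml_text):
    """Return {member name: (file, line)} for every <memberdef> that has a <location>.

    Hypothetical helper: assumes Doxygen's per-compound XML layout
    (<memberdef><name>...</name><location file="..." line="..."/></memberdef>).
    """
    root = ET.fromstring(xml_text)
    locations = {}
    for member in root.iter("memberdef"):
        name = member.findtext("name")
        location = member.find("location")
        # Skip members without a name or without a usable line attribute
        if not name or location is None or location.get("line") is None:
            continue
        locations[name] = (location.get("file"), int(location.get("line")))
    return locations
```

For a real file, pass the contents of one of Doxygen's compound XML files, e.g. `member_locations(open(path, encoding="utf-8").read())`; the resulting line numbers could then be compared with the comment line numbers Doxygen already reports.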

Custom Parser

I wrote a simple parser myself, which works to some extent, but I cannot be sure that it covers all the syntax used in these files: the syntax varies across files, and there are far too many files to check manually.

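# Read the header; readlines() keeps the "\n" at the end of each line
with open("<pathToHeader>/file.h") as headerFile:
    lines = headerFile.readlines()

functionComments = {}
inComment = False
comment = []
function = None

for i in range(0, len(lines)):

    line = lines[i].strip()

    # Start of a block comment (text on the "/*" line itself is not kept)
    if line.startswith("/*"):
        inComment = True

    # End of a block comment
    elif line.endswith("*/"):
        inComment = False
        comment.append(line[:-2].strip())

        # Only keep the comment if an instance method "- (...)" follows directly
        if len(lines) > i+1 and lines[i+1].startswith("- ("):
            # Drop availability macros such as API_AVAILABLE(...) on the first line too
            functionName = lines[i+1].split(" API_")[0]
            counter = 2

            # Join continuation lines up to the next blank line, dropping API_ macros
            while len(lines) > i+counter and lines[i+counter].strip() != "":
                functionName += " " + lines[i+counter].lstrip().split(" API_")[0]
                counter += 1

            if ":" in functionName:
                # Rebuild the selector from the word in front of each ":"
                functionNameParts = functionName.split(":")
                functionNamePartsSplitBySpace = []
                function = ""

                for j in range(0, len(functionNameParts)-1):
                    functionNamePartsSplitBySpace.append(functionNameParts[j].split(" "))

                for k in range(0, len(functionNamePartsSplitBySpace)):
                    function += functionNamePartsSplitBySpace[k][-1].split(")")[-1] + ":"

            else:
                # Method without arguments, e.g. "- (void)foo API_AVAILABLE(...);":
                # cut the macros first, then keep the name after the return type
                function = lines[i+1].split(" NS_AVAILABLE")[0].split(" API_")[0].split(")")[-1].strip().rstrip(";")

            functionComments[function] = "\n".join(comment)
            comment = []
            function = None

        else:
            function = None
            comment = []

    elif inComment:

        # Strip the leading "* " of comment lines
        if line.startswith("* "):
            comment.append(line[2:].strip())

        else:
            comment.append(line)

for name, text in functionComments.items():
    print(name, "->", text)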

Example Header File

/*
 *  CLLocationManagerDelegate.h
 *  CoreLocation
 *
 *  Copyright (c) 2008-2010 Apple Inc. All rights reserved.
 *
 */

#import <Availability.h>
#import <Foundation/Foundation.h>
#import <CoreLocation/CLLocationManager.h>
#import <CoreLocation/CLRegion.h>
#import <CoreLocation/CLVisit.h>

NS_ASSUME_NONNULL_BEGIN

@class CLLocation;
@class CLHeading;
@class CLBeacon;
@class CLVisit;

/*
 *  CLLocationManagerDelegate
 *  
 *  Discussion:
 *    Delegate for CLLocationManager.
 */
@protocol CLLocationManagerDelegate<NSObject>

@optional

/*
 *  locationManager:didUpdateToLocation:fromLocation:
 *  
 *  Discussion:
 *    Invoked when a new location is available. oldLocation may be nil if there is no previous location
 *    available.
 *
 *    This method is deprecated. If locationManager:didUpdateLocations: is
 *    implemented, this method will not be called.
 */
- (void)locationManager:(CLLocationManager *)manager
    didUpdateToLocation:(CLLocation *)newLocation
           fromLocation:(CLLocation *)oldLocation API_AVAILABLE(macos(10.6)) API_DEPRECATED("Implement -locationManager:didUpdateLocations: instead", ios(2.0, 6.0)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didUpdateLocations:
 *
 *  Discussion:
 *    Invoked when new locations are available.  Required for delivery of
 *    deferred locations.  If implemented, updates will
 *    not be delivered to locationManager:didUpdateToLocation:fromLocation:
 *
 *    locations is an array of CLLocation objects in chronological order.
 */
- (void)locationManager:(CLLocationManager *)manager
     didUpdateLocations:(NSArray<CLLocation *> *)locations API_AVAILABLE(ios(6.0), macos(10.9));

/*
 *  locationManager:didUpdateHeading:
 *  
 *  Discussion:
 *    Invoked when a new heading is available.
 */
- (void)locationManager:(CLLocationManager *)manager
       didUpdateHeading:(CLHeading *)newHeading API_AVAILABLE(ios(3.0), watchos(2.0)) API_UNAVAILABLE(tvos, macos);

/*
 *  locationManagerShouldDisplayHeadingCalibration:
 *
 *  Discussion:
 *    Invoked when a new heading is available. Return YES to display heading calibration info. The display 
 *    will remain until heading is calibrated, unless dismissed early via dismissHeadingCalibrationDisplay.
 */
- (BOOL)locationManagerShouldDisplayHeadingCalibration:(CLLocationManager *)manager  API_AVAILABLE(ios(3.0), watchos(2.0)) API_UNAVAILABLE(tvos, macos);

/*
 *  locationManager:didDetermineState:forRegion:
 *
 *  Discussion:
 *    Invoked when there's a state transition for a monitored region or in response to a request for state via a
 *    a call to requestStateForRegion:.
 */
- (void)locationManager:(CLLocationManager *)manager
    didDetermineState:(CLRegionState)state forRegion:(CLRegion *)region API_AVAILABLE(ios(7.0), macos(10.10)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didRangeBeacons:inRegion:
 *
 *  Discussion:
 *    Invoked when a new set of beacons are available in the specified region.
 *    beacons is an array of CLBeacon objects.
 *    If beacons is empty, it may be assumed no beacons that match the specified region are nearby.
 *    Similarly if a specific beacon no longer appears in beacons, it may be assumed the beacon is no longer received
 *    by the device.
 */
- (void)locationManager:(CLLocationManager *)manager
        didRangeBeacons:(NSArray<CLBeacon *> *)beacons
               inRegion:(CLBeaconRegion *)region API_DEPRECATED_WITH_REPLACEMENT("Use locationManager:didRangeBeacons:satisfyingConstraint:", ios(7.0, 13.0)) API_UNAVAILABLE(macos, macCatalyst) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:rangingBeaconsDidFailForRegion:withError:
 *
 *  Discussion:
 *    Invoked when an error has occurred ranging beacons in a region. Error types are defined in "CLError.h".
 */
- (void)locationManager:(CLLocationManager *)manager
rangingBeaconsDidFailForRegion:(CLBeaconRegion *)region
              withError:(NSError *)error API_DEPRECATED_WITH_REPLACEMENT("Use locationManager:didFailRangingBeaconsForConstraint:error:", ios(7.0, 13.0)) API_UNAVAILABLE(macos, macCatalyst) API_UNAVAILABLE(watchos, tvos);

- (void)locationManager:(CLLocationManager *)manager
        didRangeBeacons:(NSArray<CLBeacon *> *)beacons
   satisfyingConstraint:(CLBeaconIdentityConstraint *)beaconConstraint API_AVAILABLE(ios(13.0)) API_UNAVAILABLE(watchos, tvos, macos);

- (void)locationManager:(CLLocationManager *)manager
didFailRangingBeaconsForConstraint:(CLBeaconIdentityConstraint *)beaconConstraint
                  error:(NSError *)error API_AVAILABLE(ios(13.0)) API_UNAVAILABLE(watchos, tvos, macos);

/*
 *  locationManager:didEnterRegion:
 *
 *  Discussion:
 *    Invoked when the user enters a monitored region.  This callback will be invoked for every allocated
 *    CLLocationManager instance with a non-nil delegate that implements this method.
 */
- (void)locationManager:(CLLocationManager *)manager
    didEnterRegion:(CLRegion *)region API_AVAILABLE(ios(4.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didExitRegion:
 *
 *  Discussion:
 *    Invoked when the user exits a monitored region.  This callback will be invoked for every allocated
 *    CLLocationManager instance with a non-nil delegate that implements this method.
 */
- (void)locationManager:(CLLocationManager *)manager
    didExitRegion:(CLRegion *)region API_AVAILABLE(ios(4.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didFailWithError:
 *  
 *  Discussion:
 *    Invoked when an error has occurred. Error types are defined in "CLError.h".
 */
- (void)locationManager:(CLLocationManager *)manager
    didFailWithError:(NSError *)error;

/*
 *  locationManager:monitoringDidFailForRegion:withError:
 *  
 *  Discussion:
 *    Invoked when a region monitoring error has occurred. Error types are defined in "CLError.h".
 */
- (void)locationManager:(CLLocationManager *)manager
    monitoringDidFailForRegion:(nullable CLRegion *)region
    withError:(NSError *)error API_AVAILABLE(ios(4.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didChangeAuthorizationStatus:
 *  
 *  Discussion:
 *    Invoked when the authorization status changes for this application.
 */
- (void)locationManager:(CLLocationManager *)manager didChangeAuthorizationStatus:(CLAuthorizationStatus)status API_AVAILABLE(ios(4.2), macos(10.7));

/*
 *  locationManager:didStartMonitoringForRegion:
 *  
 *  Discussion:
 *    Invoked when a monitoring for a region started successfully.
 */
- (void)locationManager:(CLLocationManager *)manager
    didStartMonitoringForRegion:(CLRegion *)region API_AVAILABLE(ios(5.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  Discussion:
 *    Invoked when location updates are automatically paused.
 */
- (void)locationManagerDidPauseLocationUpdates:(CLLocationManager *)manager API_AVAILABLE(ios(6.0)) API_UNAVAILABLE(watchos, tvos, macos);

/*
 *  Discussion:
 *    Invoked when location updates are automatically resumed.
 *
 *    In the event that your application is terminated while suspended, you will
 *    not receive this notification.
 */
- (void)locationManagerDidResumeLocationUpdates:(CLLocationManager *)manager API_AVAILABLE(ios(6.0)) API_UNAVAILABLE(watchos, tvos, macos);

/*
 *  locationManager:didFinishDeferredUpdatesWithError:
 *
 *  Discussion:
 *    Invoked when deferred updates will no longer be delivered. Stopping
 *    location, disallowing deferred updates, and meeting a specified criterion
 *    are all possible reasons for finishing deferred updates.
 *
 *    An error will be returned if deferred updates end before the specified
 *    criteria are met (see CLError), otherwise error will be nil.
 */
- (void)locationManager:(CLLocationManager *)manager
    didFinishDeferredUpdatesWithError:(nullable NSError *)error API_AVAILABLE(ios(6.0), macos(10.9)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didVisit:
 *
 *  Discussion:
 *    Invoked when the CLLocationManager determines that the device has visited
 *    a location, if visit monitoring is currently started (possibly from a
 *    prior launch).
 */
- (void)locationManager:(CLLocationManager *)manager didVisit:(CLVisit *)visit API_AVAILABLE(ios(8.0)) API_UNAVAILABLE(watchos, tvos, macos);

@end

NS_ASSUME_NONNULL_END
  • You mentioned that you had good results with doxygen, but: "However, the functions are not noted with the line number, so matching the correct comment to the correct function is basically impossible." It might be that you should not map by line number but by the `id`. 2 questions: which version of doxygen did you use? Can you give a small example (the `.h` code that shows the problem when run with doxygen)? – albert Feb 18 '23 at 17:41
  • You should consider adding [`ctags`](https://linux.die.net/man/1/ctags) to your repertoire. If it doesn't get you all the information you want by itself, then very likely you can combine what it gives you with what Doxygen gives you to arrive where you want to be. – John Bollinger Feb 18 '23 at 17:50
  • @albert 1. Mapping by id would be even easier, but the id of a function is not related to the comments. At least I could not find any other use of a function id except at the definition of the function itself. 2. I use doxygen 1.9.6 and I don't really have a "problem" with doxygen, but I simply cannot map the comments found to the functions found – Kühlhausvogel Feb 18 '23 at 17:56
  • @JohnBollinger I looked at ctags, but if I understood correctly, it does not really help in my situation, because the "tags" I would add with ctags are exactly what is missing to map functions and comments. And I cannot add these tags manually for exactly this reason. I hope this was understandable. If I misunderstood ctags, I would kindly ask you to explain it in more detail. – Kühlhausvogel Feb 18 '23 at 18:03
  • @albert I added one of the many header files I want to parse – Kühlhausvogel Feb 18 '23 at 18:06
  • Consider `castxml`. It uses the libraries underlying `clang` to parse a file and produce an AST in XML format. It will give line numbers for functions. So, you could combine this with `doxygen` output. – Craig Estey Feb 18 '23 at 18:18
  • @CraigEstey If I remember correctly, Pycparser also uses clang and/or castxml, but I will have a look at it, thank you! – Kühlhausvogel Feb 18 '23 at 18:21
  • For my source, I always feed it through an indenter [I use GNU `indent`]. My style is that a function definition should be in column 1 (with the return type on the preceding line). So, `// foo -- i am the foo function\nint\nfoo(void)\n{\n...}\n`. Then, I have a custom `perl` script that is able to parse that syntax (i.e. looking in column 1). So, because all my source is "well behaved", the line numbers will always match up and the syntax is easy to extract. – Craig Estey Feb 18 '23 at 18:28
  • @Kühlhausvogel, ctags-type tag files do seem to contain just the kind of information you need. And that is `ctags`' *output*, not its input, so I'm confused about why you think it wouldn't be helpful. – John Bollinger Feb 18 '23 at 18:40
  • @CraigEstey Could you provide an example command for castxml to create an AST for a given header? I was not able to find information about that, except for the command ```castxml ``` which throws an error in line 9 of my header file: ```fatal error: 'Availability.h' file not found``` – Kühlhausvogel Feb 18 '23 at 20:04
  • @JohnBollinger It seems I misunderstood the usage of ctags. I would highly appreciate it if you could give an example command to extract the functions of a header file. Since ctags has so many options, I cannot quite get my head around which ones to use. The commands I tried so far did not work out as I expected. – Kühlhausvogel Feb 18 '23 at 20:30
  • @JohnBollinger e.g. with ctags file.h I just get an empty tags file – Kühlhausvogel Feb 18 '23 at 20:47
  • On my [fedora] install, there is a man page. Also, `castxml --help` gives a list of the `clang` options. I _think_ you just need to use `-I` as you would if compiling to allow the compiler to find the requisite `.h` files. As to `castxml` specific options, I used (e.g.) `castxml --castxml-output=1 -o output foo.c` As to your example, I commented out the `import` statements. But, when I did `castxml --castxml-output=1 -x objective-c -o output example.h`, I got: `error: '--castxml-output=' does not work with Objective C`. So, you may be out of luck – Craig Estey Feb 18 '23 at 21:28
  • Regarding the doxygen part. There are no doxygen comments in the given code (the doxygen comment blocks should start with e.g. `/**`), so doxygen doesn't / cannot map anything to the function. – albert Feb 19 '23 at 08:49
  • @albert Yes, there are no doxygen comment blocks, but there are comment blocks, which also get recognized as comments by doxygen, even with line number. The problem is, these header files are given and not manually created by me. Do you think there is another way than to crawl all header files I have and change the comment blocks to begin with '/**'? – Kühlhausvogel Feb 19 '23 at 19:48
  • I see your concern, probably doxygen can give a bit of help here as well by means of the setting `INPUT_FILTER` and set it to something like `INPUT_FILTER = "sed -e \"s/\/\*/&*/\""` (or maybe set `FILTER_PATTERNS`) – albert Feb 20 '23 at 08:59
  • @albert Thank you for your engagement to try and help me! Meanwhile I somehow fixed it, even if it's a kind of messy workaround. – Kühlhausvogel Feb 20 '23 at 18:21
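albert's `INPUT_FILTER` suggestion can also be written as a small Python filter. Doxygen runs the filter as `<filter> <input-file>` and reads the filtered source from its standard output. This is a sketch under assumptions: the script name `to_doxygen_comments.py` is hypothetical (it would be configured as e.g. `INPUT_FILTER = "python3 to_doxygen_comments.py"`), and unlike the `sed` one-liner it leaves `/**`, `/*!` and empty `/**/` comments untouched. It does not understand string literals, so a `/*` inside a string would also be rewritten.

```python
import re
import sys

# A plain block-comment opener "/*" that is not already a Doxygen opener
# ("/**", "/*!") and not the empty comment "/**/".
_OPENER = re.compile(r"/\*(?![*/!])")


def to_doxygen_comments(text):
    """Rewrite every plain "/*" opener into Doxygen's "/**"."""
    return _OPENER.sub("/**", text)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Doxygen passes the input file as the only argument
    with open(sys.argv[1], encoding="utf-8", errors="replace") as source:
        sys.stdout.write(to_doxygen_comments(source.read()))
```

With the comments turned into Doxygen blocks, Doxygen can attach each one to the declaration that follows it, instead of only listing it as a comment.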

1 Answer


Following Craig Estey's suggestion to use castxml, which is built on the clang libraries, I used clang directly and was able to extract the function names together with their line numbers using the following command (suggested here).

Command Line

clang -cc1 -ast-dump -fblocks -x objective-c <pathToHeader>/file.h

Running this command on the example header reports fatal error: 'Availability.h' file not found. Nevertheless, the AST is still created successfully (as far as I can tell).

Python

findComment() is a custom function (not shown here) that parses the Doxygen XML output and returns the comment for a given file name and line number.

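import clang.cindex

def findFunction(node, functions):
    try:
        # Objective-C instance methods ("- (...)") are OBJC_INSTANCE_METHOD_DECL cursors
        if node.kind == clang.cindex.CursorKind.OBJC_INSTANCE_METHOD_DECL:
            fileName = node.location.file.name
            line = node.location.line
            comment = findComment(fileName, line)
            # For Objective-C methods, displayname is the selector, e.g. "locationManager:didUpdateLocations:"
            functions[node.displayname] = {"file": fileName, "line": line, "comment": comment}

    except Exception as exception:
        # e.g. cursor kinds unknown to the Python bindings, or a failing findComment() lookup
        print("Error for node\n{}\n{}".format(node.location, exception))

    # Visit the children exactly once, whether or not this node raised
    for child in node.get_children():
        findFunction(child, functions)

if __name__ == "__main__":

    functions = {}
    index = clang.cindex.Index.create()
    filePath = 'pathToFile/file.h'
    # Same flags as the command-line call above; "-x objective-c" makes clang treat the .h as Objective-C
    tu = index.parse(filePath, ["-cc1", "-ast-dump", "-fblocks", "-x", "objective-c"])
    findFunction(tu.cursor, functions)

    for name, info in functions.items():
        print(name, info["line"], sep="\t")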
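The answer does not show findComment(). One possible sketch, assuming that the Doxygen XML generated for the header contains a `<programlisting>` whose `<codeline lineno="...">` elements wrap `<highlight class="comment">` spans (this matches the question's observation that comments are tagged with line numbers, but the layout should be checked against your own output). The names `load_comment_lines` and `find_comment` are hypothetical; like the custom parser in the question, a comment only counts if it ends on the line directly above the method.

```python
import xml.etree.ElementTree as ET


def _highlight_text(highlight):
    """Rebuild the source text of one <highlight> span (<sp/> encodes a space)."""
    parts = [highlight.text or ""]
    for child in highlight:
        parts.append(" " if child.tag == "sp" else "".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


def load_comment_lines(xml_text):
    """Return {line number: text} for listing lines that contain only comment text."""
    comment_lines = {}
    for codeline in ET.fromstring(xml_text).iter("codeline"):
        if codeline.get("lineno") is None:
            continue
        spans = [(h.get("class"), _highlight_text(h)) for h in codeline.findall("highlight")]
        visible = [cls for cls, text in spans if text.strip()]
        if visible and all(cls == "comment" for cls in visible):
            comment_lines[int(codeline.get("lineno"))] = "".join(text for _, text in spans)
    return comment_lines


def _strip_markers(line):
    """Remove "/*", "*/" and a leading "*" from one comment line."""
    text = line.strip()
    if text.startswith("/*"):
        text = text[2:]
    if text.endswith("*/"):
        text = text[:-2]
    text = text.strip()
    if text.startswith("*"):
        text = text[1:]
    return text.strip()


def find_comment(comment_lines, decl_line):
    """Collect the comment lines directly above decl_line (no blank line in between)."""
    collected = []
    line = decl_line - 1
    while line in comment_lines:
        collected.append(_strip_markers(comment_lines[line]))
        line -= 1
    if not collected:
        return None
    return "\n".join(reversed(collected)).strip()
```

The findComment() call in findFunction() could then look up the comment lines loaded once per header (keyed by file name, in a dictionary of your choice) and pass them to `find_comment` together with the line number that libclang reports.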