
I am trying to write a Python script that extracts the starting line numbers of the function definitions in a C program. I use a C parsing library for Python, pyclibrary, to extract the function names from my C file. I then put these names in a list, iterate over it, search the file for the lines where each name occurs, and drop duplicates by keeping only the first match. This fails whenever the first match is not the function definition (for example, when a prototype or a call appears earlier in the file). I need to refine this logic. Any leads would be appreciated.

Here is my code:

from pyclibrary import CParser
import pandas as pd

parser = CParser(['path/to/c/file/sample.c'])

my_list = []        # function names reported by pyclibrary
list_of_func = []   # [name, line number] for every match
d1 = []             # matched function names
d2 = []             # line number of each match
d3 = []             # line numbers where a brace-balanced block closes
func1 = parser.defs['functions']
inside_function = 0
left_brack_num = 0

for i in func1:
    my_list.append(str(i))

with open('path/to/c/file/sample.c') as myFile:
    for num, line in enumerate(myFile, 1):
        # Plain substring search: this also matches prototypes,
        # calls and comments, not only definitions.
        for name in my_list:
            if name in line:
                list_of_func.append([name, num])
                d1.append(name)
                d2.append(num)
                inside_function = 1

        # Count braces once per line (not once per function name).
        if inside_function == 1:
            left_brack_num += line.count("{")
            if "}" in line:
                left_brack_num -= line.count("}")
                if left_brack_num == 0:
                    d3.append(num)
                    inside_function = 0

Data = {'Function Name': d1, 'Starting Line number': d2}
df2d = pd.DataFrame(Data)
# Keep only the first match per function name.
df2d.drop_duplicates(subset='Function Name',
                     keep='first', inplace=True)
snd = pd.Series(list_of_func)

print(df2d)
Aastha Maingi
  • Doing the job thoroughly requires a lot of hard work. The definition can be hidden in a macro. It could be inside a comment, in which case it probably shouldn’t be counted. Equally, if it doesn’t have to deal with all possibilities, it is manageable. – Jonathan Leffler Oct 11 '19 at 13:53

1 Answer

Parsing a C file manually is generally a bad idea: there are a lot of corner cases and you will end up reinventing the wheel.

If you can compile your file with debug symbols, you can find your symbols easily with:

nm -l ./foo --defined-only | grep :

Where:

  • nm lists the symbols defined in the binary
  • -l prints the file and line number where each symbol is defined (this needs debug information)
  • grep : keeps only the lines that contain a file:line location, which in practice are the user-defined symbols

For instance, if I try with this file:

int a;
int f1(){}
int f2(){}
int main(){}

Compiled with gcc -o foo foo.c -g, I get the following symbols:

000000000000402c B a    /home/user/foo.c:1
0000000000001125 T f1   /home/user/foo.c:2
000000000000112c T f2   /home/user/foo.c:3
0000000000001133 T main /home/user/foo.c:4

Please note that I get both the functions and the global variables. If you want only the functions, filter on the 2nd field and keep only the lines whose type is T (symbols in the text, i.e. code, section). Functions declared static show up with a lowercase t instead.
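Since the question is about a Python script, here is a minimal sketch of how the nm output above could be turned into a name-to-line mapping. The helper names parse_nm_output and function_lines are made up for this illustration, and the parsing assumes the four-column layout shown above (address, type, name, file:line); the line reported is whatever the debug information records for the symbol, so treat it as a starting point rather than a definitive implementation.

```python
import subprocess


def parse_nm_output(text, kinds=("T", "t")):
    """Map function name -> (source file, line) from `nm -l` output.

    Assumes each useful row looks like: ADDRESS TYPE NAME FILE:LINE.
    `kinds` selects the symbol types to keep (T/t = code section).
    """
    result = {}
    for row in text.splitlines():
        fields = row.split(None, 3)
        if len(fields) < 4:
            continue  # no file:line information on this row
        _addr, kind, name, location = fields
        if kind not in kinds:
            continue  # e.g. B/D rows are global variables
        path, _, lineno = location.strip().rpartition(":")
        if lineno.isdigit():
            result[name] = (path, int(lineno))
    return result


def function_lines(binary):
    """Run nm on a binary built with -g and parse its output (hypothetical helper)."""
    out = subprocess.run(
        ["nm", "-l", "--defined-only", binary],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_nm_output(out)


# The sample output from above, parsed without needing nm installed.
sample = """\
000000000000402c B a    /home/user/foo.c:1
0000000000001125 T f1   /home/user/foo.c:2
000000000000112c T f2   /home/user/foo.c:3
0000000000001133 T main /home/user/foo.c:4
"""
functions = parse_nm_output(sample)
for name, (path, line) in functions.items():
    print(name, path, line)
```

On a real build you would call function_lines('./foo') instead of parsing the pasted sample. The resulting dictionary can be fed to pd.DataFrame if you want the same table as in the question.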


If you really want to start from your C file, you might want to use cscope (see this post).

Maxime B.