I'm writing a semantic analyzer and I need to know when there is a function in the code. I know that a function begins with an id followed by a '('. So, in my array of elements I have this:

['id', '(', ')', '{', 'id', '(', 'lit-str', ')', ';', 'id', '(', 'lit-str', ')', ';', 'id', '(', '!', 'lit-int', ')', ';', 'id', '(', ')', ';', '}']

Every 'id' followed by a '(' is a function, so I need to find all of these occurrences. Is there a method to find every 'id' followed by '(' in order to count them?

Not all inputs are exactly the same; some are longer.

I already tried to do this with an if:

(if 'id' + '(' in array: print(count))

But this only counts the first occurrence.

bkyada
  • This work has already been done: see https://docs.python.org/3/library/ast.html or https://www.youtube.com/watch?v=esZLCuWs_2Y&feature=youtu.be for how to parse Python files with the Python standard library – Ben May 15 '19 at 18:45
  • This is an ideal use case for [regular expressions](https://docs.python.org/3/howto/regex.html) – Green Cloak Guy May 15 '19 at 18:54

2 Answers

You can simply iterate over your list. A more complex solution is to use a regular expression.

As the problem is stated, a for loop looks like the simplest solution. You need to look at two consecutive elements at a time.

Here is one simple solution that returns the list of indices where an 'id' element is followed by an element starting with '(':

# Your input data
input_list = ['id', '(', ')', '{', 'id', '(', 'lit-str', ')', ';', 'id', '(', 'lit-str', ')',
              ';', 'id', '(', '!', 'lit-int', ')', ';', 'id', '(', ')', ';', '}']


def getFunction(input_list):
    # List that will collect the index of 'id' followed by '('
    index_list = []
    for i, token in enumerate(input_list[:-1]):
        # Guard against an empty next token before reading its first character
        if token == 'id' and input_list[i+1] and input_list[i+1][0] == '(':
            index_list.append(i)

    return index_list

print(getFunction(input_list))
# [0, 4, 9, 14, 20]
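
For comparison, the regular-expression alternative mentioned at the start of this answer could be sketched roughly as follows. This is only an illustration, not a recommendation: the helper name `count_id_paren` is made up for this example, and it assumes that no token contains whitespace, so the tokens can be joined with single spaces and searched as one string.

```python
import re

tokens = ['id', '(', ')', '{', 'id', '(', 'lit-str', ')', ';', 'id', '(', 'lit-str', ')',
          ';', 'id', '(', '!', 'lit-int', ')', ';', 'id', '(', ')', ';', '}']


def count_id_paren(tokens):
    """Count 'id' tokens immediately followed by a '(' token (hypothetical helper)."""
    # Assumption: tokens never contain whitespace, so a space is a safe separator.
    joined = " ".join(tokens)
    # The lookarounds make sure 'id' and '(' are whole tokens, not parts of longer ones.
    return len(re.findall(r"(?<!\S)id \((?!\S)", joined))


print(count_id_paren(tokens))
# 5
```

The loop above is easier to extend, since it gives you the positions and not only the count.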

Hope that helps!

Alexandre B.
If I understand the problem as you phrase it, I would use a zip of the list with itself like this: https://stackoverflow.com/a/21303286/2860127

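# Token list taken from the question
tokens = ['id', '(', ')', '{', 'id', '(', 'lit-str', ')', ';', 'id', '(', 'lit-str', ')',
          ';', 'id', '(', '!', 'lit-int', ')', ';', 'id', '(', ')', ';', '}']

num_functions = 0
# Pair each token with the one that follows it
for left_token, right_token in zip(tokens, tokens[1:]):
    if left_token == "id" and right_token == "(":
        num_functions += 1
print("I found {} function calls/definitions.".format(num_functions))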

It depends on how you're implementing the semantic analyzer, though; a more comprehensive system would need to index from the current token, as Alexandre B.'s answer does.

I disagree with Alexandre B. and Green Cloak Guy: regular expressions are appropriate for the lexer (tokenizer), the step of the compiler that comes before parsing and semantic analysis and figures out what the "words" are in the input (e.g., converting a specific name "foo" to "id").
The later stages, on the other hand, need to make sure the input conforms to the language's grammar, which is typically a context-free grammar, so we need something stronger than regex. Recursion (for example, a recursive-descent parser) might be a good way to do this.
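
To make the recursion idea a bit more concrete, here is a minimal recursive-descent sketch. The grammar is an assumption I made up to fit the token list in the question (a program is a sequence of `id ( ) { ... }` definitions whose bodies contain `id ( ... ) ;` statements, and a call may appear as an argument of another call); your real language will differ. All the names (`ParseError`, `expect`, `parse_call`, `parse_statement`, `parse_function`, `parse_program`) are hypothetical.

```python
class ParseError(Exception):
    """Raised when the tokens do not match the assumed grammar."""


def expect(tokens, pos, expected):
    # Check that the token at pos is the expected one and step past it.
    if pos >= len(tokens) or tokens[pos] != expected:
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise ParseError(f"expected {expected!r} at position {pos}, found {found!r}")
    return pos + 1


def parse_call(tokens, pos):
    # call := 'id' '(' argument* ')'   where an argument may itself be a call
    pos = expect(tokens, pos, "id")
    pos = expect(tokens, pos, "(")
    calls = 1
    while pos < len(tokens) and tokens[pos] != ")":
        if tokens[pos] == "id" and tokens[pos + 1:pos + 2] == ["("]:
            pos, nested = parse_call(tokens, pos)  # recursion handles nesting
            calls += nested
        else:
            pos += 1  # any other token is treated as a plain argument
    return expect(tokens, pos, ")"), calls


def parse_statement(tokens, pos):
    # statement := call ';'
    pos, calls = parse_call(tokens, pos)
    return expect(tokens, pos, ";"), calls


def parse_function(tokens, pos):
    # function := 'id' '(' ')' '{' statement* '}'
    for expected in ("id", "(", ")", "{"):
        pos = expect(tokens, pos, expected)
    calls = 0
    while pos < len(tokens) and tokens[pos] != "}":
        pos, statement_calls = parse_statement(tokens, pos)
        calls += statement_calls
    return expect(tokens, pos, "}"), calls


def parse_program(tokens):
    # program := function*   returns (number of definitions, number of calls)
    pos, definitions, calls = 0, 0, 0
    while pos < len(tokens):
        pos, body_calls = parse_function(tokens, pos)
        definitions += 1
        calls += body_calls
    return definitions, calls


tokens = ['id', '(', ')', '{', 'id', '(', 'lit-str', ')', ';', 'id', '(', 'lit-str', ')',
          ';', 'id', '(', '!', 'lit-int', ')', ';', 'id', '(', ')', ';', '}']
print(parse_program(tokens))
# (1, 4)
```

Unlike the pairwise count, this distinguishes the one definition from the four calls and raises `ParseError` when the tokens don't fit the assumed grammar.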

Elliot Way