
I am trying to count the keywords in a .py file, but my code also counts keywords that occur inside strings. How can I distinguish actual keywords from words that merely appear in strings? For example, is, with and in are keywords, but they can also appear in comments and in string literals such as user-facing messages. This is what I have tried:

import keyword

count = {}
scode = input("Enter the name of the Python source file: ")
with open(scode, 'r') as f:
    for line in f:
        # Splitting on whitespace sees every word, including words inside
        # string literals and comments, which is the problem described above.
        words = line.split()
        for word in words:
            if keyword.iskeyword(word):
                count[word] = count.get(word, 0) + 1
print(count)
Leon
mini
  • If you want to distinguish code from string literals and comments, you will have to actually parse the code rather than just search for words. Expect that to be much more complex than your current code. – zvone Sep 02 '18 at 08:45
  • If you are serious about this you should probably have a look at the [ast](https://docs.python.org/3.7/library/ast.html#module-ast) module. I imagine you will hit a number of corner-cases using regexps. – bohrax Sep 02 '18 at 08:56
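Between splitting on whitespace and full parsing there is a middle ground: the standard library's tokenize module. It labels keywords and identifiers as NAME tokens, while string literals and comments come through as STRING and COMMENT tokens, so keeping only NAME tokens that pass keyword.iskeyword skips strings and comments. This is a swapped-in technique, not one proposed in the thread; below is a minimal sketch, assuming the source is valid Python held in a str (count_keywords is a hypothetical helper name).

```python
import io
import keyword
import tokenize
from collections import Counter


def count_keywords(source: str) -> Counter:
    """Count keywords in Python source text, skipping strings and comments."""
    counts = Counter()
    # generate_tokens expects a readline callable that returns str lines.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keywords and identifiers are both NAME tokens; string literals are
        # STRING tokens and comments are COMMENT tokens, so they never match.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts


sample = '''
a = 'True'
assert False
# [[] for _ in range(9)]
"""if"""
'''
print(count_keywords(sample))
```

For a file, pass its contents (for example f.read()) to the same function; tokenize.open(path) opens a file using the encoding declared in it. Before Python 3.12 an entire f-string is a single STRING token; from 3.12 the expressions inside its braces are tokenized separately, so keywords there are counted as code.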

1 Answer


You can use ast.parse to parse the code and an ast.NodeTransformer subclass to blank out every string literal (comments need no special handling, because ast.parse already discards them). Then turn the tree back into source code with the astunparse package (on Python 3.9+, the standard library's ast.unparse does the same job) and count the keywords in the result:

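import ast
import keyword
import re

try:
    from ast import unparse  # in the standard library since Python 3.9
except ImportError:
    from astunparse import unparse  # pip install astunparse on older versions

class ClearStrings(ast.NodeTransformer):
    # Since Python 3.8 string literals are ast.Constant nodes (visit_Str is
    # deprecated); other constants such as True/False/None are left untouched.
    def visit_Constant(self, node):
        if isinstance(node.value, (str, bytes)):
            node.value = node.value[:0]  # '' for str, b'' for bytes
        return node

n = ast.parse('''
a = 'True'
assert False
# [[] for _ in range(9)]
"""if"""
''')

ClearStrings().visit(n)
print(sum(map(keyword.iskeyword, re.findall(r'\w+', unparse(n)))))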

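This outputs 2: only assert and False are counted as keywords, while True inside the string literal, if inside the triple-quoted string, and for and in inside the comment are ignored.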
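The code above prints a single total, whereas the question builds a count per keyword from a file. The two can be combined; below is a minimal sketch, assuming Python 3.9+ for ast.unparse (astunparse.unparse can be swapped in on older versions), with keyword_counts and keyword_counts_in_file as hypothetical helper names. It also blanks bytes literals, because the contents of b'...' would otherwise survive unparsing and be picked up by the regex.

```python
import ast
import keyword
import re
from collections import Counter


class ClearStrings(ast.NodeTransformer):
    """Blank out str and bytes literals so their contents cannot match."""

    def visit_Constant(self, node):
        # ast.Constant also holds numbers and True/False/None; leave those alone.
        if isinstance(node.value, (str, bytes)):
            node.value = node.value[:0]  # '' for str, b'' for bytes
        return node


def keyword_counts(source: str) -> Counter:
    """Return a per-keyword count for Python source, ignoring strings and comments."""
    tree = ClearStrings().visit(ast.parse(source))  # the parser already drops comments
    words = re.findall(r'\w+', ast.unparse(tree))
    return Counter(word for word in words if keyword.iskeyword(word))


def keyword_counts_in_file(path: str) -> Counter:
    """Hypothetical convenience wrapper that reads the source file first."""
    with open(path, encoding='utf-8') as f:
        return keyword_counts(f.read())


print(keyword_counts("for x in y:\n    if x in z:\n        print('for in if')  # while\n"))
```

With the question's variable, print(keyword_counts_in_file(scode)) would replace the line-splitting loop and produce a dictionary-like count per keyword.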

blhsing