My objective is to identify the exact indentation of a line of code in a Python file. Since I will be instrumenting statements at certain locations, I need to determine the indentation required at each of those lines. The problem is illustrated by the following example:

First Scenario
#A.py

a=0                  <----------- indentation '0' spaces or '0' \t
while a<5:           <----------- indentation '0' spaces or '0' \t
    print a          <----------- indentation '4' spaces or '1' \t
    a=a+1            <----------- indentation '4' spaces or '1' \t

Second scenario
#A.py

a=0                  <----------- indentation '0' spaces or '0' \t
while a<5:           <----------- indentation '0' spaces or '0' \t
        print a      <----------- indentation '8' spaces or '2' \t
        a=a+1        <----------- indentation '8' spaces or '2' \t

Since I am inspecting an application consisting of many files, I come across files that follow either of the above scenarios. How can I determine the indentation of any line in a Python file?

Kaushik
  • You could read the line, then use `lstrip()` to trim the leading whitespace and compare the new string size to the size that had the leading spaces. – Hunter McMillen Jul 29 '13 at 15:54
  • That way lies madness. Instead of measuring what's there, just change it all to be consistent so you don't need to guess anymore. – Mark Ransom Jul 29 '13 at 15:54
  • If you're working with any source code then you have to accept anything the parser accepts. – Joe Jul 29 '13 at 15:54
  • I think he is trying to build a linting or code analysis tool. No need for condescending comments. – Markus Unterwaditzer Jul 29 '13 at 16:01
  • @MarkusUnterwaditzer I don't think anyone's comments were condescending? Or was the comment since removed? – Joe Jul 29 '13 at 16:28
  • Almost duplicate of: https://stackoverflow.com/questions/2268532/grab-a-lines-whitespace-indention-with-python. Not a duplicate because specifically for Python, introspection is relevant: https://stackoverflow.com/questions/39172306/can-a-line-of-python-code-know-its-indentation-nesting-level – 0 _ Sep 11 '17 at 16:05

5 Answers

4

Be aware that the method you choose to determine indentation can have a substantial impact on performance. As an example, while you can use a regular expression for the task of measuring leading whitespace, there are easier and far more efficient ways to do so.

import re

line = '            Then the result is even.'
r = re.compile(r"^ *")

%timeit len(line) - len(line.lstrip())    # 1000000 loops, best of 3: 0.387 µs per loop
%timeit len(re.findall(r"^ *", line)[0])  #  100000 loops, best of 3: 1.94 µs per loop
%timeit len(r.findall(line)[0])           # 1000000 loops, best of 3: 0.890 µs per loop

The reason regular expressions from the other answers are slower is that a regular expression is a state machine, compiled at the time the regular expression is constructed. Internally there is a cache, but even then it's better to hand-compile and reuse the regular expression yourself.

Notice, however, that the regular expression solution is only 20% as fast (worst case; 43% if using pre-compiled expressions) as the first sample which compares the string before and after stripping of whitespace.

Important note: Python expands each tab to the next multiple of eight columns, so you'd also need to expand literal tabs to the equivalent amount of space (for example with str.expandtabs(8)) before evaluation.
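As a minimal sketch of the tab handling described above (written for Python 3; `indent_width` is a hypothetical helper name, not something from the question), the leading whitespace can be isolated with `lstrip()` and then expanded to eight-column tab stops before measuring:

```python
def indent_width(line, tabsize=8):
    """Return the column width of the leading spaces and tabs in line."""
    # Strip only spaces and tabs so a trailing newline is never counted.
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    # expandtabs() advances each tab to the next multiple of tabsize,
    # which is different from replacing every tab with a fixed run of spaces.
    return len(leading.expandtabs(tabsize))


widths = [indent_width(text) for text in ("x", "    x", "\tx", "  \tx", "\t  x")]
```

Note that two spaces followed by a tab measure the same as a lone tab here, because the tab only advances to the next tab stop.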

Edited to add: the Python parser itself does not care about specific indentation levels, only that a given "block" is consistently indented. The amount by which indentation increases is effectively discarded and replaced with INDENT and DEDENT tokens. (Indent with 16 spaces → only one INDENT token.) It's the change in indentation line-to-line that really matters.
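To illustrate the INDENT/DEDENT point, here is a rough sketch, assuming Python 3 and the standard-library `tokenize` module. The helper name `indent_levels` is made up for this example. It keeps a running count of INDENT minus DEDENT tokens and records that depth for every line that carries a significant token:

```python
import io
import tokenize

# Token types that never mark the start of real code on a line.
SKIPPED = (tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE,
           tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER)


def indent_levels(source):
    """Map line number -> nesting depth (INDENTs minus DEDENTs so far)."""
    depth = 0
    levels = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type not in SKIPPED:
            # Only the first significant token on a line sets its depth.
            levels.setdefault(tok.start[0], depth)
    return levels


four = "a = 0\nwhile a < 5:\n    a = a + 1\nb = a\n"
eight = "a = 0\nwhile a < 5:\n        a = a + 1\nb = a\n"
same_depths = indent_levels(four) == indent_levels(eight)
```

Both sample sources produce the same mapping, with the loop body at depth 1, regardless of whether it is indented by four or eight spaces. Comment-only and blank lines are left out of the mapping in this sketch.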

amcgregor
  • Added for follow-up: `def foo():\n\t pass` ← how much indentation does the "pass" have? Technically "one", represented as an equivalent to 10 spaces (8 space tab + 2). Python as a language doesn't care about the amount of indentation, only increases and decreases. [Token reference](https://docs.python.org/3.8/library/token.html#token.INDENT) and [interactive exploration](https://docs.python.org/3.8/library/tokenize.html#command-line-usage). To get the right answer requires using the parse module, counting INDENTs - DEDENTs up to the line in question. – amcgregor Apr 07 '20 at 13:05
0

What about

import re
line = '    \t  asdf'
# Take everything before the first word character, counting each tab as 4 spaces.
len(re.split(r'\w', line)[0].replace('\t', '    '))
>>> 10

Note that none of the other suggested solutions will count tabs right.

Michael
0

You can use Regex:

import re
with open("/path/to/file") as file:
    # Start counting at 1 so that mark is the actual line number.
    for mark, line in enumerate(file, 1):
        print mark, len(re.findall("^ *", line)[0])

The first number is the line number and the second is the indentation.

Or, if you want a specific line, do this:

import re
with open("/path/to/file") as file:
    print len(re.findall("^ *", file.readlines()[3])[0])

This will print the indentation of line 4 (remember that the index is the line number you want, minus 1).

0

The "I have minimal knowledge of other techniques" method.

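# Count the leading spaces on every line.
indent_space = []
with open('stringstuff.py') as read:
    for line in read:
        spaces = 0
        for char in line:
            if char != " ":
                break
            spaces += 1
        indent_space.append(spaces)

# Heuristic: treat the smallest nonzero change between consecutive
# lines as the size of one indent.
indentation = None
for i in xrange(len(indent_space)-1):
    new_indentation = abs(indent_space[i+1] - indent_space[i])
    if new_indentation != 0 and (indentation is None or new_indentation < indentation):
        print 'Indentation:', new_indentation, "found"
        indentation = new_indentation

for spaces in indent_space:
    if indentation:
        print "Indentation of", spaces, "spaces or", spaces/indentation, "indents."
    else:
        # No indented line was found, so there is no indent size to divide by.
        print "Indentation of", spaces, "spaces."

sihrc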
0

From "Learning Python":

Python doesn’t care how you indent (you may use either spaces or tabs), or how much you indent (you may use any number of spaces or tabs). In fact, the indentation of one nested block can be totally different from that of another. The syntax rule is only that for a given single nested block, all of its statements must be indented the same distance to the right. If this is not the case, you will get a syntax error

That means, from what I understood, that two lines have the same indentation level if the sequence of white-space characters (either spaces or tabs) at their left is the same.

That could make things confusing if you look at a text editor, because the tab character is rendered with different width depending on tab stops, so that things that look the same might not actually be the same. Even the very concept of indented the same DISTANCE to the right is questionable in this sense, because "distance", visually speaking, would depend on the convention used by each editor to render a given white-space character.
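A small sketch of that ambiguity, written for Python 3 (`leading_whitespace` and `looks_aligned` are hypothetical names used only for illustration): two lines whose leading characters differ can still appear aligned, depending on the tab width the editor assumes.

```python
def leading_whitespace(line):
    """Return the run of spaces and tabs at the start of line."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def looks_aligned(first, second, tabsize):
    """True if both lines start at the same column for the given tab width."""
    width_first = len(leading_whitespace(first).expandtabs(tabsize))
    width_second = len(leading_whitespace(second).expandtabs(tabsize))
    return width_first == width_second


tabbed = "\tx = 1"
spaced = "    x = 1"
# The leading characters differ, yet an editor with 4-column tabs shows
# the two lines aligned, while one with 8-column tabs does not.
aligned_at_4 = looks_aligned(tabbed, spaced, 4)
aligned_at_8 = looks_aligned(tabbed, spaced, 8)
```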

heltonbiker