afaik reindent.py (available in the standard python examples) has a tokenizer that lets it reindent smartly, based on the indentation level rather than on the number of spaces used per level (which can vary in bad code).

unfortunately it enforces 4-space indentation, but i want tabs, because 1 tab == 1 indentation level is more logical than x spaces.

this question has no suitable answer:

  • i don’t care about pep-8 (i know how to write my code)
  • vim is installed, but :retab! doesn’t handle inconsistent indentation
  • all tools convert spaces used for alignment (!= indentation) to tabs, too.

one way would be to run reindent.py and afterwards do sth. like:

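#!/usr/bin/env python3
import re
from sys import argv

leading_spaces = re.compile("^ +")
in_multiline_string = False
with open(argv[1]) as source:
    for line in source:
        level = 0
        # leave lines inside a triple-quoted string untouched
        if not in_multiline_string:
            match = leading_spaces.search(line)
            if match:
                # reindent.py output uses exactly 4 spaces per level
                level = len(match.group(0)) // 4
        print("\t" * level + line[level * 4:].rstrip("\n"))
        # hack: an odd number of """ on a line toggles the string state
        if line.count('"""') % 2 == 1:
            in_multiline_string = not in_multiline_string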

but that’s rather hacky. is there no non-zealot version of reindent.py?

PS: why does the highlighting suggest that // 4 is a comment instead of a truncating division?


the following script should do the trick, but either i missed sth., or tokenize is buggy (or the example in the python documentation is):

#!/usr/bin/env python3

from tokenize import *
from sys import argv

f = open(argv[1])
def readline():
    return bytes(f.readline(), "utf-8")

tokens = []
ilvl=0
for token in tokenize(readline):
    if token.type == INDENT:
        ilvl+=1
        tokens.append((INDENT, "\t"*ilvl))
    else:
        if token.type == DEDENT:
            ilvl-=1
        tokens.append(token)

print(untokenize(tokens).decode('utf-8'))
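A possible way around untokenize, sketched under a few assumptions: mixing full TokenInfo entries with 2-tuples in one list is probably what confuses untokenize above, so this version only uses the token stream to learn the indentation level of each logical line and then rewrites the leading whitespace of those lines itself. The function name `reindent_with_tabs` is made up for illustration. Comment-only lines, continuation lines and the inside of multi-line strings keep their original whitespace, so it is not a full replacement for reindent.py.

```python
import io
import tokenize


def reindent_with_tabs(source: str) -> str:
    """Indent every logical line with one tab per block level.

    Hypothetical helper, not part of reindent.py or the tokenize module.
    """
    # Split exactly the way the tokenizer's readline will see the lines.
    lines = io.StringIO(source).readlines()
    level = 0
    at_logical_line_start = True
    line_levels = {}  # 1-based line number -> block level
    ignored = (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER)

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            level += 1
        elif tok.type == tokenize.DEDENT:
            level -= 1
        elif tok.type == tokenize.NEWLINE:
            # NEWLINE ends a logical line; NL (blank or continuation) does not.
            at_logical_line_start = True
        elif tok.type not in ignored and at_logical_line_start:
            # First real token of a logical line: remember its block level.
            line_levels[tok.start[0]] = level
            at_logical_line_start = False

    for lineno, lvl in line_levels.items():
        line = lines[lineno - 1]
        # Only the leading whitespace is replaced; the rest stays as written.
        lines[lineno - 1] = "\t" * lvl + line.lstrip(" \t")
    return "".join(lines)
```

Reading a file, passing its text through the function and writing the result back would be the intended use. Because levels come from INDENT and DEDENT tokens rather than from counting spaces, blocks indented by 2 spaces in one place and 8 in another both end up at one tab per level.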
flying sheep
  • about PS: you know `//` is a comment marker in a lot of languages... don't be too much of a zealot... :) – neurino May 17 '11 at 14:12
  • i know i tend to be zealoty myself sometimes, but this code is highlighted as python, a language in which `//` means “truncating division”. highlighting everything after this operator as comment (css-class `com`; gray) is plain wrong. – flying sheep May 17 '11 at 14:20
  • You start from the idea that the syntax highlighter checks question tags... which is not true. :) – neurino May 17 '11 at 14:21
  • afaik it does. it checks shebangs, too, i think. and special hint lines like `` – flying sheep May 17 '11 at 14:23
  • i just checked. it definitely does. see [this question by me](http://stackoverflow.com/questions/5572247/how-to-find-xml-elements-via-xpath-in-python-in-a-namespace-agnostic-way) and try to recreate it. it will use one highlighting scheme on all three code boxes, which i averted with above hint lines. – flying sheep May 17 '11 at 14:30
  • my fault then, you should post an issue. Anyway, I understand the point about non-indenting spaces at the beginning of the line; you are putting yourself in a tight corner... – neurino May 17 '11 at 14:38
  • try http://docs.python.org/library/tokenize.html - no need to tokenize yourself. – Jochen Ritzel May 17 '11 at 15:21
  • thanks, but it seems buggy. i can’t insert real TokenInfos myself (assertion error), and when using tuples like in the example, it generates the part up to before the first indent, and then again everything from top to bottom with spacing errors. i put the script into the question. – flying sheep May 17 '11 at 18:05

1 Answer

Using sed on Unix you could do it in one line:

sed -r ':f; s|^(\t*)\s{4}|\1\t|g; t f' file

edit: this only converts spaces at the beginning of a line.
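For anyone without GNU sed, roughly the same conversion can be sketched in Python. The function name `leading_spaces_to_tabs` and its `width` parameter are made up for illustration, and it only looks at lines that start with spaces. Like the sed line it is purely textual, so it cannot tell real indentation from spaces inside a multi-line string.

```python
import re


def leading_spaces_to_tabs(text: str, width: int = 4) -> str:
    """Turn each full group of `width` leading spaces into one tab."""

    def replace(match: re.Match) -> str:
        count = len(match.group(0))
        # Leftover spaces (fewer than `width`) are kept as they are.
        return "\t" * (count // width) + " " * (count % width)

    # With MULTILINE, ^ anchors at the start of every line in the text.
    return re.sub(r"^ +", replace, text, flags=re.MULTILINE)
```

Spaces that appear later in a line are never touched, because the pattern is anchored to the start of the line.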

neurino
  • nope, please read again. i enjoy the power of `sed -r` myself, but in this case, regexes are afaik not enough. – flying sheep May 17 '11 at 14:25
  • @Flying: is it that you don't want to use `sed`, or that using it won't accomplish your request? – neurino May 17 '11 at 14:30
  • your edit seems (i didn’t try) to do almost what i want, but it will mistakenly replace indentation in docstrings, too. that’s the problem: you need to parse the file to do it correctly (unlike my script), and only reindent.py does that right. but it uses 4 spaces instead of tabs. guess i’ll have to rewrite it. *sigh* – flying sheep May 17 '11 at 14:54