107

I have a Python editor where the user enters a script or code, which is then put into a main method behind the scenes, with every line indented. The problem is that if the user has a multiline string, the indentation applied to the whole script affects the string by inserting a tab at the start of every line. A problem script could be as simple as:

"""foo
bar
foo2"""

So when in the main method it would look like:

def main():
    """foo
    bar
    foo2"""

and the string would now have an extra tab at the beginning of every line after the first.
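A minimal Python reproduction of the problem. It assumes the wrapper simply prefixes every line with four spaces (instead of a tab), which is what `textwrap.indent` does; the editor in the question does this in Java, so `wrap_in_main` is a hypothetical stand-in, and the trailing `return text` is appended only so the resulting string can be inspected.

```python
import textwrap

def wrap_in_main(script: str) -> str:
    """Hypothetical stand-in for the editor: indent every line of the
    user's script and put it inside a main() function."""
    return "def main():\n" + textwrap.indent(script, "    ")

# The user's script: a variable holding a three-line string.
user_script = 'text = """foo\nbar\nfoo2"""'

# Append a return statement so we can look at the string main() builds.
source = wrap_in_main(user_script) + "\n    return text\n"

namespace = {}
exec(source, namespace)
print(repr(namespace["main"]()))  # 'foo\n    bar\n    foo2'
```

The lines after the first pick up the wrapper's indentation, exactly as described above.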

codeforester

10 Answers

159

textwrap.dedent from the standard library is there to automatically undo the wacky indentation.
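A short sketch of `textwrap.dedent` at work: it removes the whitespace prefix shared by all non-blank lines and keeps any indentation beyond that. This assumes every non-blank line, including the first, carries the common prefix; whitespace-only lines are ignored when the prefix is computed.

```python
import textwrap

def main():
    # All non-blank lines start with the same 8 spaces, so dedent strips
    # exactly those 8; the extra 4 spaces before "bar" are kept.
    text = textwrap.dedent("""
        foo
            bar
        foo2
        """)
    return text

print(repr(main()))  # '\nfoo\n    bar\nfoo2\n'
```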

thraxil
  • The standard library never ceases to hold surprises. – thraxil Sep 11 '09 at 19:33
  • Note that if the first line starts as `"""foo`, then the first line lacks the leading indentation that the other lines have, so `dedent` won't do anything. It will work if you wait to start foo on the next line and escape the first newline like this: `"""\` – Scott H May 05 '16 at 16:00
  • To address the shortcomings that @ScottH mentions, please see my answer regarding `inspect.cleandoc` – bbenne10 Dec 01 '17 at 19:25
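Scott H's point can be checked directly. In this sketch, the first string starts right after the opening quotes, so its first line has no indentation and `dedent` finds no common prefix; the second escapes the newline after the opening quotes with a backslash, so every line carries the same prefix.

```python
import textwrap

def main():
    # "foo" sits right after the quotes, at column 0 of the string,
    # so the common prefix is empty and nothing is removed.
    unchanged = textwrap.dedent("""foo
        bar
        foo2""")
    # The backslash escapes the newline after the opening quotes, so the
    # string starts with "        foo" and all lines share 8 spaces.
    fixed = textwrap.dedent("""\
        foo
        bar
        foo2""")
    return unchanged, fixed

unchanged, fixed = main()
print(repr(unchanged))  # 'foo\n        bar\n        foo2'
print(repr(fixed))      # 'foo\nbar\nfoo2'
```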
72

From what I see, a better answer here might be inspect.cleandoc, which does much of what textwrap.dedent does but also fixes the problems that textwrap.dedent has with the leading line.

The below example shows the differences:

>>> import textwrap
>>> import inspect
>>> x = """foo bar
...     baz
...     foobar
...     foobaz
...     """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
... bar\tbaz
... """
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'

Note that inspect.cleandoc also expands internal tabs to spaces. This may be inappropriate for one's use case, but works fine for me.

bbenne10
  • Beware that these two aren't exactly equivalent otherwise, and cleandoc does more processing than just removing indents. At the very least, it expands `'\t'` to spaces. – Brian Oct 11 '19 at 06:29
  • This is true, but I didn't notice at the time. I'll update the answer to reflect at least the tab expansion. – bbenne10 Oct 11 '19 at 13:44
  • Could also `textwrap.dedent(s).strip()` to avoid changing tabs and still handle leading and trailing newlines. – DocOc May 04 '21 at 22:00
  • The context in which I wrote this answer is a much more general one than the one under which the question was asked. I was looking to re-flow docstrings for documentation purposes (so the collapsing is helpful). You're right that you could post-process the `textwrap.dedent` output for more specific scenarios. I neglected the nuance of the original question when I answered this. I do believe that my answer is more generically helpful, however. – bbenne10 May 13 '21 at 13:11
  • I don't know if it's an easy mistake to make in the Python world, but one should be careful when using `\n` somewhere in a triple-quoted string: `inspect.cleandoc` will not clean that one (from experience). – eddym Sep 08 '21 at 14:58
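DocOc's suggestion can be compared with `inspect.cleandoc` in a few lines. In this sketch both remove the surrounding blank lines and the common indentation, but only `textwrap.dedent(s).strip()` leaves the internal tab alone. One caveat: `.strip()` also removes leading whitespace from the first remaining line, which matters if that line is meant to stay more indented than the rest.

```python
import inspect
import textwrap

s = """
    foo
    bar\tbaz
    """

# dedent removes the shared 4-space prefix; strip() drops the leading
# and trailing newlines. The tab survives.
print(repr(textwrap.dedent(s).strip()))  # 'foo\nbar\tbaz'

# cleandoc does similar trimming, but expands tabs to spaces first.
print(repr(inspect.cleandoc(s)))
```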
21

What follows the first line of a multiline string is part of the string, and not treated as indentation by the parser. You may freely write:

def main():
    """foo
bar
foo2"""
    pass

and it will do the right thing.

On the other hand, that's not readable, and Python knows it. So if the lines after the first in a docstring share some leading whitespace, that common indentation is stripped off when you use help() to view the docstring. Thus, help(main) and help(main2) below produce the same help info.

def main2():
    """foo
    bar
    foo2"""
    pass
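The cleanup that `help()` displays can be reproduced with `inspect.getdoc`, which returns an object's docstring passed through `inspect.cleandoc`. A minimal sketch repeating `main2` from above; only the cleaned result is shown, since the raw `__doc__` attribute is not what `help()` prints.

```python
import inspect

def main2():
    """foo
    bar
    foo2"""
    pass

# getdoc() strips the common indentation of the lines after the first,
# essentially the same cleanup help() applies before displaying it.
print(repr(inspect.getdoc(main2)))  # 'foo\nbar\nfoo2'
```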
MarianD
SingleNegationElimination
  • Thanks for the reply. Unfortunately the indentation is completely automated, as my code reads in the script as a string (in Java) and indents every line in that string. –  Sep 11 '09 at 18:16
  • I don't think only docstrings use triple quotes. This automation won't apply elsewhere – tribbloid Jun 30 '19 at 20:31
  • @tribbloid the special logic for docstrings is specific to the use case of making `help()` do something nice by default. To use the same dedenting *logic* in other places, you can use `textwrap.dedent()` as described in basically every other answer to this question. – SingleNegationElimination Aug 22 '19 at 00:39
2

Showing the difference between textwrap.dedent and inspect.cleandoc with a little more clarity:

Behavior with the leading part not indented

import textwrap
import inspect

string1="""String
with
no indentation
       """
string2="""String
        with
        indentation
       """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='String\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n        with\n        indentation\n'

Behavior with the leading part indented

string1="""
String
with
no indentation
       """
string2="""
        String
        with
        indentation
       """

print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='\nString\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n        String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='\nString\nwith\nindentation\n'
codeforester
2

I wanted to preserve exactly what is between the triple-quote lines, removing only the common leading indent. I found that textwrap.dedent and inspect.cleandoc didn't do it quite right, so I wrote this one. It uses os.path.commonprefix.

import re
from os.path import commonprefix

def ql(s, eol=True):
    lines = s.splitlines()
    l0 = None
    if lines:
        l0 = lines.pop(0) or None
    common = commonprefix(lines)
    indent = re.match(r'\s*', common)[0]
    n = len(indent)
    lines2 = [l[n:] for l in lines]
    if not eol and lines2 and not lines2[-1]:
        lines2.pop()
    if l0 is not None:
        lines2.insert(0, l0)
    s2 = "\n".join(lines2)
    return s2

This can quote any string with any indent. I wanted it to include the trailing newline by default, but with an option to remove it so that it can quote any string neatly.

Example:

print(ql(r"""
     Hello
    |\---/|
    | o_o |
     \_^_/
    """))

print(ql(r"""
         World
        |\---/|
        | o_o |
         \_^_/
    """))

In the output, only 4 spaces of common indentation are removed from the second string, because its closing """ is indented less than the quoted text:

 Hello
|\---/|
| o_o |
 \_^_/

     World
    |\---/|
    | o_o |
     \_^_/

I thought this was going to be simpler, otherwise I wouldn't have bothered with it!

Sam Watkins
1

The only way I see is to strip the first n tabs from each line, starting with the second, where n is the known indentation of the main method.

If that indentation is not known beforehand, you can add a trailing newline before inserting the script, then strip as many tabs as the last line contains...

A third solution is to parse the data, find the beginning of each multiline string, and not add your indentation to any line after it until the string is closed.

I think there is a better solution...
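The third suggestion can be sketched with the standard `tokenize` module: find string tokens that span several lines and leave the lines they continue onto unindented. `indent_outside_strings` is a hypothetical helper written for illustration. It assumes the script tokenizes cleanly and ends with a newline, and it only looks at plain `STRING` tokens, so multiline f-strings (which Python 3.12+ tokenizes as `FSTRING_START`/`FSTRING_MIDDLE`/`FSTRING_END`) are not handled.

```python
import io
import tokenize

def indent_outside_strings(script: str, prefix: str = "    ") -> str:
    """Hypothetical helper: add `prefix` to every line of `script`,
    except lines that begin inside a multiline string literal."""
    # 1-based line numbers that start in the middle of a string token.
    inside_string = set()
    for tok in tokenize.generate_tokens(io.StringIO(script).readline):
        if tok.type == tokenize.STRING and tok.end[0] > tok.start[0]:
            # Lines after the one holding the opening quotes, up to and
            # including the one holding the closing quotes.
            inside_string.update(range(tok.start[0] + 1, tok.end[0] + 1))
    # readlines() splits on "\n" just like the readline fed to tokenize,
    # so the line numbers match.
    lines = io.StringIO(script).readlines()
    return "".join(
        line if number in inside_string else prefix + line
        for number, line in enumerate(lines, start=1)
    )

script = 'text = """foo\nbar\nfoo2"""\n'
source = "def main():\n" + indent_outside_strings(script) + "    return text\n"
namespace = {}
exec(source, namespace)
print(repr(namespace["main"]()))  # 'foo\nbar\nfoo2'
```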

Mikhail Churbanov
  • Thanks for the reply. So you are suggesting I strip each line of the indentation that has been inserted? I'm confused... –  Sep 11 '09 at 18:15
1

This does the trick, if I understand the question correctly. lstrip() removes leading whitespace, so it will remove tabs as well as spaces.

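def dedent(message):
    # Strip leading whitespace (spaces and tabs) from every line.
    # "\n" is Python's in-memory newline, so os.linesep is not needed here.
    return "\n".join(line.lstrip() for line in message.splitlines())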

Example:

name='host'
config_file='/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
message = f"""Missing env var or configuration entry for 'host'. 
              Please add '{name}' entry to file
              {config_file}
              or export environment variable 'mqtt_{name}' before
              running the program.
           """

>>> print(message)
Missing env var or configuration entry for 'host'. 
              Please add 'host' entry to file
              /Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
              or export environment variable 'mqtt_host' before
              running the program.

>>> print(dedent(message))
Missing env var or configuration entry for 'host'. 
Please add 'host' entry to file
/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
or export environment variable 'mqtt_host' before
running the program.

The above solution will remove ALL indentation. If you want to remove only the indentation that is common to the whole multiline string, use textwrap.dedent(). But take care that every non-blank line, including the first, is indented: if the text starts right after the opening quotes, that first line has no indentation and .dedent() will do nothing.

Nic
0

I had a similar issue: I wanted my triple quoted string to be indented, but I didn't want the string to have all those spaces at the beginning of each line. I used re to correct my issue:

        print(re.sub('\n *','\n', f"""Content-Type: multipart/mixed; boundary="===============9004758485092194316=="
            MIME-Version: 1.0
            Subject: Get the reader's attention here!
            To: recipient@email.com

            --===============9004758485092194316==
            Content-Type: text/html; charset="us-ascii"
            MIME-Version: 1.0
            Content-Transfer-Encoding: 7bit

            Very important message goes here - you can even use <b>HTML</b>.
            --===============9004758485092194316==--
        """))

Above, I was able to keep my code indented, while the string itself was essentially left-trimmed: all spaces at the beginning of each line were deleted. This was important since any spaces in front of the SMTP or MIME-specific lines would break the email message.

The tradeoff I made was that I left the Content-Type on the first line because the regex I was using didn't remove the initial \n (which broke email). If it bothered me enough, I guess I could have added an lstrip like this:

print(re.sub('\n *','\n', f"""
    Content-Type: ...
""").lstrip())

After reading this 10-year-old page, I decided to stick with re.sub since I didn't truly understand all the nuances of textwrap and inspect.
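For reference, a self-contained sketch of the `re.sub` plus `lstrip()` variant described above. The two header lines are placeholders, not a complete MIME message; note that this regex removes every leading space, so it would also flatten any indentation you wanted to keep inside the text.

```python
import re

def build_headers():
    # The literal is indented to match the code; re.sub removes the
    # spaces after each newline, and lstrip() drops the leading newline.
    return re.sub(r"\n *", "\n", """
        MIME-Version: 1.0
        Subject: Placeholder subject
        """).lstrip()

print(repr(build_headers()))  # 'MIME-Version: 1.0\nSubject: Placeholder subject\n'
```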

Mark
0

There is a much simpler way:

    foo = """first line\
             \nsecond line"""
Kostia
  • This requires you to manually add the newline, and will add the indentation spaces to the previous line. – bohrax Mar 31 '22 at 19:53
  • Not sure what the problem is to add "\n". If you format from scratch it's easy to add, not seeing any problems adding extra symbols to user input or fetched text as well. And it doesn't add anything to a line ending with "\". Maybe it doesn't fit all use cases but for me it worked much better than anything I was able to find. – Kostia Mar 31 '22 at 20:10
  • It does add the indentation spaces (after), and it doesn't solve the original problem, as the data came from a user. – bohrax Mar 31 '22 at 20:23
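bohrax's objection is easy to check. In this sketch the backslash removes the newline, but the indentation of the continuation line ends up at the end of the first line. Adjacent string literal concatenation, shown only for comparison (it is a different technique, not part of the answer above), keeps the code indented without adding spaces to the value.

```python
def make_strings():
    # Backslash-newline inside the literal joins the lines, but the
    # 8 spaces that indent the next source line stay in the string.
    continued = """first line\
        \nsecond line"""
    # Adjacent literals are concatenated at compile time; the
    # indentation between them is outside the strings.
    concatenated = ("first line\n"
                    "second line")
    return continued, concatenated

continued, concatenated = make_strings()
print(repr(continued))     # 'first line', then 8 spaces, then '\nsecond line'
print(repr(concatenated))  # 'first line\nsecond line'
```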
-15

So if I get it correctly, you take whatever the user inputs, indent it properly and add it to the rest of your program (and then run that whole program).

So after you put the user input into your program, you could run a regex that takes that forced indentation back out. Something like: within triple quotes, replace every newline followed by four spaces (or a tab) with just a newline.
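A rough sketch of that regex idea. It assumes the wrapper added exactly four spaces per line, and that the script's triple-quoted strings contain no escaped quotes and do not appear inside comments or other strings; `unindent_triple_quoted` is a hypothetical name. Because it only removes the fixed prefix, any indentation the user typed inside the string is preserved.

```python
import re

# A triple-quoted string: the opening quotes, a lazy body, and the same
# quotes again. Escaped quotes inside the body are not handled.
TRIPLE_QUOTED = re.compile(r"(\"\"\"|''')(.*?)\1", re.DOTALL)

def unindent_triple_quoted(source: str, indent: str = "    ") -> str:
    """Hypothetical helper: inside triple-quoted strings, replace each
    newline followed by `indent` with a bare newline."""
    def fix(match: re.Match) -> str:
        quotes, body = match.group(1), match.group(2)
        return quotes + body.replace("\n" + indent, "\n") + quotes
    return TRIPLE_QUOTED.sub(fix, source)

wrapped = 'def main():\n    text = """foo\n    bar\n    foo2"""\n    return text\n'
fixed = unindent_triple_quoted(wrapped)
namespace = {}
exec(fixed, namespace)
print(repr(namespace["main"]()))  # 'foo\nbar\nfoo2'
```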

Dag Høidahl
FlorianH
  • yep, precisely. That's the only possible solution I've come up with. Not sure why I didn't go ahead with it...I think I might have to do this if nothing better comes up. –  Sep 11 '09 at 18:46
  • @thraxil's suggestion to use textwrap.dedent is the way to go. Consider changing your accepted answer. – Chris Calo Mar 03 '12 at 07:24
  • @ChrisCalo @bbenne10's answer is even better – user2297550 Oct 29 '18 at 07:56