
Motivation. I recently published an introductory programming textbook that follows a language-agnostic approach. Although the notation I use is a tiny subset of Java, the book's conceit is that it is essentially a universal subset of all imperative programming languages, and an Appendix shows the minor changes needed to map the examples into (say) Python, JavaScript, or C/C++.

The book makes extensive use of indentation to show the refinement hierarchy in code, where comments are used for specifications. For example, Chapter 1 contains this example:

     /* Output the Integer Square Root of an integer input. */
        /* Obtain an integer n≥0 from the user. */
           int n = in.nextInt();
        /* Given n≥0, output the Integer Square Root of n. */
           /* Let r be the integer part of the square root of n≥0. */
              int r = 0;
              while ( (r+1)*(r+1) <= n ) r++;
           System.out.println( r );

Conundrum. What I overlooked until just now is that there appears to be a fundamental incompatibility between my use of indentation to show the refinement hierarchy (as illustrated above) and Python's use of indentation as syntax. In particular, my (working) Python version of the sample code is:

    # Output the Integer Square Root of an integer input.
      # Obtain an integer n≥0 from the user.
    n = int(input())
      # Given n≥0, output the Integer Square Root of n.
        # Let r be the integer part of the square root of n≥0.
    r = 0
    while ( (r+1)*(r+1) <= n ):
        r = r + 1
    print( r )

The problem is that there would seem to be no way in Python to indent code (say, to be consistent with the refinement hierarchy) and have that indentation NOT be interpreted as syntax! The use of indentation in my book to show refinement is pervasive, so unless I can find a solution, I would appear to be hosed.
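The asymmetry between comments and code can be checked with the standard `tokenize` module. The following is a minimal sketch (the variable names are illustrative, not from the book): it tokenizes the working Python version above and counts INDENT tokens. Comment-only lines produce no INDENT token however far they are indented; only the loop body does.

```python
import io
import tokenize

# The working Python version from above: comments are indented freely,
# but every top-level statement starts in column 0.
source = """\
# Output the Integer Square Root of an integer input.
  # Obtain an integer n≥0 from the user.
n = int(input())
  # Given n≥0, output the Integer Square Root of n.
    # Let r be the integer part of the square root of n≥0.
r = 0
while ( (r+1)*(r+1) <= n ):
    r = r + 1
print( r )
"""

# Tokenizing does not execute the code, so input() is never called.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
indent_count = sum(1 for tok in tokens if tok.type == tokenize.INDENT)
comment_count = sum(1 for tok in tokens if tok.type == tokenize.COMMENT)
print(indent_count, comment_count)  # 1 4
```

So the four indented comments are invisible to the block structure, while the single indented statement is not.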

Seeking a solution. What I would seem to need is a Unicode space character that can be used to "physically" create whitespace at the beginning of a line but that is otherwise ignored by the Python compiler (say) for the purpose of determining "logical" indentation, or as a character in a subsequent identifier. So far, I haven't found such a character.

Does anyone have a suggestion?

I tried using an empty character (as per emptycharacter.com) before "n=int(input())". It was ignored as a "logical" indentation, was displayed as a red dot in the editor, but was then flagged by the compiler as an "invalid character in identifier".

  • Python doesn't even allow mixing tab and spaces, so I doubt it would take anything else happily... – MrE Jun 20 '23 at 20:07
  • Totally fugly, but you might be able to leverage something like: `'''...'''n = int(input())` – JonSG Jun 20 '23 at 20:11
  • I don't think that what you are asking is really viable in Python and wonder if an x y solution might work for you. What if you leave the comment-denoting character in the first column to satisfy the Python interpreter and indent your commentary as you described above (possibly with an additional character to draw attention to the text start) – Alan Hoover Jun 20 '23 at 20:29
  • @AlanHoover and maybe fill all the way from the left margin to the comment. Like `### Given n>=0...` and on the next line `##### Let r be...` – slothrop Jun 20 '23 at 20:39
  • @slothrop that works also :) – Alan Hoover Jun 20 '23 at 20:42
  • Why do you want it to be valid Python code? Is it for some automated tester you're running? – Kelly Bundy Jun 20 '23 at 20:48
  • I am confused by the posts of @Alan Hoover and slothrop, as my problem is the formatting of the Python code, not the formatting of the Python comments. The comments in my (working version of) Python code are fine; it is the Python code that needs to move right to fit in with the specification hierarchy. The post by @JonSG seems to be closer to working. I had to add a semicolon after the quoted string, but with that change, this worked: `# Obtain an integer n≥0 from the user.` followed by `' ';n = int(input())`, but the same mechanism mysteriously fails for the other statements. – Tim Teitelbaum Jun 20 '23 at 21:22
  • @TimTeitelbaum I should clarify: I'm taking it as given that the formatting of the Python **code** cannot be changed. (@JoshKelley's answer summarises the reasons for that). So my suggestion is that controlling the visual indentation of the **comments** might be the best achievable way of conveying what you want to, while keeping the Python code valid. – slothrop Jun 20 '23 at 21:27
  • @slothrop Thanks for clarifying your intent. Putting on a "language-designer's hat", and returning to the title of my post, do you think it would work if there were such a character? Specifically, just for the purpose of being definite, imagine that underscore (_) was that character. Then it would seem that a line-prefix of underscores could be safely ignored (for the purpose of determining syntax) but would allow the indentation that I need/want. Having drunk the "refinement-hierarchy Kool-Aid", and having written a 400+ page book that promotes it, I am averse to giving it up. – Tim Teitelbaum Jun 20 '23 at 21:54
  • If underscores hypothetically behaved that way, your idea would work in the sense of producing syntactically valid code. However, I fear it would be confusing for the reader, who sees a line indented at a particular position but can't tell at first glance whether that is for syntactic purposes or for the structure of your explanation. Certainly I'd find it hard to process as someone who's very accustomed to reading Python code. Arguably that's less of a problem for beginners, but conversely it may hinder their understanding of how indentation works in Python. – slothrop Jun 20 '23 at 22:01
  • Even if you could do this, it would be far too confusing for the purposes of an introductory textbook. (Heck, using indentation like this is *already* against how normal people indent their Java, JavaScript, C, or C++, so you're teaching your readers bad practice by doing this, but introducing weird Unicode spaces on top of that makes the problem even worse.) The idea of a "universal subset of all imperative programming languages" is doomed from the start. – user2357112 Jun 21 '23 at 03:06

2 Answers

The Python grammar omits details such as what, specifically, constitutes a valid indent. My reading of the Python tokenizer is that the set of permitted indentation characters and the set of recognized whitespace characters are both short and identical. So it seems like you're out of luck in the standard language.
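One way to probe this empirically is to put a candidate character in front of a top-level statement and see how `compile()` reacts. This is a rough sketch, assuming CPython 3; the helper name `classify_prefix` and the particular list of candidate characters are my own choices for illustration, not an exhaustive survey.

```python
def classify_prefix(prefix: str) -> str:
    """Report how the compiler treats `prefix` placed before a top-level statement."""
    try:
        compile(prefix + "n = 1\n", "<probe>", "exec")
    except IndentationError:
        # The prefix was counted as (unexpected) logical indentation.
        return "indentation"
    except SyntaxError:
        # The prefix was neither indentation nor part of a valid token.
        return "invalid"
    return "accepted"

# A few illustrative candidates; other Unicode spaces can be added the same way.
candidates = {
    "SPACE": " ",
    "TAB": "\t",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "ZERO WIDTH SPACE": "\u200b",
    "IDEOGRAPHIC SPACE": "\u3000",
}
results = {name: classify_prefix(ch) for name, ch in candidates.items()}
for name, verdict in results.items():
    print(f"{name:18} {verdict}")
```

In this probe, ASCII space and tab come back as "indentation" and the Unicode spaces come back as "invalid", which matches the "invalid character in identifier" error reported in the question.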

There are a couple of suggestions in this answer about adding scoping blocks to Python, but that's visually intrusive and potentially a lot of work to update each of your examples.

Python is a dynamic language, with an open-source implementation that you can modify, and it offers the use of import hooks to modify code as it's imported. See here and here for ideas. E.g., maybe your own stream reader could strip your custom Unicode space?
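As a rough illustration of the stripping idea, here is a minimal preprocessing sketch. It is not a full import hook, only the transformation such a hook (or a custom loader) might apply before handing the source to the compiler. The marker character (U+00B7 MIDDLE DOT), the function name `strip_display_indent`, and the sample source are all assumptions made up for this example.

```python
import re

# Hypothetical display-only indent character (MIDDLE DOT); not part of Python.
MARKER = "\u00b7"

def strip_display_indent(source: str, marker: str = MARKER) -> str:
    """Delete the run of `marker` characters at the start of each line."""
    leading_run = re.compile("^" + re.escape(marker) + "+", flags=re.MULTILINE)
    return leading_run.sub("", source)

# "Book" source: markers give the physical indentation, ordinary spaces
# after the markers give the logical (syntactic) indentation.
book_source = (
    "# Given n>=0, let r be the Integer Square Root of n.\n"
    "  # Let r be the integer part of the square root of n>=0.\n"
    + MARKER * 4 + "r = 0\n"
    + MARKER * 4 + "while (r+1)*(r+1) <= n:\n"
    + MARKER * 4 + "    r = r + 1\n"
)

namespace = {"n": 17}
exec(compile(strip_display_indent(book_source), "<book>", "exec"), namespace)
print(namespace["r"])  # 4, since 4*4 <= 17 < 5*5
```

The catch is that the marked-up text is no longer standard Python: it only runs through this preprocessing step, so readers could not paste it into an ordinary interpreter.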

Of course, depending on how much control you have over your book (and how much effort you're willing to invest in exerting that control), there's nothing saying that the formatting in the book has to match the indentation of the code.

(Since your book is intended as an introduction to programming, and since significant whitespace is such a basic part of Python development, I'm not sure if using nonstandard Python whitespace would be doing readers a favor.)

Josh Kelley

I'm sorry to say that I have concluded that there is a fundamental incompatibility between the use of (physical) indentation to show a refinement hierarchy, and Python's use of (logical) indentation as syntax. Here is my analysis.

First, consider just the Python code from my example:

n = int(input())
r = 0
while ( (r+1)*(r+1) <= n ):
    r = r + 1
print( r )

All lines except for the body of the while loop are syntactically at the "top level", and accordingly only that line should be (logically) indented.

But consider an indentation style that reflects a refinement hierarchy. This convention is designed to allow a program to be understood at arbitrary levels of detail, with comments playing the role of executable specifications whose implementations, i.e., refinements, are indented beneath them. Such indentation is a surrogate for the sort of elision that could be provided in (say) a folding editor, e.g.,

# Output the Integer Square Root of an integer input.

or

# Output the Integer Square Root of an integer input.
  # Obtain an integer n≥0 from the user.
  # Given n≥0, output the Integer Square Root of n.

or

# Output the Integer Square Root of an integer input.
  # Obtain an integer n≥0 from the user.
  # Given n≥0, output the Integer Square Root of n.
    # Let r be the integer part of the square root of n≥0.
    print( r )

or

# Output the Integer Square Root of an integer input.
  # Obtain an integer n≥0 from the user.
    n = int(input())
  # Given n≥0, output the Integer Square Root of n.
    # Let r be the integer part of the square root of n≥0.
    print( r )

or

# Output the Integer Square Root of an integer input.
  # Obtain an integer n≥0 from the user.
    n = int(input())
  # Given n≥0, output the Integer Square Root of n.
    # Let r be the integer part of the square root of n≥0.
      r = 0
      while ( (r+1)*(r+1) <= n ):
        r = r + 1
    print( r )

Notice that the top-level lines of Python code are (physically) indented by varying amounts. I don't see any way that one can get that effect (say) by using the sort of special character that I previously proposed (signified by _).

Unless I'm missing something, I am sadly forced to conclude that Python's use of indenting as syntax precludes showing the indented refinement hierarchy in the manner I want.
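For concreteness, the fully expanded layout above can be handed to `compile()` to confirm that standard Python rejects it. This is only a quick check, assuming CPython 3; the variable names are illustrative.

```python
# The fully refined layout, with code indented to match the hierarchy.
layout = """\
# Output the Integer Square Root of an integer input.
  # Obtain an integer n≥0 from the user.
    n = int(input())
  # Given n≥0, output the Integer Square Root of n.
    # Let r be the integer part of the square root of n≥0.
      r = 0
      while ( (r+1)*(r+1) <= n ):
        r = r + 1
    print( r )
"""

try:
    # compile() only parses the text; input() is never called.
    compile(layout, "<refinement>", "exec")
    outcome = "compiled"
except IndentationError as exc:
    outcome = f"IndentationError: {exc.msg}"
print(outcome)
```

The indented comments are harmless, but the first indented top-level statement is enough to make the whole program invalid.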

Thanks to @slothrop and all others for your help.