Parentheses and indentation

Michał "phoe" Herda · Monday, 18 May, 2020 - 19:23

Claim: You know you've got used to reading Lisp when you no longer care about the parentheses and instead read Lisp by indentation. And this is how it is supposed to be read.

(Warning: this post has a slight rant tone to it.)

Let us consider three versions of read-file-into-string, a Common Lisp utility function adapted from the Alexandria source code. The questions are: How are they different? How do they work? What do they say about the code that is executed?

;;; Version A

(defun read-file-into-string (pathname &key (buffer-size 4096) external-format)
  (flet ((read-stream-content-into-string (stream)
           (check-type buffer-size (integer 1))
           (let ((*print-pretty* nil)
                 (buffer (make-array buffer-size :element-type 'character)))
             (with-output-to-string (datum)
               (loop :for bytes-read := (read-sequence buffer stream)
                     :do (write-sequence buffer datum :start 0 :end bytes-read)
                     :while (= bytes-read buffer-size))))))
    (with-open-file (stream pathname :direction :input
                                     :external-format external-format)
      (read-stream-content-into-string stream))))

;;; Version B

(defun read-file-into-string ((pathname &key (buffer-size 4096) external-format)))
  (flet (read-stream-content-into-string (stream)
          (check-type buffer-size (integer 1)
          (let ((*print-pretty* nil))
                (buffer (make-array buffer-size :element-type 'character))
            (with-output-to-string (datum)))
              (loop :for bytes-read := (read-sequence buffer stream)
                    :do (write-sequence buffer datum :start 0 :end bytes-read))
                    :while (= bytes-read buffer-size)))
    (with-open-file ((stream pathname :direction :input
                                      :external-format external-format)))
      (read-stream-content-into-string stream)))))))

;;; Version C

(defun read-file-into-string (pathname &key (buffer-size 4096) external-format)
  (flet ((read-stream-content-into-string (stream)
           (check-type buffer-size (integer 1))
    (let ((*print-pretty* nil)
      (buffer (make-array buffer-size :element-type 'character)))
      (with-output-to-string (datum)
        (loop :for bytes-read := (read-sequence buffer stream)
              :do (write-sequence buffer datum :start 0 :end bytes-read)
              :while (= bytes-read buffer-size))))))
        (with-open-file (stream pathname :direction :input
                                         :external-format external-format)
          (read-stream-content-into-string stream))))

Feel free to check these in a Common Lisp REPL if in doubt.

The answer is that A and B tell the same story to the programmer, even though B won't compile. Many opening and closing parentheses in version B have been removed, duplicated, or displaced, which makes that code incomprehensible to the Lisp compiler.

C, however, does compile and work just like A does: the Lisp compiler will not see any difference between the forms from A and C, because C is a copy of A with broken indentation. The only thing that differs is the whitespace at the beginning of each line.
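
This can be checked directly, because the Lisp reader skips the whitespace between tokens before the compiler ever sees the code. The following sketch uses a small hypothetical fragment modeled on the `let` form from versions A and C (not the full function) and compares what `read-from-string` returns for two layouts of it:

```lisp
;;; Two layouts of the same hypothetical LET fragment; only the
;;; leading whitespace on each line differs.
(defparameter *well-indented*
  "(let ((*print-pretty* nil)
         (buffer (make-array 4 :element-type 'character)))
     (length buffer))")

(defparameter *misindented*
  "(let ((*print-pretty* nil)
   (buffer (make-array 4 :element-type 'character)))
         (length buffer))")

;;; The reader throws the indentation away, so both strings
;;; produce EQUAL forms.
(equal (read-from-string *well-indented*)
       (read-from-string *misindented*))
;; => T
```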

To a Lisp programmer, version C is much more dangerous than B: trying to evaluate the code from version B provides immediate feedback (it won't compile, it's broken code!), while version C compiles without complaint and does something other than what its layout tells the reader.
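
The difference in feedback already shows up at the reader level. In this sketch, the hypothetical helper `try-read` wraps `read-from-string` in `handler-case`. A fragment with a missing closing parenthesis (one of the kinds of damage done to version B) makes the reader signal an error, while a merely misindented fragment (the kind of damage done to version C) is accepted silently. For simplicity the helper reports every reader error as `:unbalanced`, which is good enough for this illustration:

```lisp
;;; Hypothetical helper: read one form and report whether the reader accepted it.
(defun try-read (source)
  "Return the form read from SOURCE and :OK, or NIL and :UNBALANCED if
READ-FROM-STRING signals an error."
  (handler-case (values (read-from-string source) :ok)
    (error () (values nil :unbalanced))))

;; A closing parenthesis is missing: the reader runs out of input mid-list.
(try-read "(let ((x 1))
  (print x)")
;; => NIL, :UNBALANCED

;; Balanced but misindented: accepted without a word of warning.
(try-read "(let ((x 1))
(print x))")
;; => (LET ((X 1)) (PRINT X)), :OK
```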

The intent conveyed by version A is that most of the space is taken by a local function, which is why most of the middle is indented more heavily than the bottom lines that form the actual body of read-file-into-string. Version C instead suggests that the only thing the local function does is a check-type assertion, since that is the only form indented as the body of a local function. The rest of the function body then reads as if we first call some function named buffer on a freshly created array, and then open a with-output-to-string context and perform everything else inside it: the loop iteration and the subsequent with-open-file form.
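
When indentation is suspect, one way to see the structure the reader actually built is to let the Lisp printer lay the form out again. The following is a minimal sketch with a hypothetical helper `reindent`; it relies only on `read-from-string` and `pprint`, so comments and the original letter case are lost, which makes it a diagnostic aid rather than a code formatter:

```lisp
;;; Hypothetical helper: show the structure the reader actually sees.
(defun reindent (source)
  "Read the first form in the string SOURCE and return it as a string
laid out by the pretty printer. Comments and letter case are not kept."
  (let ((*print-right-margin* 60))
    (with-output-to-string (out)
      (pprint (read-from-string source) out))))

;; BUFFER is written at the level of the LET body, as in version C.
;; The printer lays the form out from its real structure instead, so
;; BUFFER typically ends up aligned with the other LET bindings; the
;; exact layout is implementation-dependent.
(write-string
 (reindent "(let ((*print-pretty* nil)
  (buffer (make-array 4 :element-type 'character)))
  (length buffer))"))
```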

Such indentation is actively hostile towards the programmer, as I intentionally crafted it to be misleading; one is therefore unlikely to find it in Lisp code that is actually in use. Still, it is a proof of concept that a Lisp programmer can be misled, either by someone who is deliberately trying to do so or by someone too inexperienced to know better. Indirectly, it is also proof that indentation plays a crucial role in how humans read and parse Lisp code.

For another proof, we can take this example in a different and far more extreme direction. We will take the code from version A and remove all of its parentheses (except where they are still needed to make sense of the code), leaving only the indentation in place.

;;; Version D

defun read-file-into-string pathname &key (buffer-size 4096) external-format
  flet read-stream-content-into-string stream
         check-type buffer-size integer 1
         let *print-pretty* nil
             buffer make-array buffer-size :element-type 'character
           with-output-to-string datum
             loop :for bytes-read := read-sequence buffer stream
                  :do write-sequence buffer datum :start 0 :end bytes-read
                  :while = bytes-read buffer-size
    with-open-file stream pathname :direction :input
                                   :external-format external-format
      read-stream-content-into-string stream

Suddenly, we get something strangely Pythonesque: code scopes are no longer delimited by parentheses, but purely by whitespace. Lisp programmers might also be put off by the sudden lack of parentheses.

And yet, at least to me, this example reads very much like the Lisp code from version A. Again, this is because the indentation of both pieces of code is identical: it shows where a given block or sub-block goes deeper, where it continues at the same depth, and where it ends. This is the information a Lisp programmer relies on when parsing code meant for human consumption.
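
The claim that indentation alone carries the nesting can be checked mechanically. Below is a minimal sketch with hypothetical helpers `line-indent`, `parse-block` and `indentation-tree`: every line becomes a node whose children are the following lines indented deeper than it. It assumes spaces only and no blank lines, and it treats continuation lines such as `:external-format` as children, so it recovers nesting rather than exact Lisp forms:

```lisp
;;; Hypothetical helpers: nest lines by indentation alone.
;;; Assumes indentation uses spaces only and that there are no blank lines.

(defun line-indent (line)
  "Return the number of leading spaces in LINE."
  (or (position-if-not (lambda (c) (char= c #\Space)) line)
      (length line)))

(defun parse-block (lines parent-indent)
  "Consume the leading LINES that are indented deeper than PARENT-INDENT.
Return two values: a list of (TEXT . CHILDREN) nodes and the unconsumed lines."
  (let ((nodes '()))
    (loop :while (and lines (> (line-indent (first lines)) parent-indent))
          :do (let* ((line (pop lines))
                     (indent (line-indent line)))
                ;; Every following line indented deeper than LINE is its child.
                (multiple-value-bind (children remaining)
                    (parse-block lines indent)
                  (push (cons (string-trim " " line) children) nodes)
                  (setf lines remaining))))
    (values (nreverse nodes) lines)))

(defun indentation-tree (lines)
  "Return the list of (TEXT . CHILDREN) nodes described by LINES."
  (values (parse-block lines -1)))

;; Lines adapted from the end of version D.
(indentation-tree
 '("    with-open-file stream pathname :direction :input"
   "                                   :external-format external-format"
   "      read-stream-content-into-string stream"))
;; => (("with-open-file stream pathname :direction :input"
;;      (":external-format external-format")
;;      ("read-stream-content-into-string stream")))
```

Without looking at a single parenthesis, both the continuation line and the body end up nested under `with-open-file`, just as they are in version A.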

There is a valid objection that needs to be taken into account here: one needs to be somewhat proficient in Lisp semantics to know how many arguments each form takes. In the above example, one needs to know that make-array takes one mandatory argument followed by optional keyword arguments, that write-sequence takes two mandatory arguments followed by keyword arguments, that check-type takes a place and a type, and so on. Such knowledge comes only from using the language in depth but, thankfully, it concerns the right side of each line of such a program rather than the left. And the left side is where the indentation is.

When writing Lisp, two tasks are not meant to be done by humans: managing parentheses and indenting code. The programmer's focus should be on what happens between the parentheses, whose meaning and order are conveyed by the indentation.

When I write Lisp, I do not pay any attention to indentation: Emacs indents my code automatically as I write it, thanks to electric-indent and aggressive-indent.
When I write Lisp, I do not need to pay any attention to closing or counting parentheses: Emacs inserts them in pairs and prevents me from deleting a lone paren thanks to smartparens, and rainbow-delimiters gives me a visual color cue for the depth of each paren.
When I write Lisp, I do not need to pay much attention to where exactly I insert a new subform: if I misplace a form within my Lisp expression, Emacs indents it automatically, and I notice that it does not sit at the indentation level I expected. I can then fix its position, again thanks to smartparens.

This is also why I consider writing Lisp code almost impossible without an editor that performs these tasks for the programmer. Even more, I consider sharing Lisp code impossible unless that code is indented correctly: badly indented code either forces the Lisp programmers who read it to spend extra cognitive effort to understand it or, worse, gives them outright wrong and misleading information about what a given piece of Lisp code does.

Oh, and while we're at it, Lisp also settles the tabs-versus-spaces debate. Lisp indentation aligns subforms with arbitrary columns, such as the column of a form's first argument, so it is impossible to indent Lisp with tabs alone, unless tabs are exactly one space wide or one picks the worst possible option: mixing tabs and spaces on the same lines.
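
The arithmetic behind this is short. In version A, `:external-format` is aligned under `:direction`, at column 37. A minimal sketch with a hypothetical helper `tabs-and-spaces` shows how that column would have to be reached for a few tab widths:

```lisp
;;; Hypothetical helper: split the indentation needed to reach COLUMN
;;; into as many TAB-WIDTH-wide tabs as fit, plus leftover spaces.
(defun tabs-and-spaces (column &optional (tab-width 8))
  "Return two values: the number of tabs and the number of leftover spaces."
  (floor column tab-width))

(tabs-and-spaces 37)    ; => 4, 5   four tabs, then five spaces
(tabs-and-spaces 37 4)  ; => 9, 1
(tabs-and-spaces 37 1)  ; => 37, 0  tabs alone suffice only when a tab is one column wide
```

Any nonzero second value means that line needs tabs and spaces mixed together.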

So, spaces it is.