Friday, August 8, 2014

Whitespace in Programming Languages

    Disclaimer: This post is just my way of getting back into the habit of writing blog posts. You might find it somewhat interesting but it is off-topic as far as the blog's statement of purpose goes.

    I have had some exposure to different kinds of programming languages, and it occurred to me recently that one (crude) way to see what is part of a language's core design philosophy is to see how it uses white-space. White-space is among the easiest syntactic elements of a language to both write and read. It's reasonable to assume that whatever idea a language allows you to express with minimal syntactic overhead is close to the heart of the language's philosophy.

    Now, no discussion of white-space in programming languages can fail to mention Python. Python is a modern, multi-paradigm programming (scripting) language that is somewhat controversial for its insistence on using indentation to define code structure. In most languages, it's considered "good form" to indent wherever you have control structures (functions, conditionals, loops). Python requires it. Let's take a look.

#! python

# break operator
# prime numbers

for n in range(2, 1000):
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n // x)
            break
    else:
        # loop fell through without finding a factor
        print(n, 'is a prime number')

    What is the effect? The same as what "good coding style" tries to promote, of course: readability and modularity. If your code has too many levels of nested control structures, you really feel it. Python also requires consistent indentation within a block - if one statement in a block is indented 4 spaces, the statements alongside it must be too, or the interpreter complains. There's probably more to say here, but like I said, this is a warm-up and I have other languages to cover.
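
    To see the interpreter complain without crashing a script, here is a minimal sketch that hands two source strings to Python's built-in compile(). The strings and the helper name compiles are made up for illustration; only the indentation of the last line differs between them.

```python
# Two versions of the same loop; only the indentation of the last line differs.
consistent = (
    "for n in range(3):\n"
    "    square = n * n\n"
    "    print(square)\n"
)
inconsistent = (
    "for n in range(3):\n"
    "    square = n * n\n"
    "      print(square)\n"  # two extra spaces, but no new block was opened
)


def compiles(source):
    """Return True if Python accepts the source, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(consistent))    # True
print(compiles(inconsistent))  # False
```

    The error is raised when the code is compiled, before any of it runs, so a mis-indented line is caught even if it sits in a branch that would never execute.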

    Another important language to discuss is the pathological case - the language "Whitespace" itself. This is a joke language developed by Edwin Brady (who also wrote Idris, which we may see more of in future blog posts...) and Chris Morris, in which the language commands are built entirely from white-space characters: space, tab, and newline. Everything else in the source file is ignored. The reason this is a joke language is very much related to the thesis of this post - white-space should be about removing noise from the program or organizing it to be more readable. But if everything is white-space, it's totally unreadable! Your best bet in programming "Whitespace" is to use a hex-editor.
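
    Short of a hex-editor, a few lines of Python can at least make such a program visible. This is only a sketch, not a Whitespace interpreter: the helper name reveal and the S/T/L lettering are my own choices, and the five-character sample is, as far as I recall from the Whitespace tutorial, the instruction that pushes the number 1.

```python
# Map the three significant characters to visible letters.
VISIBLE = {" ": "S", "\t": "T", "\n": "L"}


def reveal(source):
    """Spell out spaces, tabs and linefeeds; drop every other character."""
    return "".join(VISIBLE.get(ch, "") for ch in source)


sample = "   \t\n"  # space, space, space, tab, linefeed
print(reveal(sample))  # SSSTL
```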

    Now for an even more obscure programming language - J, a descendant of APL. J uses white-space in the creation of arrays, which makes sense - J is one of the few "array oriented" programming languages. Consider the following definition of a 2 x 3 matrix:

mat =: 2 3 $ 0 1 2 3 4 5

and consider the equivalent C code:


int mat[2][3] = { {0, 1, 2}, {3, 4, 5} };

    J uses spaces to delimit the elements of an array, presumably because arrays are the core data structure in the language and it would be excruciating to type them out in the C style.
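
    For readers who know neither J nor C, here is a rough Python sketch of what the dollar sign does in the J line above. The function name shape is hypothetical, and it handles only the two-dimensional case. It assumes J's behaviour of reusing the items cyclically when there are too few to fill the array.

```python
from itertools import cycle, islice


def shape(rows, cols, items):
    """Arrange a flat sequence into rows x cols, reusing items cyclically."""
    flat = list(islice(cycle(items), rows * cols))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


mat = shape(2, 3, [0, 1, 2, 3, 4, 5])
print(mat)  # [[0, 1, 2], [3, 4, 5]]
```

    Even with a helper function, the Python call needs parentheses, brackets and commas where J needs only spaces.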

    Haskell does something similar. As you may have noticed, J functions (such as the dollar sign) also use white-space to receive their arguments. Haskell does this for functions of any arity (J only supports up to 2 arguments - if you want more, you guessed it, use an array). Consider a function that takes 3 integers and tests whether they could be the side lengths of a right triangle (assuming the 3rd is the hypotenuse).


is_pythag :: Int -> Int -> Int -> Bool
is_pythag a b c = a*a + b*b == c*c

temp = is_pythag 3 4 5

    As with J and arrays, functions are a core building block in Haskell, so the syntax for defining and using them is trimmed down (I could even have omitted the type signature for the function I defined). Though a pure functional language, Haskell also tries to support idiomatic line-by-line imperative programming. Programmers of imperative languages take using a line to separate commands for granted (whether it's required or merely good style), but in Haskell a special form called "do notation" is used to achieve the same thing.

nameDo :: IO ()
nameDo = do putStr "What is your first name? "
            first <- getLine
            putStr "And your last name? "
            last <- getLine
            let full = first ++ " " ++ last
            putStrLn ("Pleased to meet you, " ++ full ++ "!")

    I'll wrap this up with a discussion of Scala, another multi-paradigm programming language. Scala runs on the Java Virtual Machine and shares some syntax with Java itself. One complaint many have about Java, which Scala tries to address, is its excruciating verbosity. Consider adding two "BigInteger" values:

BigInteger myBig = big1.add(big2);

    You can see how this would get out of hand with larger arithmetic expressions, especially ones using operators with different precedence from each other (whereas method invocation always has the same precedence). In Scala, you can write:

val myBig = big1 add big2

or even

val myBig = big1 + big2

    Part of this brevity comes from allowing symbolic method names, but allowing methods to sometimes be used as if they were infix operators really helps. Scala also has good type inference, saving you the need to explicitly declare the types (in exchange for having to type val, var, and def instead). Additionally, Scala has a few "magic methods" which can be omitted, such as "apply". Thus, the following two lines of code are equivalent.

myApplyObj.apply(mySubj)
myApplyObj(mySubj)

    Hope this wasn't too disappointing - you were warned. My future posts will be more ambitious, as I am currently working on a MIPS simulator in Agda and was able to prove a few neat things about representing numbers in binary!