As promised, I am adding a few more thoughts on my recent (re-)exploration of compiler design and Python programming. I had flirted before [link] with the idea of writing a compiler in Python, using tools developed in Python (for Python programmers) such as the ‘RPython’ toolchain from ‘PyPy’ or starting from libraries such as ‘rply’. None of those tools, though, took the whole project (a compiler) to completion (or anywhere close enough for me). By contrast, following the Compiler Construction class (on iTunes U) took me all the way to a functioning copy of an Oberon compiler.
Porting the code to Python was clearly an afterthought (mine), and the resulting code is far from an exemplary piece of Pythonic work(!), but in the process I learned a lot AND got a working compiler for (a subset of) a real language: Oberon.
Ok, most of you might never have heard of Oberon or Modula-2 (or Pascal, really?), but for me there was extra motivation there: a trip down memory lane!
Here is a short list of observations I made in the translation (and refactoring) process:
- First of all, I confirmed my impression that Python is really easy and expressive. The conversion from the original examples (written in Oberon itself) was very straightforward. Additionally, in many places where lists of ‘things’ were being assembled and massaged, most of the minutiae (and the code that went with them) just disappeared… (that’s a good thing!)
- The code did not appear to shrink much. This was a surprise at first, but then I realised that Wirth’s coding style was playing a trick on my eyes. The ratio of semicolons to lines in his code is very high (up to 4 or 5 statements per line), while my translated code mostly keeps to one statement per line, which is much easier on the eyes and more readable. The result is approximately the same length (number of lines), but the difference in readability in some sections is huge.
- I took the opportunity to change the code to an ever so slightly more functional style. The original code makes heavy use of parameters passed by reference (VAR parameters) and a small number of globals. Both are bad ideas that can eventually come back to haunt you. In fact, while refactoring the code, I found myself wondering a few times what was really going on. Was the parameter mutated in this function, or was the function using a local copy? In Python a function can return as many values as needed, and object encapsulation can then take care of those cases where a global value (and a function side effect) used to be the only option.
- I truly missed (static) type checking. Yes, dynamic typing is a great convenience: code seems to work immediately, and there is much less clutter around the ‘algorithm’. But while doing extensive ‘refactoring’, it is way too easy to get the instant gratification of seeing a piece of code (the one you are focusing on at the moment) working correctly, without noticing that you just broke another enormous chunk of code. The breakage will reveal itself later, at run time for sure, but too late!
- The way out, of course, is to stick to some serious testing practices. I learned to use and appreciate py.test, one of the many Python libraries that (enormously) ease the effort of putting together and maintaining a set of tests for your code. There is nothing more gratifying and reassuring than seeing that all your tests are still passing after some major code surgery.
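To make the points about VAR parameters and testing a bit more concrete, here is a minimal sketch. The function names and the token format are hypothetical and invented for illustration; they are not taken from the actual compiler. Instead of mutating a position passed by reference, each parsing function returns the new position together with its result, and a py.test-style test (a plain function with a bare `assert`) pins the behaviour down.

```python
# Hypothetical illustration: these names are NOT from the actual compiler.

def parse_number(tokens, pos):
    """Return (value, new_pos) instead of mutating a VAR-style parameter."""
    value = int(tokens[pos])
    return value, pos + 1


def parse_sum(tokens, pos=0):
    """Parse 'n + n + ...' and return (total, new_pos)."""
    total, pos = parse_number(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        # Skip the '+' and parse the next operand.
        value, pos = parse_number(tokens, pos + 1)
        total += value
    return total, pos


def test_parse_sum():
    # py.test collects functions named test_* and reports failed asserts.
    assert parse_sum(["1", "+", "2", "+", "3"]) == (6, 5)
```

Since every function states what it returns, the question "was the parameter mutated here?" no longer comes up: the caller decides what to do with the new position.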
Eventually, I got a real appreciation for Guido’s latest effort to introduce (optional) typing in Python (latest dot release, I believe Python 3.5). Testing is a great idea, but allowing the tooling to do more (type) checking for us, before the code ever runs, is an even better one.
It is interesting how Matz (the father of the Ruby programming language) seems to have reached the same conclusion, judging from one of his most recent talks (video).
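As a rough sketch of what optional typing could buy during a refactoring, consider the small example below. The `Symbol` record and `find_symbol` function are hypothetical, not part of the actual compiler. Note that the annotations are not enforced by the Python interpreter itself; the assumption here is that an external static checker (mypy is one such tool) is run over the code. The class-based `NamedTuple` syntax used here needs Python 3.6 or later.

```python
from typing import List, NamedTuple


class Symbol(NamedTuple):
    # Hypothetical symbol-table entry, invented for illustration.
    name: str
    level: int


def find_symbol(table: List[Symbol], name: str) -> Symbol:
    """Return the first entry called `name`, or raise KeyError."""
    for sym in table:
        if sym.name == name:
            return sym
    raise KeyError(name)


table = [Symbol("x", 0), Symbol("y", 1)]
found = find_symbol(table, "y")

# A static checker is expected to flag a call with swapped arguments,
# such as the one below, without running the program:
# find_symbol("y", table)
```

Without the annotations, the swapped call would only show up at run time, and only if some test happened to exercise that path.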
Now the project could take two turns:
- Playing with the language syntax to bring it into the new millennium, then add some objects…
- Developing a Virtual Machine to test the compiler on virtual and (why not) real hardware…
I decided to take both! To be continued…