Working on the Skulpt Python to JavaScript Compiler

I am making heavy use of Skulpt, a cross-compiler that lets me run Python programs completely in the browser. It compiles the Python to JavaScript and then runs the JavaScript, which makes it possible to build an auto-grader with zero server footprint. I use that auto-grader in my free online MOOC, Python for Informatics. The same compiler is used by CodeSkulptor, which is part of the Rice University Coursera MOOC titled An Introduction to Interactive Programming in Python.

Since Skulpt is a complete ground-up implementation of Python, including the need to implement all the standard libraries, it is naturally incomplete. So as my students go through the various assignments, we encounter little bits and pieces that are not quite right or not implemented.

Earlier this week, I was thinking that I would have to just work around the little things that were wrong or missing, but then the Computer Scientist inside me wondered how hard it would be to dive into the source code of the Skulpt compiler and fix a few things that were bothering me.

I started working on the code Thursday morning and it was relatively straightforward in its approach. The nice thing is that the approaches to writing a compiler have not changed too much since I last wrote a compiler in 1979. There is a tokenizer that turns the source text into tokens, a grammar that expresses how the tokens are combined, code that triggers on each of the rules of the grammar and produces an intermediate representation of the program, a code generator that turns the intermediate representation into runnable code, and a run-time library that implements the built-in functions needed. After all this, Skulpt uses the Google Closure Compiler to pull all the pieces together and produce a nice tight include file with all of it ready to run in the browser:

-rw-r--r-- 1 csev staff 171469 Jan 20 09:05 builtin.js
-rw-r--r-- 1 csev staff 214624 Jan 20 09:05 skulpt.js
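The pipeline stages described above can be illustrated with a toy example. This is a minimal sketch, not Skulpt's code: it handles only arithmetic expressions, and the names tokenize, parse, generate, and compile_expr are made up for illustration. It shows a tokenizer, a small grammar applied by a recursive-descent parser that builds a tree (the intermediate representation), and a code generator that emits a JavaScript expression as text.

```python
import re


def tokenize(source):
    """Stage 1: split source text into number and operator tokens."""
    return re.findall(r"\d+|[-+*/()]", source)


def parse(tokens):
    """Stage 2: apply the grammar and build a nested-tuple tree.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            advance()  # consume the closing ')'
            return node
        return ("num", token)

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = ("binop", advance(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = ("binop", advance(), node, term())
        return node

    return expr()


def generate(node):
    """Stage 3: walk the tree and emit a JavaScript expression string."""
    if node[0] == "num":
        return node[1]
    _, op, left, right = node
    return "(" + generate(left) + op + generate(right) + ")"


def compile_expr(source):
    """Run all stages and wrap the result in a JavaScript assignment."""
    return "var $result=" + generate(parse(tokenize(source))) + ";"
```

Calling compile_expr("1 + 2 * 3") returns the string "var $result=(1+(2*3));". A real compiler adds error handling, many more grammar rules, and a run-time library that the generated code calls into.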

I am writing down some of my steps for my own record and so that others might benefit if they too want to dive into working on Skulpt.

The first step is to clone the Mercurial repository on Google Code. Here is my clone:

http://code.google.com/r/drchuck-skulpt-mods/

Then I checked out my clone to my laptop:

hg clone https://drchuck@code.google.com/r/drchuck-skulpt-mods/

Most of the operations are run from a shell script called ‘m’. The first thing you might want to do is run the unit tests to get a baseline:

./m

Yup – it is that simple. There are over 300 unit tests that get run through Skulpt, Python, and Google V8 and have their output compared.
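The idea of comparing outputs can be sketched in a few lines. This is only an illustration of expected-output testing, not how the ‘m’ script is implemented; the helper names run_and_capture and matches_expected are hypothetical, and the sketch uses Python 3 print syntax rather than the Python 2 syntax used elsewhere in this post.

```python
import contextlib
import io


def run_and_capture(source):
    """Execute a Python snippet and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {"__name__": "__main__"})
    return buffer.getvalue()


def matches_expected(source, expected_output):
    """Return True when the snippet prints exactly the expected text."""
    return run_and_capture(source) == expected_output
```

A test then passes when every implementation under test prints the same text for the same program.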

Working with Skulpt

As best I can tell this is a pretty slow-moving project – but it does move, so I felt it was important to document all my work in the Skulpt issue list. Before I worked on something, I wrote an issue in the main Skulpt repo, like Issue 116. Then I would add a comment when my modification was complete in my clone. I hope this gives the people running the Skulpt project the best chance of getting my code back into their repo.

Extending the runtime

If you are going to add a new feature, first you need a bit of Python code to exercise the feature. For the round function, I wrote this:

f = 2.515
g = round(f,1)
print g

To test this, you run:

./m run rnd.py

Your output will look like this:

-----
f = 2.515
g = round(f,1)
print g

-----
Uncaught: "#", [unnamed] line 61 column 29
/*    61 */                 throw err;
                            ^
dbg> 

This just means that the round function does not work. Modify the files:

src/builtindict.js
src/builtin.js

And add the implementation. Here are the changes needed. Ignore the dist and doc diffs and focus on the src diffs. The dist and doc diffs are generated in a bit – they end up in the repo so folks can just grab the dist and doc from the repo without needing to check out the code.
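The actual change is JavaScript in src/builtin.js and is best read from the diffs. As a rough illustration of the logic a round built-in needs, here is a minimal Python sketch. It assumes Python 2 semantics, where ties are rounded away from zero, and the name round_half_away is hypothetical. It is not the code that went into Skulpt.

```python
import math


def round_half_away(number, ndigits=0):
    """Round to ndigits decimal places, with ties going away from zero."""
    shift = 10 ** ndigits
    # Work on the magnitude so that negative values mirror positive ones.
    scaled = abs(number) * shift
    result = math.floor(scaled + 0.5) / shift
    return math.copysign(result, number)
```

With the test program above, round_half_away(2.515, 1) gives 2.5, which matches what Python prints. Scaling by a power of ten is simple but can misround values that sit right next to a tie because of binary floating point, so a production implementation needs more care.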

When you make changes, run

./m

Until unit tests pass and then run

./m dist

Until it successfully completes:

...
. Wrote dist/builtin.js
. Updated doc dir
. Wrote dist/skulpt.js.
. gzip of compressed: 50585 bytes

Then re-run your code:

./m runopt rnd.py

Until your code works. You may go a few rounds of edit, unit test, dist, re-run, but the process takes about 20 seconds so it is not as painful as it sounds. I could not figure out exactly when “./m run” looks at the new code in src and when it needs a “./m dist” to get new code – so I pretty much do a “./m dist” on every modification.

When everything works and the output you see from “./m run” matches the output of running Python on the same code you can turn your test code into a new unit test. Run

./m nrt

It brings up vi on a file named with the next available unit test number. Paste in the code from your “rnd.py” and save it. Then run:

./m regentests

Then run

./m
./m dist

You may find little things in each of these steps. Edit your code and/or the unit test until “./m dist” is completely clean. Then I actually copy the two files in dist into my online autograder and do a quick test of the new feature in the browser.

If all goes well, you can use Mercurial to add the unit tests, check that things are OK, and then do a commit and push:

hg add test/run/*322*    # do this for each unit test you have added
hg status
hg commit
hg push

Changing the Language

If you need to change the language (i.e. anything other than the runtime) it is a little trickier. Examples of two language changes I did were:

  • Change the code generator for try / except – this was relatively straightforward because it did not entail a grammar change
  • Add support for quit and exit – I initially thought I could do this by extending the run-time and having them throw an exception that the outer execution loop would catch – but somehow I never got it to work so I switched to making them part of the language like break, continue, and other flow operations. If you look at the code, I touched a lot more files in this change – but it should serve as a nice roadmap when you make a grammar change and then have to work through and get all the parsing and code generation to work.

The steps I take when making any change to the parser are as follows:

./m regenparser
./m
./m dist
./m runopt quit1.py

Again, I don’t know which changes need which of the above steps, but it seems that a lot of the changes needed a complete “./m dist” before I could test them in my own code – so after a while I just did them all on every change.

The first thing you need to do is get the dump of the generated JavaScript code as part of your testing. I searched in vain for a nice option to make this happen and perhaps there is a better way – but I found that what worked for me was uncommenting some code in “src/import.js”:

--- a/src/import.js	Fri Jan 18 11:03:55 2013 -0500
+++ b/src/import.js	Fri Jan 18 12:20:25 2013 -0500
@@ -84,7 +84,7 @@
  */
 Sk.importModuleInternal_ = function(name, dumpJS, modname, suppliedPyBody)
 {
-    //dumpJS = true;
+    dumpJS = true;
     Sk.importSetUpPath();
 
     // if no module name override, supplied, use default name
@@ -170,7 +170,7 @@
                 return lines.join("\n");
             };
             finalcode = withLineNumbers(co.code);
-//          Sk.debugout(finalcode);
+            Sk.debugout(finalcode);
         }
     }

Make sure not to check these changes in by doing an “hg revert src/import.js” right before the commit and push.

If you make these changes to src/import.js and go through the steps above, you will see a lot of nicely formatted JavaScript flying by in addition to the other output.
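The numbered prefixes in that dump come from the withLineNumbers helper whose tail is visible in the diff above. Its body is not shown here, so the following is only a guess at the formatting, written in Python and inferred from the look of the output (prefixes such as /*    61 */). The name with_line_numbers and the width of 5 are assumptions.

```python
def with_line_numbers(code):
    """Prefix each line of generated code with a comment holding its number."""
    lines = code.split("\n")
    numbered = [
        "/* {:5d} */ {}".format(number, line)
        for number, line in enumerate(lines, start=1)
    ]
    return "\n".join(numbered)
```

Because the prefix is itself a JavaScript comment, the numbered text is still valid JavaScript, and an error report such as "line 61 column 29" can be matched against the dump directly.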

Once you have the changes to Skulpt making it past “./m dist”, it is time to test your own code and the new feature. When you do a “./m runopt file.py” you get a lot of JavaScript output on the terminal. It is a little opaque – but like the displays in the Matrix – after a while it makes sense. The basic runtime is a while loop containing a switch statement, and each of the code blocks is a case in that switch. It is like the classic code generator I wrote in 1979. Don’t expect the blocks to be in the same order as the Python source – just look at the “$blk=4” code at the end of each block to see where the code will be going next.
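The while-plus-switch structure is easier to see in miniature. The following is a hand-written Python sketch, not Skulpt output: an if/elif chain stands in for the switch, the variables blk, exc, and loc mirror $blk, $exc, and $loc, and the block numbers are arbitrary. It is the block form of a small loop that adds up 0, 1, and 2 and then reports the total.

```python
def run_module():
    """Block-dispatch form of:

        total = 0
        i = 0
        while i < 3:
            total += i
            i += 1
        print total
    """
    blk = 0    # number of the next block to execute, like $blk
    exc = []   # stack of exception-handler block numbers, like $exc
    loc = {}   # local variables, like $loc
    out = []   # collected output, standing in for the print runtime call
    while True:
        try:
            if blk == 0:      # module entry
                loc["total"] = 0
                loc["i"] = 0
                blk = 1
            elif blk == 1:    # loop test
                blk = 3 if loc["i"] < 3 else 2
            elif blk == 2:    # code after the loop
                out.append(str(loc["total"]))
                return out
            elif blk == 3:    # loop body
                loc["total"] += loc["i"]
                loc["i"] += 1
                blk = 1
        except Exception:
            if exc:
                blk = exc.pop()  # jump to the innermost handler block
                continue
            raise
```

Note that block 2 (the code after the loop) appears before block 3 (the loop body), which is why following the assignments to blk is more reliable than reading the blocks top to bottom.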

Here is the generated JavaScript from a simple hello world Python program with a few line breaks:

-----
print "Hello world"

-----
/*     1 */ var $scope0=(function($modname){var $blk=0,$exc=[],$gbl={},$loc=$gbl;
    $gbl.__name__=$modname;
    while(true){try{ switch($blk){case 0: /* --- module entry --- */
/*     2 */ //
/*     3 */ // line 1:
/*     4 */ // print "Hello world"
/*     5 */ // ^
/*     6 */ //
/*     7 */ 
/*     8 */ Sk.currLineNo = 1;
/*     9 */ Sk.currColNo = 0
/*    10 */ 
/*    11 */ 
/*    12 */ Sk.currFilename = './hello.py';
/*    13 */ 
/*    14 */ var $str1=new Sk.builtins['str']('Hello world');
    Sk.misceval.print_(new Sk.builtins['str']($str1).v);
    Sk.misceval.print_("\n");return $loc;goog.asserts.fail('unterminated block');} }
    catch(err){if ($exc.length>0) { $blk=$exc.pop(); continue; } else { throw err; }} }});

Hello world
-----

Here is code generated from a more complex Python example with more than one block. I wish I knew how to make the JavaScript prettier when debugging your code. The JavaScript is pretty in the unit tests – but ugly when you do runopt.

I won’t go through the detailed code modification steps – that is best shown looking at the diffs from my two changes above.

Pulling in merges from other clones

The Skulpt project is pretty slow-moving so interesting things happen in clones other than the main repo – so it is helpful to pull those changes into your repo. I include how I did this just to help jog my own memory.

Make sure you have any of your changes fully committed and your local repo is clean before you start:

hg incoming https://code.google.com/r/theajp01-skulpt-int-fix/
hg pull https://code.google.com/r/theajp01-skulpt-int-fix/
hg status
hg heads
hg merge
hg diff
./m 
./m dist

If you have a problem with the patches you may need to fix them by editing the files or even add new unit tests using “./m nrt”. When you are satisfied with the patches you do the following:

hg status
hg commit
hg push

Summary

In short, it has been a fun three days re-learning how compilers work internally. I really like the internal structure of the Skulpt project. It is very impressive and thorough and surprisingly easy to work in. This experience also reinforces my sense of the value of the very deep learning that a Computer Science degree requires. Some might say that Computer Science students don’t need to learn operating systems or compilers or hardware – but someone needs to be able to dig into these pretty layers and make something work sooner or later.

Of course not everyone who should learn to program needs to be a trained Computer Scientist. There are plenty of people who just need to know how Python and a few other things work so they can sling data around and connect things together. But it is good to be able to call in a plumber once in a while. And for me, it was fun to go back to my plumber days these past three days.

Thanks to the great folks who built Skulpt and thanks to my SI301 on-campus students and Python MOOC students for their patience as I worked through this code as the autograder kept breaking in mysterious ways :).

One Comment

  1. 0xdabbad00 says:

    Really helpful to see how you made your changes. Thank you.

    Looks like skulpt just moved to github on Jan 23.
