(In an interesting article, Paul Graham remarked: “It would be great if a startup could give us something of the old Moore’s Law back, by writing software that could make a large number of CPUs look to the developer like one very fast CPU. There are several ways to approach this problem. The most ambitious is to try to do it automatically: to write a compiler that will parallelize our code for us. There’s a name for this compiler, the sufficiently smart compiler, and it is a byword for impossibility. But is it really impossible? Is there no configuration of the bits in memory of a present day computer that is this compiler? “)
Cynbe replied in an email that, AFAIK, never got delivered. To fill out the discussion, here it is:
Mostly spot-on, as usual. I do have one comment:
You are close to the mark here, but not quite on it. (I speak as someone who has been hoeing this garden for several decades, and hoeing this row for the last five years: I've been productizing the SML/NJ compiler -- the last great thing out of Bell Labs before it Lucented and tanked -- under the rubric "Mythryl".) The real issue is more about composability via concurrency than about bandwidth via parallelism. The latter is not unimportant by any means, but at the end of the day the overwhelming majority of software will run fine on one modern CPU (as far as bandwidth goes) and the majority of the remainder will do so if a handful of hotspots are parallelized by hand at no particularly great proportional coding effort. There are a few application classes where CPU is indeed a scarce resource, but they are almost without exception "embarassingly parallel". This is something which I think none of us really expected back in the late 1970s and early 1980s when parallel computing started receiving serious academic attention. But the really critical issues are software security, reliability, maintainability, and development costs in time-to-ship terms -- the ones that the regular doubling of *software* system size is exposing as the critical achilles heel of the industry. Mostly-functional programming ala Ocaml, SML and (ahem) Mythryl are really the only light on the horizon here. I can say this with some confidence because it takes 15+ years for anything to make it from the lab to industry, and if you look around there just isn't anything else out there ready to make the jump. Mostly-functional programming is pre-adapted to the multicore/manycore world because it has only about 1% as many side-effects as conventional imperative code, which translated directly to only 1% as many race conditions, hung-lock bugs etc etc: To the extent that the need to transparently use multiple cores actually does drive the industry as you project, it will drive the industry to mostly-functional programming. (Pure-functional? A bridge too far. But let's not open that discussion.) Cheng's 2001 CMU thesis "Scalable realtime parallel garbage collection for symmetric multiprocessors -- http://mythryl.org/pub/pml/ -- shows how to make this sort of computation transparently scalable to the multicore/manycore world, and John H Reppy's recent Manticore work (http://manticore.cs.uchicago.edu/) shows how to put clean lockless concurrent programming on top of that to yield just the sort of automatically-parallel coding system you are thinking of. But the real significance of all this is not the way it will allow a few large apps to scale transparently to the upcoming 128-core (etc) desktop chips, but the way it will let vanilla apps scale to ever-doubling lines-of-code scales without becoming completely incomprehensible and unmaintainable. You have probably noticed that if you ask a hacker how quicksort works you get back pseudocode on the whiteboard, but if you ask how the linux kernel works you get boxes and arrows. This is not an accident; it is because the imperative push-the-program- counter-around-with-a-stick model of computing simply doesn't scale. What does scale is concurrent programming based on coarse-grain dataflows-between-modules. That is what the boxes-and-arrows on the whiteboard tell you, and that is what a scalable software paradigm has to implement. Reppy's Manticore CML stuff on top of Cheng's scalable-realtime-gc on top of basic mostly-functional programming gives one exactly that. (Give or take a few obvious fixes like substituting message-queues for synchronous message-slots.) That gives it a very real opportunity to move the industry a quantum leap beyond the 1970+-5 era software technology represented by C++/Java. And someone a very real opportunity to make a mint, which was your original point in the post. Which is why I've spent the last five years on that.Anyhow, just my $0.02. Life is Good! -- Cynbe
