Bandwidth via Parallelism, or Composability via Concurrency?

(In an interesting article, Paul Graham remarked: “It would be great if a startup could give us something of the old Moore’s Law back, by writing software that could make a large number of CPUs look to the developer like one very fast CPU. There are several ways to approach this problem. The most ambitious is to try to do it automatically: to write a compiler that will parallelize our code for us. There’s a name for this compiler, the sufficiently smart compiler, and it is a byword for impossibility. But is it really impossible? Is there no configuration of the bits in memory of a present day computer that is this compiler?”)

Cynbe replied in an email that, AFAIK, never got delivered. To fill out the discussion, here it is:

Mostly spot-on, as usual. I do have one comment:

You are close to the mark here, but not quite on it. (I speak as
someone who has been hoeing this garden for several decades, and
hoeing this row for the last five years: I’ve been productizing
the SML/NJ compiler — the last great thing out of Bell Labs before
it Lucented and tanked — under the rubric “Mythryl”.)

The real issue is more about composability via concurrency
than about bandwidth via parallelism.

The latter is not unimportant by any means, but at the end of the
day the overwhelming majority of software will run fine on one modern
CPU (as far as bandwidth goes) and the majority of the remainder will
do so if a handful of hotspots are parallelized by hand at no particularly
great proportional coding effort. There are a few application classes
where CPU is indeed a scarce resource, but they are almost without
exception “embarrassingly parallel”. This is something which I think
none of us really expected back in the late 1970s and early 1980s when
parallel computing started receiving serious academic attention.

But the really critical issues are software security, reliability, maintainability,
and development costs in time-to-ship terms — the ones that the
regular doubling of *software* system size is exposing as the critical
Achilles heel of the industry.

Mostly-functional programming à la OCaml, SML, and (ahem) Mythryl is
really the only light on the horizon here. I can say this with some
confidence because it takes 15+ years for anything to make it from
the lab to industry, and if you look around there just isn’t anything
else out there ready to make the jump.

Mostly-functional programming is pre-adapted to the multicore/manycore
world because it has only about 1% as many side-effects as conventional
imperative code, which translates directly into only 1% as many race
conditions, hung-lock bugs, and so on. To the extent that the need to
transparently use multiple cores actually does drive the industry as
you project, it will drive the industry to mostly-functional
programming. (Pure-functional? A bridge too far. But let’s not open
that discussion.)
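
To make that side-effect arithmetic concrete, here is a toy contrast in
OCaml (the function names are mine, purely illustrative). The imperative
version mutates a shared accumulator, exactly the sort of state that
becomes a race condition the moment two cores touch it; the functional
version has nothing to race on:

    (* Imperative: a mutable accumulator.  Unsafe to share across
       threads without a lock. *)
    let sum_squares_imperative xs =
      let total = ref 0 in
      List.iter (fun x -> total := !total + x * x) xs;
      !total

    (* Mostly-functional: a pure fold.  Nothing is mutated, so any
       number of cores can evaluate independent calls without locks. *)
    let sum_squares_functional xs =
      List.fold_left (fun acc x -> acc + x * x) 0 xs
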
Cheng’s 2001 CMU thesis, “Scalable realtime parallel garbage collection
for symmetric multiprocessors” — http://mythryl.org/pub/pml/ — shows
how to make this sort of computation transparently scalable to the
multicore/manycore world, and John H. Reppy’s recent Manticore work
(http://manticore.cs.uchicago.edu/) shows how to put clean lockless
concurrent programming on top of that to yield just the sort of
automatically-parallel coding system you are thinking of.
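
Manticore is its own SML dialect, but you can get the flavor of the CML
style from stock OCaml, whose Event module (in the bundled threads
library) implements the same rendezvous primitives. A minimal sketch,
not Manticore itself:

    (* CML-style rendezvous over a typed channel, via OCaml's Event
       module.  A producer thread offers the numbers 1..5; the consumer
       blocks in Event.sync until each rendezvous completes.  Note the
       complete absence of locks.
       Build: ocamlfind ocamlopt -package threads.posix -linkpkg *)
    let () =
      let ch : int Event.channel = Event.new_channel () in
      let producer =
        Thread.create
          (fun () ->
            for i = 1 to 5 do
              Event.sync (Event.send ch i)   (* blocks until matched *)
            done)
          ()
      in
      for _ = 1 to 5 do
        Printf.printf "got %d\n" (Event.sync (Event.receive ch))
      done;
      Thread.join producer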

But the real significance of all this is not the way it will allow
a few large apps to scale transparently to the upcoming 128-core (etc)
desktop chips, but the way it will let vanilla apps scale to ever-doubling
lines-of-code scales without becoming completely incomprehensible and
unmaintainable.

You have probably noticed that if you ask a hacker how quicksort works
you get back pseudocode on the whiteboard, but if you ask how the
linux kernel works you get boxes and arrows.
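
The quicksort half of that observation is easy to demonstrate: in a
mostly-functional language the whiteboard pseudocode more or less is
the program. Here it is in OCaml (the textbook non-in-place version;
the point is legibility, not constant factors):

    (* Whiteboard quicksort, verbatim: pick a pivot, partition the
       rest, recurse on both halves. *)
    let rec quicksort = function
      | [] -> []
      | pivot :: rest ->
          let smaller, larger = List.partition (fun x -> x < pivot) rest in
          quicksort smaller @ (pivot :: quicksort larger)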

This is not an accident; it is because the imperative
push-the-program-counter-around-with-a-stick model of computing
simply doesn’t scale.

What does scale is concurrent programming based on coarse-grain
dataflows-between-modules. That is what the boxes-and-arrows on
the whiteboard tell you, and that is what a scalable software
paradigm has to implement.

Reppy’s Manticore CML stuff on top of Cheng’s scalable-realtime-gc
on top of basic mostly-functional programming gives one exactly
that. (Give or take a few obvious fixes like substituting message-queues
for synchronous message-slots.)
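
For what it's worth, the message-queue I have in mind is the usual
buffered channel: send never blocks, recv blocks only while the queue
is empty. A minimal OCaml sketch (the Mqueue name and interface are
mine; nothing like it is prescribed by Manticore):

    (* Buffered message queue built from the stdlib Queue plus a
       Mutex/Condition pair (threads library). *)
    module Mqueue = struct
      type 'a t = {
        q    : 'a Queue.t;
        lock : Mutex.t;
        cond : Condition.t;
      }

      let create () =
        { q = Queue.create (); lock = Mutex.create ();
          cond = Condition.create () }

      (* Enqueue and wake one waiting receiver; never blocks. *)
      let send t msg =
        Mutex.lock t.lock;
        Queue.push msg t.q;
        Condition.signal t.cond;
        Mutex.unlock t.lock

      (* Block until a message is available, then dequeue it. *)
      let recv t =
        Mutex.lock t.lock;
        while Queue.is_empty t.q do
          Condition.wait t.cond t.lock
        done;
        let msg = Queue.pop t.q in
        Mutex.unlock t.lock;
        msg
    end

The buffering is the point: it decouples producer rate from consumer
rate, which is exactly what coarse-grain dataflow between modules needs.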

That gives it a very real opportunity to move the industry a quantum
leap beyond the 1970±5-era software technology represented by C++/Java.
And it gives someone a very real opportunity to make a mint, which was
your original point in the post.

Which is why I’ve spent the last five years on that. 🙂

Anyhow, just my $0.02.

Life is Good!

— Cynbe
