Functional optimizations with PyPy
So, here's an idea: add a functionality inferencer to PyPy that decides whether some Python code is purely functional (doesn't contain any state-modification operations, doesn't access stuff that isn't passed to it or defined statically enough) -- or if you don't want to do that, just add a notation for declaring some code as purely functional. Then, you can disable the cyclic garbage collector for that code -- you can either just use refcounting, or use something way faster (there are tons of ways to optimize purely functional code, especially in garbage collection, like pointed out in this paper by Joe Armstrong and Robert Virding about real-time garbage collection). You can do this because it's impossible for functional code to create cyclic data structures (unless they use some sort of lazy evaluation, which doesn't really affect these optimizations). So frames created for this function can tell the cyclic garbage collector to go away, and any objects that that function creates don't need to take part in the cyclic garbage collector.
If you think that's crazy, you can also take a look at Use-Once-Variables, an idea for adding "linear objects" to non-linear languages (pretty much all programming languages are non-linear). Use-once-variables refer to linear objects, which can only ever have one reference at a time. Copying and deletion both require explicit support from the linear object. It definitely changes your style of programming to use linear objects, but they can offer some serious improvements to the ability to reason about code that uses them, not to mention that you can optimize the crap out of them.
The intuition behind linear types is the conservation of physical matter--a linear object cannot be created or destroyed without special permission, although it can be moved or transferred at will. Thus, an occurrence of a linear variable--e.g., in the argument list of a function call--indicates that the function will consume the value of the variable. If the function does not later return this value, either as a returned value, or by assigning it to a more global variable, then that function is responsible for properly disposing of any of the resources controlled by the value--even in the case of exceptions. -- Henry Baker, 'Use-Once' Variables
Alright. So PyPy is clearly the platform to implement these ideas on; what parts of the system need to be affected? Is it an object space? How many fundamental modifications does it require?
2 comments:
Well, as usual with PyPy, I'm unsure what level you are operating at :)
The first paragraph seems to be talking about something the interpreter part of PyPy could or should do. I'm not entirely sure that I see the benefits here. Also, you have to be careful; is this function purely functional:
def r(a): return [0]*a
? I think it is by your definition, but you can't be at all sure that the list returned by this function will never be part of a cycle.
>>> l = r(1)
>>> l.append(l)
Another point is that PyPy w/refcounting doesn't do any form of cycle detection yet; if you want sensible memory management use the Boehm or our mark&sweep GC.
The linear objects stuff seems more suited to the translation part of PyPy, and I can't really comment on that as I can't really be bothered to read yet another paper :-) (been reading lots for work in the last few days). Maybe you can add an analysis that detects linear objects and handles them differently, I don't really know.
In general it's probably worth saying that we do try to reason about heap allocated structures and at least to some extent avoid actually allocating them when we can get away with it.
Yeah, I've been thinking about that since I posted. I wonder if there's any way to infer the functionality of *calls* to these potentially-functional functions (and do it at compile time). That could get super involved, maybe.
Anyway, the benefits of knowing when some code is functional is pretty well-documented, mostly in academic papers and implementations of obscure languages like haskell and scheme ;-) The GC optimizations are only one aspect, from what I can tell, although most of them are probably related to memory management. Not to mention the general improvement in the ability to reason.
I admit I have never knowingly seen a language implementation that lets you put the *functional* code in an isolated bubble as opposed to the *non-functional* code (e.g. Haskell's monads). However, I wouldn't be surprised if certain scheme or lisp implementations do this without my knowing it...
As for linear objects, from what I can tell, the major benefit will come when one knows that the reference count of a particular object will not change between two points in the code -- and during that time, memory management for the object can be completely ignored. The "Use Once Variables" described in the paper encourage this by making you annotate the variables/objects that are use-once, and then requiring you to do the appropriate things (like consume the variable in all branches of a conditional, or make sure to return it from your function which accepts it as an argument).
Much more advanced optimizations are explained in that paper, but it's pretty deep.
I guess I need to go bug some Scheme implementors some more.
Post a Comment