Twisted Isn't Specific

I was googling for some stuff related to Twisted and greenlet and I came across this post from last year by Jean-Paul Calderone: Twisted Isn't Specific. That subject is a reference to a quote in the Twisted.Quotes file:

<sayke> Acapnotic: don't make it twisted-specific
<dash> sayke: pffft
<dash> sayke: twisted isn't specific

The post is kind of relevant to some poking around I've been doing with integrating greenlet and Twisted again, but that's not really why I wanted to point it out on my blog. I just had a really good time reading it again. I think it's really well-written, and is fairly quotable. Just to warn you: this is one of those "quote posts" that contain more quotations than commentary, often perpetrated by ineffectual bloggers with nothing original to say. I'm sorry.

The email that Jean-Paul is responding to is by Andrew Dalke, and is about how he wants to magically replace the built-in Python socket module with something that switches to an event loop and allows other application code to run at the same time. This isn't terribly reliable, of course, and Jean-Paul gives a nice explanation of the dangers of implicit context switching (which applies to pre-emptive threading just as well as the controlled but still implicit switching of stackless or greenlet).
Consider this extremely trivial example:
    x = 0

    def foo(conn):
        global x
        a = x + 1
        b = ord(conn.recv(1))
        x = a + b
        return x

Clearly, foo is not threadsafe. Global mutable state is a terrible, terrible thing. The point to note is that by introducing a context switch at the conn.recv(1) call, the same effect is achieved as by any other context switch: it becomes possible for foo to return an inconsistent result or otherwise corrupt its own state if another piece of code violates its assumptions and changes x while it is waiting for the recv call to complete.
Asyncore was also brought up. It's often compared to Twisted, since in the past it was the only prevalent event system for Python. Twisted has pretty much obsoleted it by now, or so I like to think:
2) asyncore is smaller and easier to understand than Twisted, [-- Dalke]

While I hear this a lot, applications written with Twisted _are_ shorter and contain less irrelevant noise in the form of boilerplate than the equivalent asyncore programs. This may not mean that Twisted programs are easier to understand, but it is at least an objectively measurable metric.

Andrew goes on to talk about how great it would be if we could just use the existing built-in Python libraries for NNTP and POP3. That was pretty amusing, because those were some of the first protocol implementations (client and server) that Jean-Paul did for Twisted. And you know what? He had a reason for doing them with Twisted, instead of using the standard library versions:
Yet by using the Stackless socket monkeypatch, this same code works in an async framework. And the underlying libraries have a much larger developer base than Twisted. Want NNTP? "import nntplib" Want POP3? "import poplib" Plenty of documentation about them too. [-- Dalke]

This is going to come out pretty harshly, for which I can only apologize in advance, but it bears mention. The quality of protocol implementations in the standard library is bad. As in "not good". Twisted's NNTP support is better (even if I do say so myself - despite only having been worked on by myself, when I knew almost nothing about Twisted, and having essentially never been touched since). Twisted's POP3 support is fantastically awesome. Next to imaplib, twisted.mail.imap4 is a sparkling diamond. And each of these implements the server end of the protocol as well: you won't find that in the standard library for almost any protocol.

He went on to compare the Twisted POP3 client documentation to that of Python's poplib. Neither was very good.

Jean-Paul even had a nice bit of wisdom about our culture as programmers in general:
I feel that using the phrase "just a" in the previously quoted text is an understatement. [-- Dalke]

I think you're right. We throw around "just" a lot in our line of work, don't we? :) Twisted does also account for a raft of platform-specific quirks and inconsistencies. I take this to be a good thing.

And what a classy closing.
I apologize for writing such a long message, but I didn't have time to write a shorter one.

7 comments:

Richard Tew said...

While Jean-Paul's "trivial example" describes a possible problem, it is just that. No-one ever does that. I have been working with Stackless as a full time job for over seven years and I have never seen anyone do it, not even trainee programmers.

It is just bad code. I would suggest that a programmer who would write it, would be equally likely to write code of similar quality with whatever framework they used.

It shouldn't be taken seriously as an argument against implicit switching.

Allen Short said...

Well, it's an example of code nobody would write, certainly -- but hopefully it gets the point across. Code run in an environment with implicit context switching has to be thread-safe. It's going to fail mysteriously (meaning: in a way not directly observable with a debugger) if it isn't. Cooperative multithreading is slightly less bad than preemptive, but it suffers the same class of problem.

Christopher Armstrong said...

Richard, I agree that the example is indeed trivial and very unlikely, but I think you underestimate the need for forethought when writing thread-safe code. The example, as he stated in the email, was just an illustration of the mechanism by which bugs can occur.

If someone is *assuming* that their code is going to be called re-entrantly, then it's certainly possible to code things such that it won't be a problem, and I'm even willing to admit that cooperative threading means that it's easier to deal with than when using pre-emptive threading.

The issue is that with implicit context switching, arbitrary code that doesn't *expect* to be called reentrantly can quite easily have exactly that happen to it. For example, urllib2 was brought up on the mailing list. It's quite common among the Python co-routine community to monkeypatch the socket module to allow traditionally blocking network libraries to be concurrent. But these libraries were never expecting to be called concurrently, and it's possible and very easy to invoke them in a way that will break if you *actually* use them concurrently.

So, again, that particular example was not the argument against implicit switching, it was merely an illustration. I implore you to read Jean-Paul's original article; he does a good job at explaining it thoroughly.

Andrew Dalke said...

It's quite common among the Python co-routine community to monkeypatch the socket module

I only know the Stackless community. As far as I know, I introduced that as an interesting hack, and I've mentioned it as a proof-of-concept. I am opposed to monkeypatching except in very limited cases.

The reason of course is as you say- the libraries weren't designed to be used that way.

I don't know of anyone actually using that approach, so I'm surprised to see that "it's quite common." References?

Twisted has pretty much obsoleted [asyncore] by now

What you didn't note is that Jean-Paul Calderone's reply wasn't a response to what I wrote. I said "asyncore is smaller and easier to understand than Twisted", and nothing about applications written using Twisted vs asyncore.

This is roughly equivalent to what was one of the differences between Minix and Linux. It was easier to understand Minix when learning how unix-like OSes work.

Feel free to rewrite my Stackless socket monkeypatch using Twisted instead of asyncore. I would like to see that in action, and if the result is smaller and easier to understand then I will gladly use it as an example of how one might emulate blocking I/O using a coroutine.

I'm sure this isn't going to happen. The consensus in the Twisted community appears to be that Stackless-type approaches are fundamentally flawed and not worth investing more time in.

how great it would be if we could just use the existing built-in Python libraries for NNTP and POP3

What would be great is if the improved libraries were part of standard Python and people didn't need to do major rewrites to switch from blocking to async style. Twisted won't be part of the standard library and there's no simple migration path.

And what a classy closing

I didn't and don't consider comparing oneself directly to Blaise Pascal as classy as you do. To me it sounds immodest.

Christopher Armstrong said...

Hi Andrew, thanks for stopping by to post a comment on my blog. I'm sad that you had to end up using ad hominem remarks, but they were minor enough that I'm not terribly bothered.

I think a couple of things you mentioned in your comment are worth responding to, and I think you'll be happy with my responses.

Firstly, you asked for a reference to actual use of socket monkeypatching. For that I point you to Eventlet, which has socket monkeypatching and is in actual deployment at Linden Lab in some of the back-end parts of SecondLife. It uses the socket monkeypatching in both its http server and client, which use BaseHTTPServer and httplib, respectively.

Then you said something about Twisted not making it into the standard library. I refer you to recent discussions on python-dev about exactly the opposite of that :-)

Then you said something about Twisted people generally being against the whole stackless concurrency thing. That may be true, but I think it's funny that you posted it in response to me. I was the first person to integrate Twisted with both stackless and greenlet. I also wrote the original versions of deferredGenerator and a prototype that eventually became inlineCallbacks.

I've also been working on some new integration with coroutines over the last few days that's much deeper than those hacks I previously released. I don't want to announce too much about it yet, but it's around if you know where to look.

I hope you didn't think my original post was an attack on you or coroutines or anything. It was purely done in appreciation of some thoughtful and well-written words in an email, and was not meant to cast any negative light on you or anybody else. I'm sad you felt the need to make that comment about immodesty.

Andrew Dalke said...

I'm sad that you had to end up using ad hominem remarks,

As compared to the straw man argument Richard called you on? Anywho ...

Eventlet

I learned about that at PyCon last week. Haven't looked at the code yet. Just did. The magic function is "wrap_socket_with_coroutine_socket" for those reading along at home.

Thanks for the pointer. I'm surprised they did that. I assume it went through enough testing to make sure it works for their specific needs. I'm still worried about the more general cases, like the use of caching in urllib. There's definitely problems if a library is imported and grabs socket.socket before the monkeypatching is done. So I wouldn't use it for general consumption.
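
A minimal sketch of that import-order hazard, for those reading along. This is not Eventlet's code: CoroutineSocket is a hypothetical placeholder class, and the "library" is simulated by a plain variable that captures socket.socket before the patch is applied.

```python
import socket


class CoroutineSocket(socket.socket):
    """Hypothetical replacement socket class, for illustration only."""


def demo_patch_order():
    # A library that ran `from socket import socket` at import time holds
    # its own reference to whatever socket.socket was at that moment.
    early_ref = socket.socket

    original = socket.socket
    socket.socket = CoroutineSocket  # the monkeypatch
    try:
        # Code that looks up socket.socket after the patch sees the new class.
        late_ref = socket.socket
    finally:
        socket.socket = original  # undo the patch so the demo has no side effects
    return early_ref, late_ref
```

The early reference still points at the original blocking class, so any library that bound the name before the patch keeps blocking no matter what the module attribute says afterwards.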

you said something about Twisted not making it into the standard library. I refer you to recent discussions on python-dev about exactly the opposite of that

You mean the discussion where some people want a part of Twisted in the core, others don't see the need, and no one with commit authority on the topic has yet spoken up on it? What does that show?

Is Twisted already available as a "Twisted core", which would be the part to be added to the stdlib, and a "Twisted complete" which is everything else? Experience with that seems a necessary precondition before adding it to the Python stdlib.

Then you said something about Twisted people generally being against the whole stackless concurrency thing. That may be true, but I think it's funny that you posted it in response to me.

Glad I could provide the humor, but I don't know you. I said that I would like to see the example Stackless code I wrote written using asyncore rewritten to use Twisted. The statement you quoted implied that that rewrite would be shorter. I would like to see it. I then expressed and elaborated my doubts that you or others in the Twisted community would actually do it. Technically I'm doing that to prod you to write the code because it's not something I've had the time to figure out for myself.

I hope you didn't think my original post was an attack on you or coroutines or anything.

You started off saying that I wanted things to work by magic. I don't. It'll be a lot of work to make stackless sockets drop in as a replacement for the socket library, as used in the standard library, and even more to update all the libraries which don't expect context switches during I/O calls. What I did was a proof-of-concept hack.

I want there to be a shallow migration path from standard lib (as you pointed out, limited, but the POP3 code is good enough for my needs) to something which is better. There isn't one with Twisted alone. Is there one with Twisted+coroutines?

Finally you, like Jean-Paul, misinterpreted my comment about "just a". I see it as a term used to belittle those who don't understand Twisted. You and he see it as a simple API which abstracts the complexity away from the user. The allusion to Pascal in that context therefore gets interpreted (at least by me) as "we're the smart ones so don't you worry your pretty little heads about it."

At this point you'll cry ad hominem again. I'll respond with "groupthink," and we can start looking up web pages describing fallacies and logical biases to our heart's content. ;)

Andrew Francis said...

Hello Folks:

Some comments:

While Jean-Paul's "trivial example" describes a possible problem, it is just that.

I looked at the code example. I think the problem occurs but in reality it is more subtle - i.e, rather than globals, one gets method calls to a shared object that happen to straddle a context switch.

moral of the story - know what constitutes a context switch.
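
A rough sketch of the shared-object case, under stated assumptions: Inventory, buyer, and run_round_robin are hypothetical names, and a generator yield stands in for whatever call would implicitly switch (a recv on a patched socket, say). It is only meant to show a check and an update on the same object landing on opposite sides of a switch.

```python
class Inventory:
    """Hypothetical shared object with a check method and an update method."""

    def __init__(self, stock):
        self.stock = stock

    def available(self):
        return self.stock > 0

    def take(self):
        self.stock -= 1


def buyer(inv, log, name):
    # The check and the update straddle a context switch.
    if inv.available():
        yield  # stand-in for an implicit switch, e.g. blocking I/O
        inv.take()
        log.append(name)


def run_round_robin(tasks):
    # Tiny cooperative scheduler: advance each task one step per pass.
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)


def demo():
    inv = Inventory(stock=1)
    log = []
    run_round_robin([buyer(inv, log, "a"), buyer(inv, log, "b")])
    return inv.stock, log
```

Both buyers pass the available() check before either calls take(), so one item gets sold twice and the stock goes to -1. No globals are involved, just two ordinary method calls with a switch between them.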

I feel there is another problem that is architectural in nature. That is the "high level" logic is being mixed in with low level networking. I feel that this makes the code more vulnerable to stuff like race conditions.

For complex stuff (i.e., "business logic") - a solution is operating system style layering - objects at layer n + 1, don't know about or can talk to objects a layer above. I think Stackless/generators facilitate the layering process by hiding synchrony issues.

I'm sure this isn't going to happen. The consensus in the Twisted community appears to be that Stackless-type approaches are fundamentally flawed and not worth investing more time in.

I don't know what constitutes a community. However I use Stackless and Twisted and I participate in the respective mailing lists. I am trying really hard to understand the respective philosophies of Stackless, Twisted, and soon, PyPy.

I have invested significant amounts of time in understanding the issues concerning integrating Stackless with Twisted. I haven't figured out everything but I am optimistic - I think investing time integrating co-routines with Twisted will allow the two to easily tackle problems that right now are in the domain of Java or C#.

Cheers,
Andrew