Dear Guido,

I really enjoy the Python programming language. Coming from a C/C++/Java world, I have found new levels of productivity with it. Executable pseudo code, great library support, dynamic typing, emphasis on well structured and readable code – this is all very much to my liking. We use it here at SnapLogic to great effect, and many other organizations have similarly positive experiences with it. Thank you for giving us Python!

Sadly, there is one aspect of the language that is beginning to bother me more and more. And I know I’m not the first one to point it out. A strange design choice that leaves the Python interpreter – by design – crippled on modern hardware. You know what I’m talking about, Guido, don’t you? That’s right, I’m talking about the GIL, the Global Interpreter Lock. And as I am about to start rambling on here, please don’t take offense, Guido, because none is meant.

For those who are not familiar with the issue: The GIL is a single lock inside of the Python interpreter, which effectively prevents multiple threads from being executed in parallel, even on multi-core or multi-CPU systems! You can find more information here. But just to quote the essentials from that page:

In order to support multi-threaded Python programs, there’s a global lock that must be held by the current thread before it can safely access Python objects. … only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions.

Effectively, this means that all access to Python objects is serialized, no matter how many threads you have in your program, and no matter how many CPUs or cores you have in your hardware! Python has a really easy to use threading API, which makes multi-threaded programming quite painless. Sadly, the Python interpreter itself makes it impossible for those threads to properly take advantage of the hardware which is common these days.
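The effect is easy to see in practice. Below is a small illustrative sketch (timings will vary by machine): the same CPU-bound countdown is run twice sequentially and then in two threads. On CPython the threaded run does not get faster on a multi-core box, because the GIL allows only one thread to execute bytecode at any moment:

```python
import threading
import time

def count(n):
    # Pure-Python, CPU-bound work: executed under the GIL.
    while n > 0:
        n -= 1

N = 2000000

start = time.time()
count(N)
count(N)
sequential = time.time() - start

start = time.time()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print("sequential: %.2fs  threaded: %.2fs" % (sequential, threaded))
```

On an interpreter without a GIL, the threaded version could approach half the sequential time; on CPython it typically matches it or is even slower, due to lock contention between the threads.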

Hardware vendors, such as Intel and AMD, have long recognized that the only way to continue to move CPU speeds forward is increased parallelization, and have added more and more cores to their CPUs. You acknowledge yourself, Guido, that multiple cores are becoming common even in laptops. Here is what you wrote about all of this in your Python 3000 FAQ:

Q. Multi-core processors will be standard even on laptops in the near future. Is Python 3.0 going to get rid of the GIL (Global Interpreter Lock) in order to be able to benefit from this feature?

A. No. We’re not changing the CPython implementation much. Getting rid of the GIL would be a massive rewrite of the interpreter because all the internal data structures (and the reference counting operations) would have to be made thread-safe. This was tried once before (in the late ’90s by Greg Stein) and the resulting interpreter ran twice as slow. If you have multiple CPUs and you want to use them all, fork off as many processes as you have CPUs. (You write your web application to be easily scalable, don’t you? So if you can run several copies on different boxes it should be trivial to run several copies on the same box as well.) If you really want “true” multi-threading for Python, use Jython or IronPython; the JVM and the CLR do support multi-CPU threads. Of course, be prepared for deadlocks, live-locks, race conditions, and all the other nuisances that come with multi-threaded code.

That’s it? Guido, maybe your time at Google is influencing the way you see the world in strange ways. But I can assure you: There are plenty of programs written that are not designed or intended to run on multiple boxes. Yes, we all want our applications to be scalable, but guess what? Today’s hardware supports that through the presence of multiple cores and CPUs in a single box! And there is a well-established paradigm to take advantage of this: Use multi-threading. Alas, not so with Python.

In another discussion about the same topic, you say:

…you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.

Just because Java was once aimed at a set-top box OS that didn’t support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn’t mean that multiple processes (with judicious use of IPC) aren’t a much better approach to writing apps for multi-CPU boxes than threads.

Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions.

Right, well, maybe there are some of us out here who have done a lot of multi-threaded programming before? Maybe there are people who are willing to take on those ‘evils’? Leaving aside possible incompatibilities between different operating systems, which may complicate the creation of new processes on the fly, using IPC has its own problems:

  • What are we going to use? Pipes? That only works on a single system, and not across multiple boxes.
  • Sockets maybe? Then we have to maintain port numbers, establish clients and servers, and deal with the overhead of this, even if all our processes run on the same box. I know it’s optimized, but still.
  • We now need a ‘shared nothing’ architecture. I mean, it’s nice to have a shared nothing architecture. It has a bunch of advantages. Sadly, this is just not always feasible. Maybe at Google it magically is, but in the world outside of Google, it is not always the case. One reason is that message passing is really not always the most effective means of doing things…
  • If you can’t share data structures between threads/processes, you need to send messages to and fro, which describe what it is you want to do, and possibly also send copies of the data with it that you want to work on. Of course, we now need to have message handlers, which adds complexity to the code. Also, we have copy overhead: Data needs to be copied from the ‘real’ data structure into a message buffer, possibly needs to be marshaled somehow, and sent across, unpacked, and so forth. Python is not very fast when it comes to object creation, but there are some objects created right here, just for that message. Plus the copying that needs to be done, which further adds to the overhead.
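To make the copy overhead in that last point concrete, here is a small hypothetical sketch, using pickle to stand in for the marshaling step of an IPC message (the timings are illustrative only):

```python
import pickle
import time

data = {"samples": list(range(100000)), "label": "sensor-1"}

# Between threads: handing over a message is just a reference copy.
start = time.time()
ref = data
t_ref = time.time() - start

# Between processes: the same message must be serialized, copied
# across the boundary, and deserialized on the other side.
start = time.time()
wire = pickle.dumps(data)
clone = pickle.loads(wire)
t_ipc = time.time() - start

print("reference hand-off: %.6fs  pickle round-trip: %.6fs" % (t_ref, t_ipc))
```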

Many years ago, I worked at nCUBE. We built massively parallel supercomputers. No shared memory, pure message passing between the (thousands of) CPUs. For some applications this worked very well indeed. For others, it didn’t. Some data structures were much more naturally shared. Sometimes you could partition them, and have individual processors work on separate pieces of them. In the end, though, such an approach would still require unnatural contortions in many cases.

Also, I quite simply disagree with your statement:

[This] doesn’t mean that multiple processes (with judicious use of IPC) aren’t a much better approach to writing apps for multi-CPU boxes than threads.

No, I think having to resort to multi-processing – rather than multi-threading – is definitely the much worse approach. It’s nothing more than a nasty hack to be able to take advantage of multiple CPUs, and it simply shouldn’t be necessary in this day and age.

You see, Guido, if I really want to have a shared nothing system, I can certainly implement that. I could do it with threads, or with processes. But I’d rather use the threading API. Why?

  • The threading API interface is the same across OSs; the multi-processing ‘API’ (if you can call it that), however, can cause issues on different platforms (fork() on Windows?)
  • If I want to send messages between threads, I can simply use queues between threads for communication. All necessary locking is taken care of for me. Those queues come with Python (‘batteries included’) and work everywhere without requiring data to be copied: My messages are simply added as objects into a queue, and thus can be as complex as I want them to be, and don’t have to be marshaled and unmarshaled.
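A minimal sketch of that queue-based message passing between threads (the module is spelled `Queue` in Python 2 and `queue` in later versions, hence the guarded import):

```python
import threading
try:
    import queue            # Python 3 spelling
except ImportError:
    import Queue as queue   # Python 2 spelling

q = queue.Queue()  # all necessary locking is handled internally
results = []

def worker():
    while True:
        msg = q.get()
        if msg is None:      # sentinel: shut down
            break
        # msg is an arbitrary Python object -- no marshaling, no
        # copying; the queue hands over a reference to it.
        results.append(sum(msg["payload"]))

t = threading.Thread(target=worker)
t.start()
q.put({"payload": [1, 2, 3]})
q.put(None)
t.join()
print(results)   # [6]
```

Note that the dictionary is never copied or marshaled; the queue simply hands the consumer a reference to the very same object.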

So, even for a shared nothing system, I would much rather use the threading API. And if I really want to share data structures between threads, I would be able to do that as well, and with the same simple API. This would give me choices and performance… if, well, yes, if the dreaded GIL would just go away.

A word about Jython: Yes, I tried it. How wonderfully easy it is to just create two threads and see them actually run at the same time, taking advantage of my dual-core laptop. And somehow, the speed at which my programs run is comparable to, and often faster than, the ‘native’ Python 2.5. Strangely, they managed to solve the very problem which you said would lead to such drastic redesign and slowdown. I have to ask then, why not just base Python on the ubiquitous JVM? Just take a look at Ananth’s blog here where he discusses this. The problem with Jython of course is that – by the project’s own admission – the code base is brittle. It is also still stuck at Python 2.2. But I certainly hope they make progress.

Funnily, I just noticed Ananth’s response to my comment on his blog, where he is echoing something that I just wrote here about how being at Google may influence someone’s view of the world. I guess I’m not the only one who thinks this.

In the end, Guido, what it boils down to is this: Please don’t make architectural choices for me. I can make my own. You see, sometimes (just sometimes) your architectural choices and preferences may simply not apply to the problem I am trying to solve. I would like to encourage you to look a little bit beyond the ever expanding rim of the Google universe, and see how your wonderful Python language is used by the rest of us.

I sure hope to see many more wonderful things from Python, and I’m looking forward to Python 3000 when it comes out. And since you already have stated that you won’t change anything about the GIL, I will continue to be frustrated by this, and will be forced to look for other languages from time to time at least, just so that my programs can use the normal hardware of today. And because I like Python so much and care for it, it will continue to annoy me to no end.

Just as I feel strongly about this, I know you feel strongly about the GIL. Still, I hope that one day you can see the profound disconnect between this philosophy and the direction in which hardware develops, and can move to address this severe shortcoming of an otherwise wonderful language.

Sincerely,

Juergen Brendel

50 Responses to “An open letter to Guido van Rossum: Mr Rossum, tear down that GIL!”

  1. Laurent Szyster

    Dear Juergen,

    You wrote, I quote: “The GIL is a single lock inside of the Python interpreter, which effectively prevents multiple threads from being executed in parallel, even on multi-core or multi-CPU systems!”

    Indeed the GIL prevents the *interpreter* from running two threads of bytecode concurrently.

    But it allows two or more threadsafe C libraries to run at the same time.

    The net effects of this brilliant design decision are:

    1. it makes the interpreter simpler and faster

    2. when speed does not matter (ie: bytecode is interpreted) there’s not too much to worry about with threads.

    3. when speed does matter (ie: when C code is run) Python applications are not hampered by a brain dead VM that is so [edited] ‘screwed’ up that it must pause to collect its garbage.

  2. Juergen

    Hello Laurent,

    Yes, the interpreter may be simpler. Faster, though? I don’t know. The Java VM tends to be much faster than whatever Python comes with.

    Also, it’s all well and good that threadsafe C libraries may run at the same time. However, when I write a Python program, I tend to write mostly Python. So, my application code is Python, which has the advantage of being nicely portable, maintainable and all those good things. All I’m asking is that MY code may also please be able to take advantage of multiple CPUs.

  3. gwenhwyfaer

    > In the end, Guido, what it boils down to is this: Please don’t make architectural choices for me. I can make my own.

    But aren’t you telling Guido that you should make an architectural choice (of the Python VM) for him – and worse, one that he then has to implement?

  4. Juergen

    gwenhwyfaer: I know what you mean. I was actually thinking about this when I wrote those lines. I decided to write them anyway, because:

    * It seems to be possible to implement true threading, as shown by the Java VM (although I admit that I am not a VM expert).
    * The hardware vendors have made the choices for all of us already: The future will hold many, many CPU cores for us. If Intel (for example) felt that this should be exploited by multiple processes, then they wouldn’t release special packages and development kits to make fine-grained multi-threading easier and more efficient. They clearly see threads, not processes, as the way forward.
    * As I outlined in the article: You can choose a shared-nothing architecture, based on message passing, or an architecture with shared data, even if you restrict yourself to the threading API alone. If threads were properly supported, this is all I would need. So, I think that developers will have more choices, but only with proper threading support. Until proper threading is supported, Guido is making the choice for all of us. Once threading is supported, we can make our own choices.

  5. John Montgomery

    For quite a while I’d been annoyed at the idea of the GIL, but I think I’ve just grown to accept it. It does make life much simpler when coding and using threads. I’ve dealt with some pretty nasty race-conditions, deadlocks and all the usual horrible stuff in Java. In fact it’s only now that Java has the java.util.concurrent package that multi-threading is actually a sane/manageable prospect.

    My only regret now is that there isn’t (yet) a Python module that will let you easily run some stuff truly in parallel. It’d be ideal to maintain the GIL for most code and then have some sort of parallel for loop/map construct that would run several things at once. Though of course, not entirely sure how feasible that would be…

  6. maht

    Lol, I wasn’t aware Python was so damaged. It’s had problems with threads from the start, I remember the “no threads on FreeBSD” days.

    I write multi-threaded programs every day, I’ve never typed “lock” in any of my programs (I use Limbo).

    Perhaps it is time for CPython to be frozen and move on.

  7. she

    I also don’t think that being at Google influences a lot of his decisions, so I think you guys should stop taking that shot.

    Instead just ask him again why he chose the current GIL approach!

  8. Roger

    Strange… After reading this, I did not come out strongly in favour of removing the GIL, or even strongly against it. For me, a multi-core machine means that I can do several different jobs concurrently, and a single program cannot bring my machine to its knees.

    The GIL means that even a very badly behaved Python program cannot sap up all the computing power, only the power of one CPU, so maybe it’s a good thing…

    If I have a computing intensive multi threaded application that needs lots of raw CPU power, perhaps Python is not the language to choose?

    One last comment about sharing data structures between processes, take a peek at pyro or corba, they both have that problem licked.

  9. arkanes

    There’s been a message to this effect probably once a week for the last several years. Not *one* person who’s ever talked about how crucial and essential removing the GIL would be has ever stepped up and actually implemented it. If you truly believe that it’s as easy as you say (it’s not, but I don’t expect you to believe me), then start writing some code. Python source is freely available and if Guido decides to not accept your patches, you can certainly fork and produce your own. If the need is as great as you imagine, such a fork would be monumentally successful.

    The position you take is opposed by pretty much everyone with knowledge of Python internals. Maybe you, like everyone else who posts rants like this, should find out why all of these skilled and experienced programmers don’t agree with what seems so obvious to you.

  10. Ulrich

    It all depends on what you want Python to be. If you want to use only pure Python for your whole application, it needs to include all ways to solve a problem. But seeing Python as a top-level language, with C doing the “ugly work”, it is better to be guided by the design.
    Personally, I never had the need to speed up Python code this way.

  11. Mike

    I think that there may be an argument for an alternative to the GIL, but you’re not making it.

    Multiple processes *are* much to be preferred in the cases where this technique is sufficient (which seems to be 99% of the time). Multi-threading is basically a trip back to the late 80s, when the popular OSes of the time simply didn’t have proper memory protection. Been there, done that – no thanks.

  12. Raghav

    It is starkly obvious to me that removing the GIL can only lead to a better Python world.

    Comments like Laurent Szyster’s don’t mean much. You are basically saying, the BDFL is god, the CPython interpreter is simple and that the JVM sucks. So What ???

    I am a person who uses python for doing a lot of mathematical transforms and processing. When I parallelize my algorithms, I get almost no speed up on Python.

    What am I supposed to do? Write extension modules in “C” with PyEval_ReleaseLock() and PyEval_SaveThread() littered all over the code? I might as well write my code in “C” and kiss goodbye to Python.

    Like Juergen says, give me some choice !

  13. Andy

    gwenhwyfaer: Exactly, that is the whole point of open source. Just implement it yourself if you think it’s so darn important.

    People have implemented and are successfully maintaining Python patches (e.g. Stackless Python). So you can do it. Don’t write a whiny blog post that says nothing that hasn’t been said for the last 10 years, and demands that someone do free work for you. That’s not the point of open source.

  14. Alisic

    Have you ever tried Parallel Python? It’s a module for Python that implements real threading and isn’t affected by the GIL.
    Of course it’s not as nice as having Python natively support real parallel computing, but it’s a very solid and lightweight module, an excellent piece of software.

  15. Juergen

    John: Yes, if you have true multi-threading then you might have to be careful with race conditions and such. However, since even now threads can give up the lock (from time to time, or when they come across an IO operation) you still need to be aware of this. If you do complex operations on complex data structures, two threads can still definitely step on each other’s toes. So, locking is still necessary anyway.
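    The need for locking that Juergen describes can be sketched concretely. Even under the GIL, a read-modify-write such as `counter += 1` compiles to several bytecodes, and a thread switch between them can lose updates; the explicit lock below is what makes the final count reliable (a minimal illustration, not taken from the original discussion):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, the load/add/store of "counter += 1"
        # could interleave with the other thread, losing updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # reliably 200000 only because of the lock
```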

    maht: Python is not the only language with this problem. Apparently, Ruby suffers from the same.

  16. Juergen

    Roger: You are right, if I have a compute-intensive multi-threaded application then Python perhaps is not the right language. This is why I wrote this for Guido: I really like Python, and I wish that it COULD be the right language even in this case.

    Pyro or CORBA don’t solve that problem, however. They just facilitate distributed objects. If two processes need to traverse a complex data structure, at least one of them will end up with a large number of messages being created.

  17. Juergen

    arkanes, andy: ‘Stop whining and write some code’? Please, you know that this is the standard open-source cop-out response, don’t you? I thought we would slowly move past that.

    This is like saying: Constructive criticism is not needed/wanted, because whatever you criticize you should just fix yourself…

    How completely and utterly unrealistic. The open-source world is finally coming around to realizing this, so the two of you should as well.

  18. Juergen

    Mike: If multi-processing is so much preferred then please point out to me:

    1. How do I create multiple processes from within my Python application in a platform-independent manner?
    2. Which technique should I use to effectively traverse a single, complex data structure from two such processes without having to send messages or resort to other non-portable means like shared memory?
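    As an aside, question 1 at least had a portable answer in the standard library: the `subprocess` module spawns a child interpreter without fork(), on Windows as well as Unix. A minimal sketch follows; it is question 2, the shared traversal of one data structure, that remains without a portable answer:

```python
import subprocess
import sys

# Spawn a child Python interpreter portably -- no fork() required,
# so this works on Windows as well as Unix.
proc = subprocess.Popen(
    [sys.executable, "-c", "print(6 * 7)"],
    stdout=subprocess.PIPE)
out, _ = proc.communicate()
print(int(out))   # 42
```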

  19. Andy

    Kevin, re: stackless python:

    No it doesn’t. Stackless lets you create a bunch of “tasklets” (green threads), but only one tasklet runs at a time. You can spread them across threads, but they still all obey the GIL and they’ll only utilize one core.

    If there was real multithreading, then I could see Stackless working great on top of that (it has a lot of nice semantics for safe message passing).

  20. Juergen

    Guido: Thank you for your response. I know and understand your take on it. I hope that someone who knows more about VMs than me – and has the time and inclination to do so – can look into this issue and come up with something that would make all of us happy.

  21. Mikkel Høgh

    @Juergen: Unless you can actually demonstrate or otherwise convince people that implementing that feature you want will not bring about detrimental effects to all of us who have no use for multi-threading, you will probably not get what you want.

  22. Lennart Regebro

    Hang on… isn’t Python open source? Oh, why, don’t you know, it is!

    If Juergen or anybody else gets rid of the GIL without massively slowing down Python, I don’t think anybody would complain. Case closed.

  23. Juergen

    Lennart: It is a fallacy of open-source projects or advocates to assume that ‘if there is a problem, whoever complains should go ahead and fix it’. Sadly, this is very unrealistic. Very few people have the expertise or time to hack a VM. With this argument you are basically saying that any constructive criticism is not necessary (or welcome?) because you may just as well fix it yourself. I don’t think this is the intention of open source and it certainly doesn’t work. Remember: Number-of-users >> Number-of-contributors.

  24. Ezra

    In most cases, the main reason I’ve seen (and had) for the use of threads is simultaneous shared memory access — at least that’s what I’ve run into with Python.

    You mention the lack of portability of a shared-memory/multiple-process solution as a roadblock in your case. Well, in my own search, I recently discovered that Python provides a module called mmap which does exactly that. Yeah, the Windows/Unix API is different, but it looks like it takes about 5 mins to wrap it appropriately.

    Any reason that wouldn’t suffice? I suppose it would be nice if someone wrote a wrapper for that to make allocating the memory a little less annoying, but that doesn’t look particularly hard for rolling your own either.
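    A minimal sketch of the kind of mmap use Ezra describes, assuming a temporary file as the shared backing store (two processes mapping the same file would see the same 64 bytes; any synchronization between them is left entirely to the application):

```python
import mmap
import os
import tempfile

# Create a small file to back the shared mapping.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\0" * 64)
    m = mmap.mmap(fd, 64)   # any process mapping this file
    m[0:5] = b"hello"       # sees the same 64 bytes
    m.flush()
    data = m[0:5]
    m.close()
finally:
    os.close(fd)
    os.remove(path)

print(data)
```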

  25. Ananth

    I think the spirit of Juergen’s message has been lost in a lot of noise about the GIL. I am ready to accept Guido’s wisdom on threading.

    But given that multicore is the future [1], the Python community needs to be educated on how best to leverage performance from these multi-threaded monsters. Maybe it’s a new shared-nothing API, or maybe it’s some ultra-lightweight IPC framework, or maybe it’s OpenMP for Python… It just needs to be non-intrusive and it should work.

    If it is already there, we should have been marketing it like mad :-)

    [1] – http://gotw.ca/publications/concurrency-ddj.htm

  26. Juergen

    Mikkel: Guido didn’t say that it is not feasible. In fact, he pointed out that it was done, he was just disappointed by the performance. So, I don’t have to be a VM expert to know that it can be done.

    Also, the JVM and Jython clearly show that it is possible. Apparently, the JVM has a different architecture, and obviously a lot of effort has gone into it to make it perform well.

    It’s just a question about whether the time and effort should also be expended on cPython or not. But it definitely is possible, if one is inclined to do the work.

  27. Juergen

    Ezra: Interesting, I wasn’t familiar with mmap. However, I would have to wonder how synchronization can be accomplished. I mean, what if both processes access the same region of the file at the same time? Such synchronization then has to be built on top, either with file locks or additional message passing.

    Also, as with all multi-process solutions, I now need to add my own process management again.

    Finally, the string/file paradigm of mmap wouldn’t work very well if you have ‘complex’ data structures. Or something very common like a Python dictionary.

  28. Juergen

    Ananth: The voice of reason! :-) I would still say, though, that message passing works well for some applications, and doesn’t work well at all for others. Shared data structures can simplify code and speed up execution. Not all problems are suitable for a message passing approach.

    And even if all the message passing is perfectly camouflaged under a nice API, in the end you still have to live with its inefficiencies; they are just hidden under the hood.

    The threading API gives me the ability to do both: Pass messages between threads or share data structures. Without the GIL, this can be used to the full potential.

  29. Lennart Regebro

    “Very few people have the expertise or time to hack a VM. With this argument you are basically saying that any constructive criticism is not necessary (or welcome?)”

    But funnily enough, all the ones with the expertise have decided that it’s a bad idea.

    It’s not like this debate is new, or anything. The same argument has been going on for years.

    Somebody: “Kill GIL!”
    Guido + Posse: “We tried and it got too slow!”
    Somebody: “Well try another way then!”

    Well, *I* say: You show how it should be done, then. Yes, as you say, constructive criticism is welcome. Just saying “Kill GIL” isn’t constructive, you need to say how it should be done in a way that gives acceptable performance.

    Fixing your own problems is very much in the spirit of open source. What is not in the spirit of open source is asking, in rather powerful language, that somebody do something that he thinks is a very bad idea, when he has also given a hard and technical explanation of why he thinks it’s a bad idea.

    If you want to change this, then you need to show that this is wrong. You need to show that removing the GIL is feasible without a huge reduction in speed.

  30. Juergen

    Lennart: I think the people who have implemented the JVM do have expertise, and they did it somehow. So, yeah, it is possible. I think we all know that it is a lot of effort that’s involved.

    Look, please keep in mind: As I said in the article, I really like Python. I really would like to be able to use it in many more places. I am appealing to Guido (and the Python community) to acknowledge the direction in which hardware development goes. As I said earlier: Intel releases support libraries for efficient threading, not for multi-processing. I think they know a thing or two about performance, and we shouldn’t dismiss this.

    The appeal here is for Python to embrace this development. Python makes it easy to write powerful programs, but it doesn’t make it easy to take advantage of powerful hardware. That’s a disconnect, which is most unfortunate.

    If cPython doesn’t do this then people will move to Jython or IronPython. Do we want that?

  31. Stephan Deibel

    Sorry, but I need to add my voice to the chorus. I don’t think it makes much sense to tell Guido to do this or that — if you can indeed make it work (and fast), great! If not, then there’s not much sense in bringing this up yet again.

  32. arkanes

    Juergen:

    I think other people said it just as well, but “do it yourself” isn’t just a mindless retort. The people with real expertise have looked at the problem and decided it’s not cost effective. If you don’t agree, it seems perfectly reasonable to say that the burden of implementation lies on you. If it were easy, or even if it were hard but a clear all-around win, it would have been done years ago.

    Python isn’t designed for pure CPU performance. This is a perfectly legitimate design goal – very few programs are CPU bound, and how many cores Intel sticks in its laptops 5 years from now won’t change that. If you do want to write extremely CPU-heavy number crunching code in Python, you do have some options, like Pyrex and Numeric. Python’s pure number-crunching speed is quite low anyway, and any solution you use to get better CPU performance is going to have the side effect of letting you bypass the GIL.

    Jython has been around a long time and never received enough community support to keep it an up-to-date contender with cPython. IronPython is getting a lot more attention, but certainly not overwhelming. This says to me that people don’t really find the GIL to be a problem in actual practice, and that’s certainly borne out by my personal experience. If this changes, and people really do find that IronPython and Jython (which, despite running on extremely tuned, well engineered VMs, aren’t general-case faster than cPython, although IronPython is very close and may match or exceed cPython soon) are that much better, then people will migrate to those implementations. And I don’t see a problem with that.

  33. Jacob Rus

    “With this argument you are basically saying that any constructive criticism is not necessary (or welcome?) because you may just as well fix it yourself.”

    No. He’s saying that your criticism as it stands isn’t constructive. It doesn’t add any new information to a 10+ year-old debate. Currently, the benefits of removing the GIL don’t outweigh the costs (developer time, decreased single-CPU performance) for Python’s developers. If they did, the GIL would have been removed long ago. So you need to ask yourself what it is worth to you. And if removing the GIL is valuable to you, then you should either work on the code yourself, try to convince someone else to work on the code, or put up money for it; this blog post is not the way to bring about change. Talk is cheap.

  34. Juergen

    Jacob: I think there is absolutely nothing wrong with occasionally bringing up the 10-year-old discussion again, if it helps spur constructive discussions (as it has, BTW, in various forums and also in Guido’s blog) in which possible solutions to the problem are being discussed. In that sense: Yes, I fully think that my post was constructive. Just because I don’t have a solution doesn’t mean that I can’t express what’s wrong with it, can I?

    If I use a software system, can I not file a bug even though I don’t know the cause? Yes, of course I can. Now you may be saying this is a lame comparison, because I have essentially just filed a duplicate bug. However, you have to see this in the light of the news that continues to come out (again lately) about new mainstream processors being released by the large CPU vendors, with more CPU cores. And roadmaps calling for even more cores to come very shortly.

    No, I don’t think there is anything non-constructive or unnecessary about occasionally starting this discussion again, as long as Python remains ‘broken by design’ in that respect.

    Take a look over at Guido’s blog and the discussion there ( http://www.artima.com/forums/flat.jsp?forum=106&thread=214235 ) and you will see that this new discussion alone was already very constructive.

    If all my posting did was just to refresh this discussion then I can live quite happily with it and even the criticism that I might get.

    Also, look at the second to last sentence in your posting: One of the options you present there is ‘convince someone else to work on the code’. Well, what do you think I am trying to do here? :-)

  35. Juergen

    wrobell: Yes, double-checking is tricky. In fact, this is the case even in other languages like C++ ( http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html ) and Wikipedia talks about it as well ( http://en.wikipedia.org/wiki/Double-checked_locking ). Note that the Wikipedia article lists Visual C++ as an example, where you can’t just implement it in the naive way either, and have to use the ‘volatile’ keyword. This then can lead to performance issues somewhere else.

    In that case, and considering that in modern implementations the cost of synchronization is no longer as high as it used to be, I would just go with the solution proposed in Listing 2 of the original article and be done with it.
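    What that recommendation boils down to is taking a plain lock on every access instead of trying to be clever with double checks. A minimal sketch in modern Python (the `Resource` class here is just a hypothetical stand-in for an expensive-to-create object):

```python
import threading

class Resource(object):
    """Hypothetical stand-in for an expensive-to-create singleton."""
    pass

_lock = threading.Lock()
_instance = None

def get_instance():
    # Plain locking: every caller takes the lock, so there is no window
    # in which a half-constructed object can be observed. Simpler and
    # safer than double-checked locking, at the cost of one (usually
    # uncontended, hence cheap) lock acquisition per call.
    global _instance
    with _lock:
        if _instance is None:
            _instance = Resource()
        return _instance
```

    Every thread that calls get_instance() receives the same object, and construction happens exactly once.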

    Those are all well-known issues now, right?

    Just because multi-threading can be complex doesn’t mean that it shouldn’t be used, or that it isn’t the right approach to solve certain problems.

    It depends on your application, but there are plenty of architectures where threads can be used effectively and easily. All I’m saying is: give me a choice, and don’t force me to use the wrong architecture. The developers of the Java VM have put in the effort to make true multi-threading possible and efficient for their applications. The developers of CPython have not. It’s as simple as that.

    And because they have not put in that effort (for a number of reasons), they are now forcing unnatural architectures on me for my Python applications.
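    For what it’s worth, even today’s CPython threads work well for I/O-bound architectures, because the interpreter releases the GIL while a thread blocks in a system call. A minimal sketch in modern Python, with time.sleep standing in for a blocking network request:

```python
import threading
import time

def fetch(i, results):
    # Simulated blocking I/O; the GIL is released while sleeping,
    # so the five waits below overlap instead of running back-to-back.
    time.sleep(0.2)
    results[i] = i

results = {}
start = time.time()
threads = [threading.Thread(target=fetch, args=(i, results)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Five 0.2 s waits finish in roughly 0.2 s of wall time, not 1.0 s.
print(len(results), elapsed < 1.0)
```

    It is CPU-bound work, as in the benchmark below, where the GIL serializes everything.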

  36. Michele Simionato

    Removing the GIL from Python would be a waste of effort. We already have better solutions for concurrent programming. Here is an example using the processing module:

    '''
    A script to test if the processing module can take advantage of a multicore CPU.
    '''

    '''
    Results with a four-processor box:

    Running sequential
    Best time of 3: 5.50
    Running threaded
    Best time of 3: 6.97
    Running processed
    Best time of 3: 1.43
    '''

    import math, timeit, threading, Queue, processing

    Ndata = 1000000
    Nproc = 4

    def gendata(n):
        # some floating point computation to keep the CPU busy
        x = 42
        for i in xrange(n):
            x = math.sqrt(x)
            yield x

    def compute_norm(numbers):
        # consume the generator and report the result via the shared queue
        norm = math.sqrt(sum(n*n for n in numbers))
        queue.put(norm)

    def sequential():
        global queue
        queue = Queue.Queue()
        for procno in range(Nproc):
            compute_norm(gendata(Ndata))

    def threaded():
        global queue
        queue = Queue.Queue()
        threads = [
            threading.Thread(None, compute_norm, args=(gendata(Ndata),))
            for procno in range(Nproc)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()

    def processed():
        global queue
        queue = processing.Queue()
        processes = [
            processing.Process(None, compute_norm, args=(gendata(Ndata),))
            for procno in range(Nproc)]
        for pr in processes:
            pr.start()
        for pr in processes:
            pr.join()

    def time(funcname):
        print 'Running %s' % funcname
        t = timeit.Timer('%s()' % funcname, 'from __main__ import %s' % funcname)
        min_time = min(t.repeat(3, 1))
        print 'Best time of 3: %4.2f' % min_time

    if __name__ == '__main__':
        time('sequential')
        time('threaded')
        time('processed')

  37. Juergen

    Michele: Yeah, that’s great. You just completely confirmed my point: Threads don’t provide a speedup, because of the known limitations. But that doesn’t mean that implementing true threading would be a waste. If done properly, you’d see a speedup similar to, if not better than, the one with processing.

    But now let’s move on to something a bit more realistic, something that would highlight a typical usage scenario of threading, in which multi-processing generally falls short.

    Show me how this compares when both threads/processes need to traverse a shared data structure, for example. A tree or dictionary perhaps, where random access (read and write) may be required.

    Let me know how ‘processing’ fares then. In fact, let me know how you go about implementing a shared data structure with processing anyway. I’d really like to know.

  38. Michele Simionato

    Why don’t you download ‘processing’ from PyPI and see for yourself? You can share dictionaries, lists and, in general, picklable objects.

  39. David Pokorny

    This entire discussion is getting dragged down by ideology, and it is coming from both sides. I’m inclined to favor Guido because he’s one guy and Java has been historically all of Sun. It takes more than guts to stand up to an entire movement. Nevertheless, both sides have done a terrible job of explaining both the limitations of their approaches and examples where the other approach would be appropriate. Juergen, you appear (to me at least) to have a large library of applications, but the way you talk about them makes them sound…less than concrete. The burden of proof is upon you to demonstrate or describe a concrete situation where CPython – GIL + fine grained locking would be preferable to both CPython AND Java (and everything else for that matter). This is particularly difficult today given that Sun has opened up Java.

    If flexibility in the face of changing business requirements were the driving factor then Java would probably be the language of choice. The enterprise is not Python’s natural habitat (startups, among other places, are). If processing performance were the driving concern then you should call up Yelp.com or Youtube.com and ask how they solved their performance problems (both sites use Python).

    Python is the language of choice for small, cohesive, dedicated groups of exceptionally talented programmers who need to work quickly. Java is the language of choice for disparate groups of average programmers who turn over every few years. With skill, a Python-based system can realize substantial performance gains. With enough money (to buy hardware), a Java-based system can realize the same performance gains. Both languages are flexible in the context of their natural environments.

    As a parting note, UC Berkeley has recommended that IT infrastructure use Ruby on Rails or Java in preference to all other technologies (Perl, Python, PHP, ASP, etc…). If I had to guess, Python was overlooked not because of the GIL but for a more practical reason: Python is so flexible that it is very easy to write very bad and very clever code. This is an avowed shortcoming of Python, but it is particularly noxious to the enterprise or large institution.

  40. Will Kelly

    While working at Google may certainly flavor someone’s thought process, I suspect that Sun is the more influential company when it comes to thread support. Sun elected long ago to put their eggs in the threading basket, and the direction of Java followed the direction of Sun. They’ve focused on threading for scalability across the board. Just take a look at one of their newer processors: http://www.sun.com/processors/UltraSPARC-T1/details.xml. It boasts simultaneous execution of up to 4 threads per core (up to 32 threads per processor).

    As Juergen points out, there are several approaches to scalability–lack of thread support in CPython does not in any way prevent one from writing a scalable Python application, but it obviously does make taking advantage of the increased concurrency capability of modern hardware more difficult. If this is a concern, perhaps another language is appropriate. Still, the future is bright for Python users–even those of us who lack the technical skill to remove the GIL ourselves and implement a stunning thread-safe overhaul of CPython with no negative effects whatsoever, as Sun has hired Frank Wierzbicki of Jython fame: http://fwierzbicki.blogspot.com/2008/02/jythons-future-looking-sunny.html

    I would rather the reference implementation folks focus on continuing to provide and improve a beautiful language than get bogged down in threading issues at this time. In the meantime, I will consider horizontal scalability when writing Python applications and look forward to some nice Jython updates for all my crazy threading needs.


Trackbacks/Pingbacks

  1.  Michael Tsai - Blog - Mr. Rossum, tear down that GIL!
  2.  jessenoller.com - Interesting Read: Tear Down that GIL!
  3.  Guido on the GIL « Thermal Noise
  4.  Thinking Parallel » Blog Archive » Parallel Programming News for Week 37/2007
  5.  Guido is Right to Leave the GIL in Python, Not for Multicore but for Utility Computing « SmoothSpan Blog
  6.  Creating threads in Python--with example | as through a mirror dimly
  7.  DEBEDb holds forth » Vingt ans après
