No announcement yet.

FF3.1 to get massive Javascript performance improvement: 20-40x faster

  • Filter
  • Time
  • Show
Clear All
new posts

  • FF3.1 to get massive Javascript performance improvement: 20-40x faster

    Javascript sucks a little less...

    Firefox to get massive JavaScript performance boost

    By Ryan Paul | Published: August 22, 2008 - 02:00PM CT

    Mozilla is leveraging an impressive new optimization technique to bring a big performance boost to the Firefox JavaScript engine. The code was merged today (but is not yet ready to be enabled by default in the nightly builds) and is planned for inclusion in Firefox 3.1, the next incremental update of the open-source web browser.

    I discussed this new optimization strategy with Mozilla's chief evangelist Mike Shaver and Mozilla CTO Brendan Eich, the creator of JavaScript. They are concerned that sophisticated web applications are being held back by the limitations of JavaScript interpreter performance. They aim to improve execution speed so that it is comparable to that of native code. This will redefine the boundaries of client-side performance and enable the development of a whole new generation of more computationally-intensive web applications.

    They are "getting ready to take JavaScript performance into the next tier" with a radically innovative optimization tactic called tracing that has already produced performance improvements ranging between 20 and 40 times faster in some cases. They believe that this is just the beginning of what can be accomplished with tracing, and they expect to be able to achieve even better speed as the work continues.

    The theories behind tracing optimization were pioneered by Dr. Andreas Gal, a research scientist at the University of California, Irvine. The tracing mechanism records the path of execution at runtime and generates compiled code that can be used next time that a particular path is reached. This makes it possible to flatten out loops and nested method calls into a linear stream of instructions that is more conducive to conventional optimization techniques. Tracing optimization is particularly effective in dynamic languages and also has a very light memory footprint relative to alternative approaches.

    Mozilla already incorporated tracing optimization into Tamarin, a next-generation JavaScript runtime engine that leverages Adobe's ActionScript virtual machine. Tamarin, however, still lacks maturity and doesn't yet deliver significant performance gains—partly because Tamarin's ActionScript heritage means that it is optimized for efficient execution of code with type annotations. Tamarin is a long-term project and won't be ready until Firefox 4.

    To get a real-world performance increase right now, Mozilla has adapted the tracing technology and Adobe's nanojit so that they can be integrated directly into SpiderMonkey, the JavaScript interpreter that is used in Firefox 3. This has produced a massive speedup that far surpasses what is currently possible with Tamarin-tracing. In addition to empowering web developers, the optimizations will also improve the general performance of the browser itself and many extensions because many components of the program are coded with JavaScript.

    Bringing more power to client-side scripting will move the web forward and create new opportunities for web developers. Eich says that Mozilla wants to "get people thinking about JavaScript as a more general-purpose language" and show them that "it really is a platform for writing full applications."

    Apple has also been implementing some extremely impressive JavaScript performance improvements with its compelling SquirrelFish virtual machine engine, which will be included in Safari 4. Like Mozilla, Apple says that the current performance gains delivered by SquirrelFish only scratch the surface of what is possible.

    JavaScript isn't just a clumsy solution for client-side form validation anymore. While the implementors are aggressively addressing JavaScript's performance limitations, the ECMAScript standards community is making progress on addressing some of the language's historical syntactic weaknesses. The addition of some rather nice Pythonic sugar in JavaScript 1.7 and 1.8 is a great start, and the recent emergence of consensus in the standards community on the future of ECMAScript lifts some of the roadblocks that had prevented those efforts from going further.

    With so much forward momentum and rapid evolution, JavaScript appears capable of meeting the demand for a more robust web programming platform. As more applications shift into the cloud, these capabilities will be essential for building the future of the web.
    "The issue is there are still many people out there that use religion as a crutch for bigotry and hate. Like Ben."
    Ben Kenobi: "That means I'm doing something right. "

  • #2
    So is tracing just a sophisticated form of JIT?


    • #3
      Not how I understand JIT (which is admittedly sort of low)...
      <Reverend> IRC is just multiplayer notepad.
      I like your SNOOPY POSTER! - While you Wait quote.


      • #4

        Before TraceMonkey, for Firefox 3, we made serious performance improvements to SpiderMonkey, both to its Array code and to its interpreter. The interpreter speedups entailed two major pieces of work:

        * Making bytecode cases in the threaded interpreter even fatter, so the fast cases can stay in the interpreter function.
        * Adding a polymorphic property cache, for addressing properties found in prototype and scope objects quickly, without having to look in each object along the chain.

        I will talk about the property cache and the "shape inference" it is based on in another post.

        By the way, we are not letting moss grow under our interpreter's feet. Dave Mandelin is working on a combination of inline-threading and call-threading that will take interpreter performance up another notch.

        While doing this Firefox 3 work, I was reminded again of the adage:

        Neurosis is doing the same thing over and over again, expecting to get a different result each time.

        But this is exactly what dynamically typed language interpreters must do. Consider the + operator:

        a = b + c;

        Is this string concatenation, or number addition? Without static analysis (generally too costly), we can't know ahead of time. For SpiderMonkey, we have to ask further: if number, can we keep the operands and result in machine integers of some kind?

        Any interpreter will have to cope with unlikely (but allowed) overflow from int to double precision binary floating point, or even change of variable type from number to string. But this is neurotic, because for the vast majority of JS code, in spite of the freedom to mutate type of variable, types are stable. (This stability holds for other dynamic languages including Python.)

        Another insight, which is key to the tracing JIT approach: if you are spending much time in JS, you are probably looping. There's simply not enough straight line code in Firefox's JS, or in a web app, to take that much runtime. Native code may go out to lunch, of course, but if you are spending time in JS, you're either looping or doing recursion.

        The Trace Trees approach to tracing JIT compilation that Andreas pioneered can handle loops and recursion. Everything starts in the interpreter, when TraceMonkey notices a hot loop by keeping cheap count of how often a particular backward jump (or any backward jump) has happened.

        for (var i = 0; i < BIG; i++) {
        // Loop header starts here:
        if (usuallyTrue())

        One a hot loop has been detected, TraceMonkey starts recording a trace. We use the Tamarin Tracing Nanojit to generate low-level intermediate representation instructions specialized from the SpiderMonkey bytecodes, their immediate and incoming stack operands, and the property cache "hit" case fast-lookup information.

        The trace recorder completes when the loop header (see the comment in the code above) is reached by a backward jump. If the trace does not complete this way, the recorder aborts and the interpreter resumes without recording traces.

        Let's suppose the usuallyTrue() function returns true (it could return any truthy, e.g. 1 or "non-empty" -- we can cope). The trace recorder emits a special guard instruction to check that the truthy condition matches, allowing native machine-code trace execution to continue if so. If the condition does not match, the guard exits (so-called "side-exits") the trace, returning to the interpreter at the exact point in the bytecode where the guard was recorded, with all the necessary interpreter state restored.

        If the interpreter sees usuallyTrue() return true, then the commonPath(); case will be traced. After that function has been traced comes the loop update part i++ (which might or might not stay in SpiderMonkey's integer representation depending on the value of BIG -- again we guard). Finally, the condition i < BIG will be recorded as a guard.

        // Loop header starts here:
        inlined usuallyTrue() call, with guards
        guard on truthy return value
        guard that the function being invoked at this point is commonPath
        inlined commonPath() call, with any calls it makes inlined, guarded
        i++ code, with overflow to double guard
        i < BIG condition and loop-edge guard
        jump back to loop header

        Thus tracing is all about speculating that what the interpreter sees is what will happen next time -- that the virtual machine can stop being neurotic.

        And as you can see, tracing JITs can inline method calls easily -- just record the interpreter as it follows a JSOP_CALL instruction into an interpreted function.

        One point about Trace Trees (as opposed to less structured kinds of tracing): you get function inlining without having to build interpreter frames at all, because the trace must reach the loop header in the outer function, or exit (so-called "side-exit") the loop at that conditional jump. Therefore, so long as the JITted code stays "on trace", no interpreter frames need to be built.

        If the commonPath function itself contains a guard that side-exits at runtime, then (and only then) will one or more interpreter frames need to be reconstructed.

        Let's say after some number of iterations, the loop shown above side-exits at the guard for usuallyTrue() because that function returns a falsy value. We abort correctly back to the interpreter, but keep recording in case we can complete another trace back to the same loop header, and extend the first into a trace tree. This allows us to handle different paths through the control flow graph (including inlined functions) under a hot loop.
        "The issue is there are still many people out there that use religion as a crutch for bigotry and hate. Like Ben."
        Ben Kenobi: "That means I'm doing something right. "