MySQL 5.4 gets bigger anyway, encroaching on new parent Oracle's turf
With the specter of now being owned by a former competitor looming over it, MySQL's community proceeds with a plan to compete with its newly acquired parent.
When Oracle CEO Larry Ellison announced his acquisition of Sun Microsystems yesterday morning, he didn't mention MySQL at all -- his company's principal competitor in the small systems database space. Maybe that was just for spite: It's no secret that Ellison wanted MySQL; he said so explicitly three years ago. It was one of the key missing elements in the top-to-bottom stack he's been looking for, a way to create a line-up of pre-configured systems with everything customers need right out of the proverbial "box."
But MySQL's place in Ellison's stack doesn't extend to the enterprise, where the Oracle DB still rules -- at least in his mind. Eleven million installed MySQL customers plus a resurgent Microsoft SQL Server aside, Oracle DB is, from Oracle's perspective, an unstoppable juggernaut.
While participants in this week's MySQL Conference and Expo in Santa Clara were debating the meaning of changing the flags over the front entrance once again (it was acquired by Sun only last year), the community for the world's principal open source database maintained the course it had set last week. Today, the group heralded the official release of MySQL 5.4, whose principal improvement is bigger and better support for the InnoDB transactional storage engine. That engine will help MySQL enter more enterprises by removing version 5.1's limitation of four cores per instance, moving all the way to 16-processor ("16-way") support for x86 servers with multiple cores per processor, and 64-way support for Sun's SPARC-based CMT servers.
If you remember the days when "Toyota Truck" was an oxymoron in the heavy load division, you know how it feels when barriers are shattered. This puts MySQL into the heavy load category, which isn't exactly inside the boundaries of Larry Ellison's nice little stack.
But Ellison is rarely without an ace up his sleeve, or at least an ace somewhere handy; and in this case, he made sure he had one back in October 2005. That's when Oracle purchased Innobase OY, the makers of the InnoDB database engine. See, MySQL is officially a database management system, which means it's quite capable of managing data stored by other open source engines. While MyISAM is the one designed for MySQL and intended to work with it by default, Innobase developed InnoDB not just for MySQL, but as an open source engine for transactional data. It's through the expansion of MySQL's support for InnoDB that version 5.4's embrace of 16-way servers has come about. Understanding how this particular innovation got started requires us to review a little bit about the ISAM methodology -- specifically, why it's been such a lucky charm for MySQL, up until the point where it needs to expand into the enterprise. For more on that, I'll cite...well, myself, from a textbook I wrote in 1998:
ISAM [Indexed Sequential Access Method] is not another trademark, nor does it represent some proprietary technology invented just for the sake of the cute acronym. Instead, it refers to a technique for locating an entry in a database table. In short, an ISAM driver or server uses a separate table called the index to look up a key number for a record. A key number is a unique entry used to identify that record, such as a serial number or purchase order number. Having found that, the index then points the server in the direction of the true record in the database, thus saving some search time.
ISAM relies on a couple of conditions being met before it can work properly:
* No two records in a table may be identical to one another. If you think about it, no properly conceived database table would have any need for identical records. Even if your table were a catalog of baseball cards and a given collection contained two identical cards, both cards should be given unique identifiers, making their respective records unique.
* At least one column of the database table must contain fields whose contents are unique for each record. Generally, a serial number qualifies as such a column. This column serves to contain the key field that uniquely identifies each record.
For ISAM, at least one separate table is generated for each field column. This table is the index for the database table. It contains two and only two columns: a duplicate of the key field column, and a separate column recording the location of the record in the table whose key field matches the duplicate in the index. The theory here is that because the index table is smaller, it's quicker to search through it than through the main table. But generally, ISAM drivers "cheat" and sort the index column, then employ a binary search instead of a sequential search...which is far faster. So why isn't it called "IBAM" rather than ISAM? Sometimes it's just too difficult to ditch a cute acronym.
Fast-forward eleven years: MyISAM isn't really all that sequential either, which is why it's so fast. But because it's still algorithmic search grafted onto the sequential model, it's not very well adapted to distributed processing. That's okay if you're dealing with a Web-based server where text searches, for example, are executed from one location -- MyISAM is very well suited for that. But for accounting purposes especially, a database architect needs a way for multiple tables to appear to be updated simultaneously, as well as for those updates to be rolled back simultaneously in case of an error.
A common example for a transactional database involves a system where there are multiple bank accounts. If money is transferred from one account to another, from a sequential standpoint -- the "S" in ISAM -- that's two queries: one withdrawal and one credit. But should the power go out on the server in-between those two queries, the withdrawal might disappear. (That's the secret reason why early database writers performed the credit first.) With a transactional database engine, although both queries are written as though they were sequential, the update appears to take place in parallel. In case of an error, the update doesn't appear to have taken place at all, which is better than the alternative.
InnoDB is a transactional engine; MySQL developers can create tables "in" InnoDB instead of "in" MyISAM. When they do, the texture of their programs changes somewhat to take advantage of the transactional model. And in some cases, searches actually slow down because InnoDB has a larger overhead. But the payoff comes in the form of fuller functionality and greater reliability -- and Larry Ellison knew three years ago that such a payoff would be necessary for MySQL's evolution. He said so explicitly, and that's why he purchased Innobase OY.
Now the MySQL community marches forward, with what was a plan to take on Oracle head-on by supporting a tool...owned by Oracle. Now it's a plan to make a bigger place for itself than just a slot in the Ellison stack.
The early numbers for MySQL are dramatic, with easily measurable performance improvements of about 56% in workload tests, and as much as triple the performance in distributed connectivity tests, thanks to that 16-way support. In a December 2008 test of a MySQL 5.4 beta by Sun's Robin Schumacher, the new engine's capacity for subquery optimization -- a smarter way to break down queries nested within queries -- led to speed improvements in subquery benchmarks of as much as 40,000%. That's not a typo; that comma is in the right place.
As MySQL senior software architect Mikael Ronstrom reports on his blog today, "We have consistently seen improvements in the order of 30-40% of sysbench top numbers and on large number of threads 5.4.0 drops much less in performance than 5.1. The new InnoDB Thread Concurrency patch makes the results on high number of threads even more impressive where the results have gone up by another 5-15% at the expense of 1% less on the top results (there are even some DBT2 runs that gave 200% improvement with the new algorithm)."
Version 5.4 would have been released today had Larry Ellison never entered the picture; in which case, our headline might have read, "MySQL gets bigger: Now what will Oracle do?" If making MySQL bigger makes it better, Ellison's 2005 move may be his insurance policy.
With the specter of now being owned by a former competitor looming over it, MySQL's community proceeds with a plan to compete with its newly acquired parent.
When Oracle CEO Larry Ellison announced his acquisition of Sun Microsystems yesterday morning, he didn't mention MySQL at all -- his company's principal competitor in the small systems database space. Maybe that was just for spite: It's no secret that Ellison wanted MySQL; he said so explicitly three years ago. It was one of the key missing elements in the top-to-bottom stack he's been looking for, a way to create a line-up of pre-configured systems with everything customers need right out of the proverbial "box."
But MySQL's place in Ellison's stack doesn't extend to the enterprise, where the Oracle DB still rules -- at least in his mind. Eleven million installed MySQL customers plus a resurgent Microsoft SQL Server aside, Oracle DB is, from Oracle's perspective, an unstoppable juggernaut.
While participants in this week's MySQL Conference and Expo in Santa Clara were debating the meaning of changing the flags over the front entrance once again (it was acquired by Sun only last year), the community for the world's principal open source database maintained the course it had set last week. Today, the group heralded the official release of MySQL 5.4, whose principal improvement is bigger and better support for the InnoDB transactional storage engine. That engine will help MySQL enter more enterprises by removing version 5.1's limitation of four cores per instance, moving all the way to 16-processor ("16-way") support for x86 servers with multiple cores per processor, and 64-way support for Sun's SPARC-based CMT servers.
If you remember the days when "Toyota Truck" was an oxymoron in the heavy load division, you know how it feels when barriers are shattered. This puts MySQL into the heavy load category, which isn't exactly inside the boundaries of Larry Ellison's nice little stack.
But Ellison is rarely without an ace up his sleeve, or at least an ace somewhere handy; and in this case, he made sure he had one back in October 2005. That's when Oracle purchased Innobase OY, the makers of the InnoDB database engine. See, MySQL is officially a database management system, which means it's quite capable of managing data stored by other open source engines. While MyISAM is the one designed for MySQL and intended to work with it by default, Innobase developed InnoDB not just for MySQL, but as an open source engine for transactional data. It's through the expansion of MySQL's support for InnoDB that version 5.4's embrace of 16-way servers has come about. Understanding how this particular innovation got started requires us to review a little bit about the ISAM methodology -- specifically, why it's been such a lucky charm for MySQL, up until the point where it needs to expand into the enterprise. For more on that, I'll cite...well, myself, from a textbook I wrote in 1998:
ISAM [Indexed Sequential Access Method] is not another trademark, nor does it represent some proprietary technology invented just for the sake of the cute acronym. Instead, it refers to a technique for locating an entry in a database table. In short, an ISAM driver or server uses a separate table called the index to look up a key number for a record. A key number is a unique entry used to identify that record, such as a serial number or purchase order number. Having found that, the index then points the server in the direction of the true record in the database, thus saving some search time.
ISAM relies on a couple of conditions being met before it can work properly:
* No two records in a table may be identical to one another. If you think about it, no properly conceived database table would have any need for identical records. Even if your table were a catalog of baseball cards and a given collection contained two identical cards, both cards should be given unique identifiers, making their respective records unique.
* At least one column of the database table must contain fields whose contents are unique for each record. Generally, a serial number qualifies as such a column. This column serves to contain the key field that uniquely identifies each record.
For ISAM, at least one separate table is generated for each field column. This table is the index for the database table. It contains two and only two columns: a duplicate of the key field column, and a separate column recording the location of the record in the table whose key field matches the duplicate in the index. The theory here is that because the index table is smaller, it's quicker to search through it than through the main table. But generally, ISAM drivers "cheat" and sort the index column, then employ a binary search instead of a sequential search...which is far faster. So why isn't it called "IBAM" rather than ISAM? Sometimes it's just too difficult to ditch a cute acronym.
Fast-forward eleven years: MyISAM isn't really all that sequential either, which is why it's so fast. But because it's still algorithmic search grafted onto the sequential model, it's not very well adapted to distributed processing. That's okay if you're dealing with a Web-based server where text searches, for example, are executed from one location -- MyISAM is very well suited for that. But for accounting purposes especially, a database architect needs a way for multiple tables to appear to be updated simultaneously, as well as for those updates to be rolled back simultaneously in case of an error.
A common example for a transactional database involves a system where there are multiple bank accounts. If money is transferred from one account to another, from a sequential standpoint -- the "S" in ISAM -- that's two queries: one withdrawal and one credit. But should the power go out on the server in-between those two queries, the withdrawal might disappear. (That's the secret reason why early database writers performed the credit first.) With a transactional database engine, although both queries are written as though they were sequential, the update appears to take place in parallel. In case of an error, the update doesn't appear to have taken place at all, which is better than the alternative.
InnoDB is a transactional engine; MySQL developers can create tables "in" InnoDB instead of "in" MyISAM. When they do, the texture of their programs changes somewhat to take advantage of the transactional model. And in some cases, searches actually slow down because InnoDB has a larger overhead. But the payoff comes in the form of fuller functionality and greater reliability -- and Larry Ellison knew three years ago that such a payoff would be necessary for MySQL's evolution. He said so explicitly, and that's why he purchased Innobase OY.
Now the MySQL community marches forward, with what was a plan to take on Oracle head-on by supporting a tool...owned by Oracle. Now it's a plan to make a bigger place for itself than just a slot in the Ellison stack.
The early numbers for MySQL are dramatic, with easily measurable performance improvements of about 56% in workload tests, and as much as triple the performance in distributed connectivity tests, thanks to that 16-way support. In a December 2008 test of a MySQL 5.4 beta by Sun's Robin Schumacher, the new engine's capacity for subquery optimization -- a smarter way to break down queries nested within queries -- led to speed improvements in subquery benchmarks of as much as 40,000%. That's not a typo; that comma is in the right place.
As MySQL senior software architect Mikael Ronstrom reports on his blog today, "We have consistently seen improvements in the order of 30-40% of sysbench top numbers and on large number of threads 5.4.0 drops much less in performance than 5.1. The new InnoDB Thread Concurrency patch makes the results on high number of threads even more impressive where the results have gone up by another 5-15% at the expense of 1% less on the top results (there are even some DBT2 runs that gave 200% improvement with the new algorithm)."
Version 5.4 would have been released today had Larry Ellison never entered the picture; in which case, our headline might have read, "MySQL gets bigger: Now what will Oracle do?" If making MySQL bigger makes it better, Ellison's 2005 move may be his insurance policy.
Comment