eGroups was a free provider of email-mailing lists and other web-based group services, which is now known as Yahoo! Groups. eGroups ran the service with MySQL as the primary RDBMS between November of 1998 and March of 2000. In August of 1999, we had over 420,000 groups and 7.4 million users on the system.
I suspected we wouldn't be running the bulk of our service on MySQL forever, and I was right. We eventually switched it to Oracle, with a custom flat-file database copy for some queries. However, I wrote this article back then to document our experience twisting MySQL (with only table-level locking!) into a fairly large configuration. The rest of this document is presented as it was written at the time...
We have several tables, however, there is really one table which dominates our performance. It is called ulists. In ulists, we store subscription data for users to mailing lists. This table is a many-to-many relationship between users and the mailing lists they are on. Each user can be on many mailing lists, and each mailing list can have many users on it. There are three indicies on this table, one on the users, one on the lists, and one joint user-list index. The table is entirely denormalized for performance. This table sees lots of single row updates and inserts, single row selects, and multi-row selects (all the records for a list, qualified by listname). To add to the spice, this table's records also grow in size often, causing lots of fragmentation.
We started with a single MySQL database running the site around November 1998. Shortly after that, a system was setup to do "dirty copies" of the database nightly. Those dirty copies were ISMCHKed and then put online as the "backup" database. This backup database was our "24 hour ago backup" and also served as a read-only database for running statistics and non-critical jobs.
Around Janurary 1999, we started to perform many more long running queries. Because MySQL has only table-level locking, the long running queries blocked up pending updates and inserts. We evaluated using both "high-priority updates" and "high-priority selects" and each led to it's own kind of problems.
Both of these situations was obviously less than optimal. The high-priority select case was less-bad, so we used it. However, we set to work on our first performance solution, real-time replication.
We had to reduce the long-running query load on our database. We resolved to setup a live-backup which we would push all the long-running queries to. As long as the backup stayed relatively in-sync with the live database, our application would be happy.
In late March 1999, we put online our homebrew MySQL live-replication. For quite a while this resolved our issues. We moved long-running queries to point at the backup database. On occasion our backups would get out of date, and the parts of our system dependent on it would be acting on old data. However, for the most part, all was well.
However, even the short read/write query counts were growing extremely fast. We had already upgraded our main database to 2GB of memory. We needed more speed and fast. We considered buying bigger hardware, but we already felt we wanted more mobility and more flexibility. We choose to setup a clustering architecture.
Our cluster was based around the fact that most of our data pertains only to one particular mailing list on our service. We call each partition of data a "pod". Our cluster had one main database (the cluster db) which holds the list of pods and the indirection mapping from a particular list to a particular pod.
You can see some of the database troubles occuring while we worked to test and roll out our pod/cluster capable code in our yearly graph. The large spike in usage in April is correlated with the release of our new user-interface. The new interface caused more queries both because it was initially not as well optimized, and because users began to use the website more as a result of having an improved interface.
In early July 1999 we completed rollout and testing of the pod/cluster code. In August 1999, we dumped all of our database data and reloaded it balanced across the three primary pods. As you can see from the performance graph.
Today, we are running one cluster DB, three primary pods and three secondary pods. Our cluster DB also holds the user tables until we decide to create dedicated user pods. Our cluster database is averaging 110 questions/second.
Each of our pods is currently averaging about 80 questions/sec, and peaking over 300 questions/second. Below is a monthly graph for one of them. The blue line is the graph of the absolute number of long running queries, thus it's the slope which represents the number of long running queries, not the height.
Today's total primary cluster performance is summarized below:
| database | average queries | max queries | cost | cost, $ per QPS |
| cluster db | 110 | 120 | $12,000 | $100 |
| group POD (x3) | 80 | 330 | $12,000 | $36 |
| Total primary cluster performance | 350 | 1110 | $48,000 | $43 |
For quite some time, we had a single backup database for all the primary
databases. This backup received the updates and inserts from all primaries, and
it served requests for long-running queries such as statistics and non-interactive
services. Below the backup database graph is presented and the query performance
is summarized. Remember that some significant number of these queries are actually
updates and inserts propagated from the primary databases. Those queries are not
considered "useful work" of the cluster.
| database | average queries | max queries |
| backup POD | 56 | 188 |
This is a count of the number of rows in each table. Tables marked with an (*) are split across the PODs.
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||