MySQL - Using Your Own Benchmarks
You should definitely benchmark your application and
database to find out where the bottlenecks are. By
fixing a bottleneck (or by replacing it with a 'dummy
module') you can then easily identify the next
bottleneck (and so on). Even if the overall performance
of your application is currently sufficient, you should
at least make a plan for each bottleneck and decide how
to solve it if someday you really need the extra
performance.
For an example of portable benchmark programs, look at
the MySQL benchmark suite.
You can take any program from this suite and modify it
for your needs. By doing this, you can try different
solutions to your problem and test which is really the
fastest solution for you.
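The benchmark suite itself is written in Perl, but the idea carries over to any language. As an illustrative sketch (not part of the suite), assuming Python with the standard-library sqlite3 module as a stand-in database, you can time two candidate solutions and keep the faster one:

```python
import sqlite3
import time

def bench(fn, repeats=3):
    """Run fn() several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def setup():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    return con

def one_by_one():
    # Candidate 1: one INSERT statement per row.
    con = setup()
    for i in range(10000):
        con.execute("INSERT INTO t VALUES (?, ?)", (i, "x"))
    con.commit()

def batched():
    # Candidate 2: a single executemany() batch.
    con = setup()
    con.executemany("INSERT INTO t VALUES (?, ?)",
                    ((i, "x") for i in range(10000)))
    con.commit()

if __name__ == "__main__":
    print(f"one-by-one: {bench(one_by_one):.3f}s")
    print(f"batched:    {bench(batched):.3f}s")
```

Taking the best of several runs, as `bench()` does, reduces noise from caches and background load.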
It is very common that some problems occur only when the
system is very heavily loaded. We have had many
customers who contact us when they have a (tested)
system in production and encounter load problems. In
every one of these cases so far, the problems have been
with basic design (table scans are NOT good at high
load) or OS/library issues. Most of these would be a LOT
easier to fix if the systems were not already in
production.
To avoid problems like this, you should put some effort
into benchmarking your whole application under the worst
possible load! You can use Sasha's recent hack for this,
super-smack. As the name suggests, it can bring your
system to its knees if you ask it to, so make sure to
use it only on your development systems.
MySQL - Design Choices
MySQL keeps row data and index data in separate files.
Many (almost all) other databases mix row and index data
in the same file. We believe that the MySQL choice is
better for a very wide range of modern systems.
Another way to store the row data is to keep the
information for each column in a separate area (examples
are SDBM and Focus). This will cause a performance hit
for every query that accesses more than one column.
Because this degenerates so quickly when more than one
column is accessed, we believe that this model is not
good for general purpose databases.
The more common case is that the index and data are
stored together (like in Oracle/Sybase et al). In this
case you will find the row information at the leaf page
of the index. The good thing with this layout is that,
in many cases, depending on how well the index is
cached, it saves a disk read. The bad things with this
layout are:

- Table scanning is much slower, because you have to
  read through the indexes to get at the data.
- You can't use only the index table to retrieve data
  for a query.
- You lose a lot of space, as you must duplicate indexes
  from the nodes (since you can't store the row in the
  nodes).
- Deletes will degenerate the table over time (as
  indexes in nodes are usually not updated on delete).
- It's harder to cache ONLY the index data.
MySQL Design Limitations/Tradeoffs
Because MySQL uses extremely fast table locking
(multiple readers / single writer), the biggest
remaining problem is a mix of a steady stream of inserts
and slow selects on the same table.
We believe that for a huge number of systems the
extremely fast performance in other cases makes this
choice a win. This case is usually also possible to
solve by having multiple copies of the table, but that
takes more effort and hardware.
We are also working on some extensions to solve this
problem for some common application niches.
MySQL - Portability
Because all SQL servers implement different parts of
SQL, it takes work to write portable SQL applications.
For very simple selects/inserts it is very easy, but the
more features you need, the harder it gets. If you want
an application that is fast with many databases, it
becomes even harder!
To make a complex application portable, you need to
choose a number of SQL servers that it should work with.
You can use the MySQL crash-me program/web-page http://www.mysql.com/information/crash-me.php
to find functions, types, and limits you can use with a
selection of database servers. Crash-me does not yet
test everything possible, but it is still comprehensive,
with about 450 things tested.
For example, you shouldn't have column names longer than
18 characters if you want to be able to use Informix or
DB2.
Both the MySQL benchmarks and crash-me programs are very
database-independent. By taking a look at how we have
handled this, you can get a feeling for what you must do
to make your own applications database-independent. The
benchmarks themselves can be found in the `sql-bench'
directory in the MySQL source distribution. They are
written in Perl with the DBI database interface (which
solves the access part of the problem).
See http://www.mysql.com/information/benchmarks.html for
the results from this benchmark.
As you can see in these results, all databases have some
weak points. That is, they have different design
compromises that lead to different behavior.
If you strive for database independence, you need to get
a good feeling for each SQL server's bottlenecks. MySQL
is VERY fast in retrieving and updating things, but will
have a problem in mixing slow readers/writers on the
same table. Oracle, on the other hand, has a big problem
when you try to access rows that you have recently
updated (until they are flushed to disk). Transaction
databases in general are not very good at generating
summary tables from log tables, as in this case row
locking is almost useless.
To make your application really database-independent,
you need to define an easily extendable interface
through which you manipulate your data. As C++ is
available on most systems, it makes sense to use a C++
class interface to the databases.
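As a sketch of such an interface (using Python rather than C++ for brevity; the class and method names here are our own invention, not from MySQL), application code talks only to the abstract class, and each SQL server gets its own concrete backend:

```python
import sqlite3
from abc import ABC, abstractmethod

class UserStore(ABC):
    """Application code calls only these methods and never issues
    server-specific SQL directly."""

    @abstractmethod
    def save_user(self, user_id, name): ...

    @abstractmethod
    def get_user(self, user_id): ...

class SQLiteUserStore(UserStore):
    """One concrete backend; a MySQL or Oracle backend would implement
    the same two methods with its own server-specific SQL."""
    def __init__(self):
        self.con = sqlite3.connect(":memory:")
        self.con.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def save_user(self, user_id, name):
        # SQLite spelling of an upsert; other backends differ.
        self.con.execute("INSERT OR REPLACE INTO users VALUES (?, ?)",
                         (user_id, name))

    def get_user(self, user_id):
        row = self.con.execute("SELECT name FROM users WHERE id = ?",
                               (user_id,)).fetchone()
        return row[0] if row else None

store = SQLiteUserStore()
store.save_user(1, "monty")
print(store.get_user(1))   # prints "monty"
```

Swapping databases then means adding one new subclass, not touching application code.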
If you use some specific feature for some database (like
the REPLACE command in MySQL), you should code a method
for the other SQL servers to implement the same feature
(but slower). With MySQL you can use the /*! */ syntax
to add MySQL-specific keywords to a query. The code
inside /*! */ will be treated as a comment (ignored) by
most other SQL servers.
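A minimal sketch of such a fallback, assuming Python and using the standard-library sqlite3 module to play the part of the "other" SQL server (the table name, dialect flag, and function are illustrative, not from the manual):

```python
import sqlite3

def replace_row(con, dialect, user_id, name):
    """Use MySQL's REPLACE where available, otherwise emulate it.
    Note that real MySQL drivers use %s placeholders; '?' is used
    here because the demo runs on sqlite3."""
    if dialect == "mysql":
        # Native REPLACE: one statement on MySQL.
        con.execute("REPLACE INTO users (id, name) VALUES (?, ?)",
                    (user_id, name))
    else:
        # Portable but slower fallback: delete any old row, then insert.
        con.execute("DELETE FROM users WHERE id = ?", (user_id,))
        con.execute("INSERT INTO users (id, name) VALUES (?, ?)",
                    (user_id, name))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
replace_row(con, "generic", 1, "monty")
replace_row(con, "generic", 1, "david")   # replaces the first row
print(con.execute("SELECT name FROM users WHERE id = 1").fetchone()[0])
# prints "david"
```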
If REAL high performance is more important than
exactness, as in some Web applications, a possibility is
to create an application layer that caches all results
to give you even higher performance. By letting old
results 'expire' after a while, you can keep the cache
reasonably fresh. This is quite nice under extremely
high load, when you can dynamically increase the cache
and raise the expiry timeout until things get back to
normal.
In this case the table creation information should
contain information about the initial size of the cache
and how often the table should normally be refreshed.
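A minimal sketch of such an expiring result cache, in Python (the class name and API are illustrative assumptions, not an existing library):

```python
import time

class ResultCache:
    """Cache query results and let them expire after `ttl` seconds."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl                 # expiry timeout in seconds
        self._store = {}               # query -> (timestamp, result)

    def get(self, query, compute):
        now = time.time()
        hit = self._store.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # still fresh: skip the database
        result = compute()             # missing or expired: run the query
        self._store[query] = (now, result)
        return result

cache = ResultCache(ttl=30.0)
# Under extreme load, raise cache.ttl dynamically so older (but
# cheaper) results keep being served until things return to normal.
count = cache.get("SELECT COUNT(*) FROM log", lambda: 42)
print(count)   # prints 42
```

The `compute` callable stands in for executing the SQL statement against the real server.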
What Have We Used MySQL For?
During MySQL's initial development, the features of
MySQL were made to fit our largest customer, who handles
data warehousing for a couple of the biggest retailers
in Sweden.
From all stores, we get weekly summaries of all bonus
card transactions, and we are expected to provide useful
information for the store owners to help them find out
how their advertising campaigns are affecting their
customers.
The data is quite huge (about 7 million summary
transactions per month), and we have data for 4-10 years
that we need to present to the users. We get weekly
requests from the customers, who want 'instant' access
to new reports from this data.
We solved this by storing all information per month in
compressed 'transaction' tables. We have a set of simple
macros (scripts) that generate summary tables grouped by
different criteria (product group, customer id, store
...) from the transaction tables. The reports are Web
pages that are dynamically generated by a small Perl
script that parses a Web page, executes the SQL
statements in it, and inserts the results. We would have
used PHP or mod_perl instead but they were not available
at that time.
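The summary-table generation step can be sketched like this, assuming Python with the standard-library sqlite3 module and illustrative table and column names (the real setup uses compressed MySQL tables and Perl scripts):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A tiny stand-in for one month's transaction table.
con.execute("CREATE TABLE trans "
            "(store_id INTEGER, product_group TEXT, amount REAL)")
con.executemany("INSERT INTO trans VALUES (?, ?, ?)", [
    (1, "food", 10.0), (1, "food", 5.0), (2, "tools", 7.5),
])

# Generate a summary table grouped by one criterion (product group);
# the report macros do the same for customer id, store, and so on.
con.execute("""
    CREATE TABLE summary_by_group AS
    SELECT product_group, COUNT(*) AS n, SUM(amount) AS total
    FROM trans
    GROUP BY product_group
""")

for row in con.execute(
        "SELECT * FROM summary_by_group ORDER BY product_group"):
    print(row)
# prints ('food', 2, 15.0) then ('tools', 1, 7.5)
```

Reports then query the small summary table instead of scanning the raw transactions.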
For graphical data we wrote a simple tool in C that can
produce GIFs based on the result of a SQL query (with
some processing of the result). This is also dynamically
executed from the Perl script that parses the HTML
files.
In most cases a new report can simply be done by copying
an existing script and modifying the SQL query in it. In
some cases, we need to add more fields to an existing
summary table or generate a new one, but this is also
quite simple, as we keep all transaction tables on disk.
(Currently we have at least 50G of transaction tables
and 200G of other customer data.)
We also let our customers access the summary tables
directly with ODBC so that the advanced users can
experiment with the data themselves.
We haven't had any problems handling this with a quite
modest Sun Ultra SPARCstation (2 x 200 MHz). We recently
upgraded one of our servers to a 2-CPU 400 MHz
UltraSPARC, and we are now planning to start handling
transactions at the product level, which would mean a
ten-fold increase in data. We think we can keep up with
this by just adding more disks to our systems.
We are also experimenting with Intel-Linux to be able to
get more CPU power more cheaply. Now that we have the
binary portable database format (new in Version 3.23),
we will start to use this for some parts of the
application.
Our initial feeling is that Linux will perform much
better under low-to-medium load and Solaris will perform
better when you start to get a high load because of
extreme disk I/O, but we don't yet have anything
conclusive about this. After some discussion with a
Linux kernel developer, this might be a side effect of
Linux giving so many resources to the batch job that the
interactive performance gets very low. This makes the
machine feel very slow and unresponsive while big
batches are running. Hopefully this will be handled
better in future Linux kernels.
What is the difference between mysql_fetch_array and
mysql_fetch_object?
mysql_fetch_array — Fetch a result row as an associative
array, a numeric array, or both
mysql_fetch_object — Fetch a result row as an object
What are the different table types present in MySQL?
MyISAM : This is the default, based on the Indexed
Sequential Access Method. The above SQL will create a
MyISAM table.
ISAM : Similar; the older predecessor of MyISAM.
HEAP : Fast in-memory data access, but data is lost if
there is a crash. Cannot have BLOB, TEXT, or
AUTO_INCREMENT fields.
BDB : Supports transactions using COMMIT and ROLLBACK.
Slower than the others.
InnoDB : Transaction-safe, like BDB.