MySQL - Troubleshooting Replication
If you have followed the instructions, and your
replication setup is not working, first elliminate the
user error factor by checking the following:
Is the master logging to the binary log? Check with SHOW
MASTER STATUS. If it is, Position will be non-zero. If
not, verify that you have given the master log-bin
option and have set server-id.
Is the slave running? Check with SHOW SLAVE STATUS. The
answer is found in Slave_running column. If not, verify
slave options and check the error log for messages.
If the slave is running, did it establish connection
with the master? Do SHOW PROCESSLIST, find the thread
with system user value in User column and none in the
Host column, and check the State column. If it says
connecting to master, verify the privileges for the
replication user on the master, master host name, your
DNS setup, whether the master is actually running,
whether it is reachable from the slave, and if all that
seems ok, read the error logs.
If the slave was running, but then stopped, look at SHOW
SLAVE STATUS output andcheck the error logs. It usually
happens when some query that succeeded on the master
fails on the slave. This should never happen if you have
taken a proper snapshot of the master, and never modify
the data on the slave outside of the slave thread. If it
does, it is a bug, read below on how to report it.
If a query on that succeeded on the master refuses to
run on the slave, and a full database resync ( the
proper thing to do ) does not seem feasible, try the
First see if there is some stray record in the way.
Understand how it got there, then delete it and run
SLAVE START If the above does not work or does not
apply, try to understand if it would be safe to make the
update manually ( if needed) and then ignore the next
query from the master.
If you have decided you can skip the next query, do SET
SQL_SLAVE_SKIP_COUNTER=1; SLAVE START; to skip a query
that does not use auto_increment, last_insert_id or
timestamp, or SET SQL_SLAVE_SKIP_COUNTER=2; SLAVE START;
otherwise If you are sure the slave started out
perfectly in sync with the master, and no one has
updated the tables involved outside of slave thread,
report the bug, so you will not have to do the above
Make sure you are not running into an old bug by
upgrading to the most recent version.
If all else fails, read the error logs. If they are big,
grep -i slave /path/to/your-log.err on the slave. There
is no generic pattern to search for on the master, as
the only errors it logs are general system errors - if
it can, it will send the error to the slave when things
When you have determined that there is no user error
involved, and replication still either does not work at
all or is unstable, it is time to start working on a bug
report. We need to get as much info as possible from you
to be able to track down the bug. Please do spend some
time and effort preparing a good bug report. Ideally, we
would like to have a test case in the format found in
mysql-test/t/rpl* directory of the source tree. If you
submit a test case like that, you can expect a patch
within a day or two in most cases, although, of course,
you mileage may vary depending on a number of factors.
Second best option is a just program with easily
configurable connection arguments for the master and the
slave that will demonstrate the problem on our systems.
You can write one in Perl or in C, depending on which
language you know better.
If you have one of the above ways to demonstrate the
bug, use mysqlbug to prepare a bug report and send it to
email@example.com. If you have a phantom - a problem
that does occur but you cannot duplicate "at will":
Verify that there is no user error involved. For
example, if you update the slave outside of the slave
thread, the data will be out of sync, and you can have
unique key violations on updates, in which case the
slave thread will stop and wait for you to clean up the
tables manually to bring them in sync.
Run slave with log-slave-updates and log-bin - this will
keep a log of all updates on the slave.
Save all evidence before reseting the replication. If we
have no or only sketchy information, it would take us a
while to track down the problem. The evidence you should
All binary logs on the master
All binary log on the slave
The output of SHOW MASTER STATUS on the master at the
time you have discovered the problem
The output of SHOW SLAVE STATUS on the master at the
time you have discovered the problem
Error logs on the master and on the slave
Use mysqlbinlog to examine the binary logs. The
following should be helpful to find the trouble query,
mysqlbinlog -j pos_from_slave_status /path/to/log_from_slave_status
Once you have collected the evidence on the phantom
problem, try hard to isolate it into a separate test
case first. Then report the problem to firstname.lastname@example.org
with as much info as possible.
Getting Maximum Performance from MySQL
Optimization is a complicated task because it ultimately
requires understanding of the whole system. While it may
be possible to do some local optimizations with small
knowledge of your system/application, the more optimal
you want your system to become the more you will have to
know about it.
So this chapter will try to explain and give some
examples of different ways to optimize MySQL. But
remember that there are always some (increasingly
harder) additional ways to make the system even faster.
MySQL - Optimization Overview
The most important part for getting a system fast is of
course the basic design. You also need to know what
kinds of things your system will be doing, and what your
The most common bottlenecks are:
Disk seeks. It takes time for the disk to find a piece
of data. With modern disks in 1999, the mean time for
this is usually lower than 10ms, so we can in theory do
about 1000 seeks a second. This time improves slowly
with new disks and is very hard to optimize for a single
table. The way to optimize this is to spread the data on
more than one disk. Disk reading/writing. When the disk
is at the correct position we need to read the data.
With modern disks in 1999, one disk delivers something
like 10-20Mb/s. This is easier to optimize than seeks
because you can read in parallel from multiple disks.
CPU cycles. When we have the data in main memory (or if
it already were there) we need to process it to get to
our result. Having small tables compared to the memory
is the most common limiting factor. But then, with small
tables speed is usually not the problem.
Memory bandwidth. When the CPU needs more data than can
fit in the CPU cache the main memory bandwidth becomes a
bottleneck. This is an uncommon bottleneck for most
systems, but one should be aware of it.
Page Numbers : 1