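Today the MySQL master database for one of our game servers went down. The system dropped into read-only mode, came up in safe mode after a reboot, and returned to normal once fsck had been run. After the server was back, MySQL started normally, but one slave kept reporting a replication error.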

After logging in to the slave, I found the following error:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 10.90.13.238
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000949
          Read_Master_Log_Pos: 277562491
               Relay_Log_File: mysql-relay-bin.001616
                Relay_Log_Pos: 277562637
        Relay_Master_Log_File: mysql-bin.000949
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 1
          Exec_Master_Log_Pos: 277562491
              Relay_Log_Space: 277562836
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position; the first event 'mysql-bin.000949' at 277562491, the last event read from './mysql-bin.000949' at 4, the last byte read from './mysql-bin.000949' at 4.'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 4
1 row in set (0.00 sec)

The error is:

Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position; the first event 'mysql-bin.000949' at 277562491, the last event read from './mysql-bin.000949' at 4, the last byte read from './mysql-bin.000949' at 4.'
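With output this long, it can help to pull out just the fields that matter. The following is a minimal shell sketch, not something from the original troubleshooting session; `slave_status_field` is a hypothetical helper name, and it assumes the `\G`-style output is supplied on standard input.

```shell
# Print the value of one field from `SHOW SLAVE STATUS\G` output read on stdin.
# slave_status_field is a hypothetical helper name, not part of MySQL.
slave_status_field() {
    awk -v key="$1" '
        !found {
            line = $0
            sub(/^[ \t]+/, "", line)             # drop the right-alignment padding
            prefix = key ":"
            if (index(line, prefix) == 1) {      # the field name must match exactly
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)        # trim around the value
                sub(/[ \t]+$/, "", value)
                print value
                found = 1                        # only report the first match
            }
        }'
}
```

Assuming the `mysql` client can log in without prompting, it could be used as `mysql -e 'SHOW SLAVE STATUS\G' | slave_status_field Last_IO_Errno`. Matching on the full field name keeps `Master_Log_File` from also matching `Relay_Master_Log_File`.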

I had run into this error before but never wrote down the details, so I searched online and consulted a few references.

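The cause of this error is simple. The slave had been replicating right up until the master went down. When the master crashed and MySQL came back up, it opened a new binlog and continued writing there. The slave knew nothing about this, so it kept asking for the binlog file it had last been reading, at the position it had last reached.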

To confirm that this was what had happened, I did the following:

1. Check the master's current position:

mysql> show master status\G
*************************** 1. row ***************************
            File: mysql-bin.000950
        Position: 336492640
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

mysql> show master status;

2. Check the size and last-modified time of the binlogs on the master:

[root@d1 ~]# ll /data/mysql/mysql-bin.*
-rw-rw---- 1 mysql mysql 1073742473 Nov 17 10:38 /data/mysql/mysql-bin.000944
-rw-rw---- 1 mysql mysql 1073742022 Nov 18 12:44 /data/mysql/mysql-bin.000945
-rw-rw---- 1 mysql mysql 1073745576 Nov 19 15:31 /data/mysql/mysql-bin.000946
-rw-rw---- 1 mysql mysql 1073745324 Nov 21 05:03 /data/mysql/mysql-bin.000947
-rw-rw---- 1 mysql mysql 1073742027 Nov 22 16:09 /data/mysql/mysql-bin.000948
-rw-rw---- 1 mysql mysql  277553623 Nov 23 05:07 /data/mysql/mysql-bin.000949
-rw-rw---- 1 mysql mysql  337157571 Nov 23 18:04 /data/mysql/mysql-bin.000950
-rw-rw---- 1 mysql mysql        133 Nov 23 08:06 /data/mysql/mysql-bin.index

[root@d1 ~]# du /data/mysql/mysql-bin.* -sh
1.1G    /data/mysql/mysql-bin.000944
1.1G    /data/mysql/mysql-bin.000945
1.1G    /data/mysql/mysql-bin.000946
1.1G    /data/mysql/mysql-bin.000947
1.1G    /data/mysql/mysql-bin.000948
265M    /data/mysql/mysql-bin.000949
323M    /data/mysql/mysql-bin.000950
4.0K    /data/mysql/mysql-bin.index

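This shows that mysql-bin.000949 was the last file MySQL was writing when the system crashed, and that a new file, mysql-bin.000950, was created after recovery. Judging by the timestamps and sizes, mysql-bin.000949 would normally have grown to about 1.1G before MySQL rotated to a new file. Here the server crash cut the binlog short, so MySQL created a new binlog file on startup. The file on disk is also only 277553623 bytes, which is less than the position 277562491 the slave was requesting; that is consistent with the "impossible position" in the error message.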
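One way to confirm that the slave is asking for a position the file no longer contains is to compare the requested position with the binlog's size in bytes. The following is a minimal shell sketch under the assumption that it runs on the master host with read access to the binlog directory; `binlog_pos_in_file` is a hypothetical helper name, not a MySQL tool.

```shell
# Succeed if a requested replication position lies within the binlog file.
# binlog_pos_in_file is a hypothetical helper name; run it on the master host.
# Usage: binlog_pos_in_file <binlog-path> <position>
binlog_pos_in_file() {
    binlog_path=$1
    requested_pos=$2
    [ -r "$binlog_path" ] || return 2          # cannot read the file at all
    # File size in bytes; tr strips the padding some wc builds emit.
    binlog_size=$(wc -c < "$binlog_path" | tr -d '[:space:]')
    # A position equal to the size is still valid: it is the end of the file.
    [ "$requested_pos" -le "$binlog_size" ]
}
```

With the values from this incident, `binlog_pos_in_file /data/mysql/mysql-bin.000949 277562491` would fail, because the file is only 277553623 bytes long. The check only compares sizes; it does not verify that the position falls on an event boundary.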

3. Run the following commands on the slave:

mysql> stop slave
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to master_host='10.90.13.238', master_user='slave' ,MASTER_PASSWORD='',MASTER_LOG_FILE='mysql-bin.000950',MASTER_LOG_POS=4;
Query OK, 0 rows affected (0.09 sec)

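This points the slave at the new binlog and its starting position (position 4 is where the first event in a binlog file begins, right after the 4-byte file header). Then start the slave: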

mysql> start slave;
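The stop, repoint, and start sequence can be collected into a small script. The following is a minimal shell sketch, not the exact procedure used here; `repoint_slave_sql` is a hypothetical helper name, and it only prints the statements so they can be reviewed before being piped to the `mysql` client. It assumes the master host, user, and password already stored on the slave are still correct, so only the log file and position are changed.

```shell
# Print the statements that repoint a slave at a given binlog file and position.
# repoint_slave_sql is a hypothetical helper name; it prints SQL and runs nothing.
# Usage: repoint_slave_sql <binlog-file-name> [position]
repoint_slave_sql() {
    new_log_file=$1
    new_log_pos=${2:-4}    # 4 is the offset of the first event in a binlog file
    # Reject anything that is not a plain file name, to keep the quoting safe.
    case $new_log_file in
        ''|*[!A-Za-z0-9._-]*) echo "invalid binlog file name" >&2; return 1 ;;
    esac
    case $new_log_pos in
        *[!0-9]*) echo "position must be a non-negative integer" >&2; return 1 ;;
    esac
    printf "STOP SLAVE;\n"
    printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$new_log_file" "$new_log_pos"
    printf "START SLAVE;\n"
}
```

For this incident the call would be `repoint_slave_sql mysql-bin.000950`, and the output could be piped into `mysql` on the slave once it looks right.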

The slave now starts normally:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.90.13.238
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000950
          Read_Master_Log_Pos: 336968550
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 52752780
        Relay_Master_Log_File: mysql-bin.000950
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 52752634
              Relay_Log_Space: 336968852
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 31164
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 4
1 row in set (0.00 sec)

mysql>
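The three fields worth watching after a fix like this are Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master. The following is a minimal shell sketch of that check, assuming the `\G`-style status output is supplied on standard input; `slave_is_healthy` is a hypothetical helper name.

```shell
# Succeed only if both replication threads are running and a lag value is reported.
# Reads `SHOW SLAVE STATUS\G` output on stdin; slave_is_healthy is a hypothetical name.
slave_is_healthy() {
    awk '
        $1 == "Slave_IO_Running:"      { io_thread = $2 }
        $1 == "Slave_SQL_Running:"     { sql_thread = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2 }
        END {
            healthy = (io_thread == "Yes" && sql_thread == "Yes" &&
                       lag != "" && lag != "NULL")
            exit !healthy    # exit status 0 means healthy
        }
    '
}
```

Applied to the first status output in this post it would fail (Slave_IO_Running is No and the lag is NULL), and applied to the output just above it would succeed. A large Seconds_Behind_Master, such as the 31164 seen here, still counts as healthy in this sketch; it only means the slave is catching up.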