Write-set Cache(GCache)

Write-set Cache(GCache)

Galear Cluster将write-sets存储在一个称为Write-set Cache(或称为GCache)的特殊的cache中. GCache cache is a memory allocator for write-sets.主要目的是为了最大限度地减少RAM上的write-setfootprint. Galera集群通过将卸载写入集存储到磁盘来改善此问题.

GCache采用三种类型的存储:

  • Permanent In-Memory Store Here write-sets allocate using the default memory allocator for the operating system. This is useful in systems that have spare RAM. The store has a hard size limit.

    By default it is disabled.

    这个就是说可以用操作系统的内存空间, 对于有空闲内存的系统比较合适. 这个空间大小有一个硬性的限制, 貌似是这个参数gcache.mem_size=0[^1]

    默认不使用内存.

  • Permanent Ring-Buffer File Here write-sets pre-allocate to disk during cache initialization. This is intended as the main write-set store.

    这一个块循环使用的区域, 是写到磁盘上的 , 貌似是这个参数gcache.size = 128M^2

  • On-Demand Page Store 这里写集根据需要在运行时分配给内存映射的页面文件。 默认情况下,其大小为128Mb(gcache.page_size[^3]),但如果需要存储更大的写入集,则可能会更大。页面存储的大小受可用磁盘空间的限制。默认情况下,Galera Cluster会在不使用时删除页面文件,但您可以对要保留的页面文件的总大小设置限制。

    When all other stores are disabled, at least one page file remains present on disk.

Galera集群使用分配算法,尝试按上述顺序存储写入集。也就是说,它首先尝试使用永久性内存存储。如果写入集没有足够的空间,它将尝试存储到永久环形缓冲区文件。页面存储总是成功,除非写入集大于可用磁盘空间。

Galera Cluster uses an allocation algorithm that attempts to store write-sets in the above order. That is, first it attempts to use permanent in-memory store. If there is not enough space for the write-set, it attempts to store to the permanent ring-buffer file. The page store always succeeds, unless the write-set is larger than the available disk space.

默认情况下,写集缓存分配进程工作目录中的文件。您可以使用gcache.dir参数指定写入集缓存的专用位置.

[^1]: Deprecated in 5.6.22-25.8 . This variable has been deprecated and shouldn’t be used as it could cause a node to crash.

image-20180601165051861

当节点接收状态传输时,它们不能处理传入的写集,直到它们完成状态更新。在某些方法下,发送状态转移的节点也被阻塞。为防止数据库进一步落后,GCache将内存映射文件中的传入写入集保存到磁盘。 此参数定义要为当前环形缓冲区存储分配的磁盘空间量。节点在启动数据库服务器时分配此空间。

[^3]: This variable can be used to specify the size of the page files in the page storage.

image-20180601165332896

Want IST Not SST for Node Rejoins? We Have a Solution!

Want IST Not SST for Node Rejoins? We Have a Solution!

Krunal Bauskar | February 13, 2018 | Posted In: High-availability, MySQL, Percona XtraDB Cluster

如果我们告诉你,有一种确定的方法可以让节点rejoin使用IST而不是SST?您可以保证新节点使用IST重新加入. 听起来很有趣?请继续阅读.

通常当一个节点脱离集群一段时间(处于维护目的或就是shutdown了), 集群上其他节点的gcache将用来在前者重新加入集群时提供前者在脱离期间缺失的write-set(s). 如果您配置了较大的gcache, 或downtime足够段, 则此方法可行. 对于生产环境来说, 无论是设置较大的gcache或者缩短停机窗口都不够好.

在停机之前在潜在的DONOR节点上重新配置gcache需要关闭节点.(gcache不能动态调整大小), Restoring it back to original size needs another shutdown. So “three shutdowns” for a single downtime. *No way …… not acceptable with busy production clusters and the possibility of more errors.*

Introducing “gcache.freeze_purge_at_seqno”

基于以上痛点, 我们在Percona XtraDB Cluster 5.7.20引入了gcache.freeze_purge_at_seqno.这将控制清除gcache, 从而在节点重新加入时保留更多的数据以促进IST.

Galera集群世界中的所有事务都被分配了唯一的全局序列号(seqno).跟踪事情发生使用此seqno(如wsrep_last_applied,wsrep_last_committed,wsrep_replicated,wsrep_local_cached_downto等^1).wsrep_local_cached_downto表示gcache已被清除的序列号。假设wsrep_local_cached_downto = N,那么gcache具有来自[N,wsrep_replicated]的数据, 并清除了[1,N)数据。

gcache.freeze_purge_at_seqno takes three values:

1. -1(默认值): no freeze, the purge operates as normal.
2. **x (should be valid seqno in gcache):** freeze purge of write-sets >= x. The best way to select x is to use the wsrep_last_applied value as an indicator from the node that you plan to shut down. (wsrep_applied * 0.09. Retain this extra 10% to trick the [safety gap heuristic algorithm of IST](https://www.percona.com/blog/2017/11/15/understanding-ist-donor-selected/).)
3. **now:** freeze purge of write-sets >= smallest seqno currently in gcache. Instant freeze of gcache-purge. (If tracing x (above) is difficult, simply use “now” and you are good).

在集群的现有节点上进行设置(这将继续作为集群的一部分,并可以充当潜在的捐助者)。该节点继续保留写集,从而允许重启节点使用IST重新加入。 (您可以在重启需要rejoin的节点时通过指定—wsrep_sst_donor将该节点作为首选DONOR进行提供)

Set this on an existing node of the cluster (that will continue to be part of the cluster and can act as potential DONOR). This node continues to retain the write-sets, thereby allowing the restarting node to rejoin using IST. (You can feed the said node as a preferred DONOR through wsrep_sst_donor while restarting the said rejoining node.)

请记住,一旦节点重新加入,请将其设回-1。这可以避免超出上述时间表的DONOR上的占用空间。在下一个清除周期中,所有旧的保留写入集也会被释放(回收空间回到原始状态).

Note:

1
2
3
4
5
To find out existing value of gcache.freeze_purge_at_seqno query wsrep_provider_options.
select @@wsrep_provider_options;
To set gcache.freeze_purge_at_seqno
set global wsrep_provider_options="gcache.freeze_purge_at_seqno = now";

Why should you use it?

  • gcache动态增长(使用现有的页面存储机制),并在用户将其设置回-1时收缩。这意味着您只在需要时使用(更多的)磁盘空间.
  • 不需要重新启动. 用户只需专注于需要维护的节点.
  • No complex math or understanding of seqno involved (simply use “now”).
  • Less prone to error, as SST is one of the major error-prone areas with the cluster.

So why wait? Give it a try! It is part of Percona XtraDB Cluster 5.7.20 onwards, and helps you get IST not SST for node rejoins

Note: If you need more information about gcache, check here and here

[译]PXC7中故障场景及恢复方法

与标准的MySQL复制不同,PXC群集就像一个逻辑实体,它负责关注每个节点的状态和一致性以及群集状态。这样可以保持更好的数据完整性,然后您可以从传统的异步复制中获益,同时允许在同一时间在多个节点上进行安全写入.

假设我们有一个PXC集群包含3个节点

情景1

情景1

节点A正常停止, 例如需要停库做一些维护, 配置变更等操作.

在这种情况下,其他节点从该节点接收”good bye”消息, 因此集群大小将会缩小(在这个例子中缩小为2), 并且某些属性如quorum caculation和auto increment(我理解为auto_increment_increment和auto_increment_offset会由于集群扩缩动态调整). 一旦我们重新开始的一个节点,它会根据它的my.cnf中wsrep_cluster_address设置加入群集. 这个过程与普通的复制有很大的不同 - 在A节点再次与集群完全同步之前,A节点不会接受供任何请求,因此仅仅是与集群建立连接是不够的,而是必须要先完成state transfer. 如果B或C节点的writeset cache (gcache.size), 仍然有恢复A节点所需的所有事务, 那么将使用IST 否则使用 SST . 因此,如本文所示,确定最佳donor很重要. 如果由于donor的gcache中缺少交易而导致IST不可用,则由donor作出回退决定,而SST自动启动。

阅读全文

如何做好MySQL的备份

如何做好MySQL的备份

物理备份还是逻辑备份?

其实物理备份和逻辑备份并没有好坏之分, 关键是要适合你的场景.

两种备份方式的备份工具有:

  • 逻辑备份工具: [mysqldump, mysqlpump, mydumper, select into out file]
  • 物理备份工具: [xtrabackup, TokuBackup, Tokudb-xtrabackup]

两种备份方式的对比如下

  • 备份速度
    物理备份比逻辑备份快吗? 不要想当然, 至少我的测试结果并不是这样
  • 恢复速度
    物理备份恢复实际就是mv操作(使用xtrabackup,在备份机做prepare), 而逻辑备份则是漫长的导入.
    同机器 1.5T库, 逻辑备份大小151G, 做恢复需要27小时左右, 而物理备份恢复则完全取决于磁盘iops, 不用测也知道要比逻辑备份快很多
  • 备份集大小
    实际测试1T的库(大部分为InnoDB表), 逻辑备份集大小为46G, 而物理备份为255G

根据以上三点, 就可以选择备份方式了吗? 我认为不能. 还有一点是数据库服务器和备份服务器之间的网络情况, 和你期望的恢复时间是多久

阅读全文

Load data奇怪的问题

Load data奇怪的问题

公司一套所谓的BI库(5.5 innodb引擎),由于太大了,考虑转用TokuDB引擎,最近在测试,直接在一个新环境安装Percona5.7.21,然后恢复备份,转换为TokuDB表, 再做从库和原5.5库同步(5.5 是statement格式)
准备同步上,然后这套环境部署一下应用,再全面的测一下.
结果发现同步报错了

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
root@localhost 10:42: [test]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: xx.xx.xx.xx
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.010346
Read_Master_Log_Pos: 715054772
Relay_Log_File: mysql-relay.000010
Relay_Log_Pos: 376792000
Relay_Master_Log_File: mysql-bin.010288
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table: mysql.%,information_schema.%,performance_schema.%,union_log%.%,test.%
Last_Errno: 1300
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at master log mysql-bin.010288, end_log_pos 378517485. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 376791811
Relay_Log_Space: 74984691638
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1300
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at master log mysql-bin.010288, end_log_pos 378517485. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 12
Master_UUID:
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 180402 00:07:09
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: a3f4b929-31a0-11e8-9714-f8bc123346cc:1-864
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)

阅读全文

5.5.x升级5.7.21步骤

5.5.x升级5.7.21步骤

环境准备

上传软件包

根据官方文档说明

Upgrading to the latest release is recommended before upgrading to the next version

当前版本5.5.46,需要上传三个包

1
2
3
mysql-5.5.59-linux-glibc2.12-x86_64.tar.gz
mysql-5.6.39-linux-glibc2.12-x86_64.tar.gz
mysql-5.7.21-linux-glibc2.12-x86_64.tar.gz

创建目录

1
mkdir /usr/local/{mysql-5.5.59,mysql-5.6.39,mysql-5.7.21}

解压

1
2
3
tar -zxvf mysql-5.5.59-linux-glibc2.12-x86_64.tar.gz -C /usr/local/mysql-5.5.59/
tar -zxvf mysql-5.6.39-linux-glibc2.12-x86_64.tar.gz -C /usr/local/mysql-5.6.39/
tar -zxvf mysql-5.7.21-linux-glibc2.12-x86_64.tar.gz -C /usr/local/mysql-5.7.21/

设置环境变量

1
2
3
4
5
6
7
8
9
10
11
12
zst_ps1()
{
Date=$(date +%F)
Time=$(date +%H:%M:%S)
PS1="\\n\[\e[1;37m[\e[m\]\[\e[1;33m\u\e[m\]\[\e[1;33m@\h\e[m\]\[\e[1;35m $Time \e[m\]\e[1;36m\w\e[m\e[1;37m]\e[m\n\\$"
}
PROMPT_COMMAND=zst_ps1
export PATH=/usr/local/mysql/bin:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/mysql/lib

阅读全文


Powered by Hexo and Hexo-theme-hiker

Copyright © 2013 - 2018 Fan() All Rights Reserved.

访客数 : | 访问量 :

#