Want IST Not SST for Node Rejoins? We Have a Solution!

Krunal Bauskar | February 13, 2018 | Posted In: High-availability, MySQL, Percona XtraDB Cluster

Introducing “gcache.freeze_purge_at_seqno”

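All transactions in the Galera cluster world are assigned a unique global sequence number (seqno). Tracking when something happens is done using this seqno (for example wsrep_last_applied, wsrep_last_committed, wsrep_replicated, wsrep_local_cached_downto, etc.). wsrep_local_cached_downto is the sequence number up to which gcache has been purged. Say wsrep_local_cached_downto = N: gcache then holds the write-sets for [N, wsrep_replicated] and has purged the write-sets for [1, N).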
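To make the range arithmetic concrete, here is a minimal Python sketch. It follows the simplified picture above (gcache holds [N, wsrep_replicated]) and ignores the donor's safety gap heuristic; the function names are hypothetical and are not part of Galera or Percona XtraDB Cluster.

```python
def gcache_range(cached_downto: int, replicated: int) -> range:
    """Seqnos still held in gcache: [cached_downto, replicated]."""
    return range(cached_downto, replicated + 1)


def can_serve_ist(joiner_last_applied: int, cached_downto: int, replicated: int) -> bool:
    """Simplified check: the joiner needs every write-set after its last applied seqno.

    Assumption: the real donor selection also applies a safety gap, which is not modelled here.
    """
    first_needed = joiner_last_applied + 1
    if first_needed > replicated:
        return True  # joiner is already caught up, nothing to transfer
    return first_needed in gcache_range(cached_downto, replicated)


# Donor has purged everything below 1000 and replicated up to 2000.
print(can_serve_ist(1500, 1000, 2000))  # joiner needs 1501.., still cached
print(can_serve_ist(500, 1000, 2000))   # joiner needs 501.., already purged
```

In this model the second joiner would fall back to SST, which is the situation freezing the purge is meant to avoid.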

gcache.freeze_purge_at_seqno takes three values:

1. **-1 (default):** no freeze, the purge operates as normal.
2. **x (should be a valid seqno in gcache):** freeze purge of write-sets >= x. The best way to select x is to use the wsrep_last_applied value from the node that you plan to shut down as an indicator. (Use wsrep_last_applied * 0.90. Retaining this extra 10% tricks the [safety gap heuristic algorithm of IST](https://www.percona.com/blog/2017/11/15/understanding-ist-donor-selected/).)
3. **now:** freeze purge of write-sets >= smallest seqno currently in gcache. Instant freeze of gcache-purge. (If tracing x (above) is difficult, simply use “now” and you are good.)
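The choice of x in option 2 can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and clamping to wsrep_local_cached_downto is an assumption added here so that x stays a seqno that is still in gcache (at that point the result is equivalent to using “now”).

```python
def pick_freeze_seqno(last_applied: int, cached_downto: int, margin_percent: int = 10) -> int:
    """Pick x as roughly 90% of wsrep_last_applied from the node going down.

    Integer arithmetic avoids floating-point rounding surprises.
    Assumption: x is clamped so it never falls below the oldest cached seqno.
    """
    candidate = last_applied * (100 - margin_percent) // 100
    return max(candidate, cached_downto)


print(pick_freeze_seqno(last_applied=1000, cached_downto=100))  # 900
print(pick_freeze_seqno(last_applied=1000, cached_downto=950))  # 950
```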


Set this on an existing node of the cluster (one that will continue to be part of the cluster and can act as a potential DONOR). This node continues to retain the write-sets, thereby allowing the restarting node to rejoin using IST. (You can name this node as the preferred DONOR through wsrep_sst_donor when restarting the rejoining node.)
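As a rough sketch of what gets run on the DONOR, the helper below builds the statements for freezing and later unfreezing the purge. It assumes the option is changed at runtime through wsrep_provider_options, like other dynamic Galera provider options; the helper name is hypothetical, and the output is only a string to paste into a MySQL client.

```python
from typing import Union


def freeze_purge_statement(value: Union[int, str]) -> str:
    """Build the SET GLOBAL statement for gcache.freeze_purge_at_seqno.

    Accepted values mirror the three options: -1, a seqno x, or "now".
    """
    is_seqno = isinstance(value, int) and value >= -1
    if value != "now" and not is_seqno:
        raise ValueError(f"unsupported value: {value!r}")
    return (
        "SET GLOBAL wsrep_provider_options = "
        f"'gcache.freeze_purge_at_seqno = {value}';"
    )


print(freeze_purge_statement("now"))  # before taking the other node down
print(freeze_purge_statement(-1))     # after the node has rejoined
```

Setting the value back to -1 once the node has rejoined lets the purge resume, so gcache does not keep holding write-sets longer than needed.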

Note:

Why should you use it?

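• gcache grows dynamically (using the existing page store mechanism) and shrinks once the user sets the option back to -1. This means you only use (more) disk space when it is needed.
• No restart is needed. The user can concentrate on the node that needs maintenance.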
• No complex math or understanding of seqno involved (simply use “now”).
• Less prone to error, as SST is one of the major error-prone areas with the cluster.

So why wait? Give it a try! It is part of Percona XtraDB Cluster 5.7.20 onwards, and helps you get IST, not SST, for node rejoins.