# ZooKeeper 3.6.0 Cluster Deployment Guide
A work in progress; it will be filled in bit by bit…
## Download the installation package
https://zookeeper.apache.org/releases.html

    cd /tmp && \

## Configure environment variables

    vi ~/.bashrc

## Edit the configuration files
### zoo.cfg

    cat zoo.cfg

### dynamicConfigFile

    vim /usr/local/apache-zookeeper-3.6.0-bin/conf/zoo_replicated1.cfg.dynamic

The file is the same on all three nodes.

    #cat zoo_replicated1.cfg.dynamic

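Note: do not use 0.0.0.0 as a server address here, otherwise you will run into a bug.
Port used for client connections:
clientPort: 2181
TCP port used for communication between nodes:
peerPort: 2888
TCP port used for leader election:
leaderPort: 3888
`participant` means the server joins the ensemble as a participant.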
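
To make the line format concrete, here is a minimal sketch that prints one `server.N` line per node from a list of real IP addresses. `gen_dynamic_cfg` is a hypothetical helper, not part of ZooKeeper. It assumes the ports described above and keeps `0.0.0.0` only in the client-port part, as the files quoted later in this document do; whether that suits your network is an assumption to verify.

```shell
#!/usr/bin/env bash
# gen_dynamic_cfg IP...: hypothetical helper, not part of ZooKeeper.
# Prints one server.N line per IP, numbering the servers from 1.
gen_dynamic_cfg() {
  local peer_port=2888 leader_port=3888 client_port=2181
  local id=1 ip
  for ip in "$@"; do
    # Refuse the wildcard address as a peer address (see the note above).
    if [ "$ip" = "0.0.0.0" ]; then
      echo "refusing 0.0.0.0 as a peer address" >&2
      return 1
    fi
    echo "server.${id}=${ip}:${peer_port}:${leader_port}:participant;0.0.0.0:${client_port}"
    id=$((id + 1))
  done
}

# The three addresses used in this document.
gen_dynamic_cfg 172.16.24.2 172.16.24.13 172.16.24.109
```

Because the output depends only on the IP list, running it with the same arguments on every node yields the identical file that the cluster expects.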
### myid

    #master

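
As a sketch of this step: the `myid` file lives in the `dataDir` configured in `zoo.cfg` and contains only the server's number, which must match the `N` of that node's `server.N` line. `write_myid` is a hypothetical helper, and the directory used below is a temporary placeholder because this document does not show the real `dataDir`.

```shell
#!/usr/bin/env bash
# write_myid DATA_DIR ID: hypothetical helper that writes ID to DATA_DIR/myid.
write_myid() {
  local data_dir="$1" id="$2"
  # Accept digits only; anything else is rejected.
  case "$id" in
    ''|*[!0-9]*) echo "myid must be a number: '$id'" >&2; return 1 ;;
  esac
  mkdir -p "$data_dir" && printf '%s\n' "$id" > "$data_dir/myid"
}

# Example for the node listed as server.1; replace the placeholder
# directory with the dataDir from your own zoo.cfg.
demo_dir="$(mktemp -d)"
write_myid "$demo_dir" 1
cat "$demo_dir/myid"
```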
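### Configure rolling output for the zk logs
> See bin/zkEnv.sh for details.

By default zk writes its log to a single file that is never cleaned up automatically, so after a while the log becomes very large.
Here the zk log is switched to rolling output, limited to 10 MB per file with at most 10 files kept. Note that the `log4j.properties` listing below still shows 256MB and 20; those are the two values to change.

For `zookeeper-env.sh`: create the file in the `./conf` directory and set its permissions with `sudo chmod 755 zookeeper-env.sh`.

    #cat conf/zookeeper-env.sh
    #!/usr/bin/env bash
    #tip: custom configuration file, do not amend the zkEnv.sh file
    #change the log dir and output of rolling file
    ZOO_LOG_DIR="/usr/local/zookeeper/logs"
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

For `log4j.properties`: change how the log is output.

    zookeeper.root.logger=INFO, ROLLINGFILE
    #zookeeper.root.logger=INFO, CONSOLE
    zookeeper.console.threshold=INFO
    zookeeper.log.dir=/usr/local/zookeeper/logs
    zookeeper.log.file=zookeeper.log
    zookeeper.log.threshold=INFO
    # change this one if needed
    zookeeper.log.maxfilesize=256MB
    # change this one if needed
    zookeeper.log.maxbackupindex=20

Create the log directory:

    mkdir /usr/local/zookeeper/logs
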
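
For a rough sizing check, the worst-case disk usage of the rolled logs can be estimated as file size × (backups + 1). This assumes the `ROLLINGFILE` appender is log4j 1.x's `RollingFileAppender`, which keeps the active file plus up to `maxbackupindex` rolled files. `max_log_mb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# max_log_mb SIZE_MB BACKUPS: hypothetical helper; prints the approximate
# upper bound in MB, counting the active file plus BACKUPS rolled files.
max_log_mb() {
  local size_mb="$1" backups="$2"
  echo $(( size_mb * (backups + 1) ))
}

max_log_mb 256 20   # 256MB / 20 backups: prints 5376
max_log_mb 10 10    # 10MB / 10 backups: prints 110
```

So the 256MB and 20 values can take a little over 5 GB, while 10MB with 10 backups stays around 110 MB.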
### Configure the JVM that runs zk
> See bin/zkEnv.sh for details.

Create a `java.env` file in the `./conf` directory and set its permissions with `sudo chmod 755 java.env`. It is mainly used to configure things such as the `GC log` and `RAM`.

    #!/usr/bin/env bash
    #configure the jvm parameters reasonably
    #note that this file is sourced, so there is no need to use export
    #set java classpath
    #CLASSPATH=""
    #set jvm start parameters, the JVMFLAGS variable can also be set
    SERVER_JVMFLAGS="-Xms1024m -Xmx2048m $JVMFLAGS"

## Start the zookeeper service (all nodes)

    # zkServer.sh start
    ZooKeeper JMX enabled by default
    Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED

## Problems encountered
### Problem 1
Verifying the zookeeper service with `stat` fails:

    #telnet 127.0.0.1 2181
    Trying 127.0.0.1...
    Connected to 127.0.0.1.
    Escape character is '^]'.
    stat
    stat is not executed because it is not in the whitelist.
    Connection closed by foreign host.

This is where it goes wrong. Starting with 3.5.3 there is a new parameter, `4lw.commands.whitelist`, and four-letter commands that are not on the whitelist are rejected. Adding a line such as `4lw.commands.whitelist=stat` to zoo.cfg and restarting the server enables `stat`.
https://zookeeper.apache.org/doc/r3.6.0/zookeeperAdmin.html
### Problem 2
This is how I previously configured the dynamicConfigFile: on each of the three nodes, the node's own entry was `0.0.0.0`.

Node 1:

    #cat zoo_ch_9000.cfg.dynamic
    server.1=0.0.0.0:2888:3888:participant;0.0.0.0:2181
    server.2=172.16.24.13:2888:3888:participant;0.0.0.0:2181
    server.3=172.16.24.109:2888:3888:participant;0.0.0.0:2181

Node 2:

    #cat zoo_ch_9000.cfg.dynamic
    server.1=172.16.24.2:2888:3888:participant;0.0.0.0:2181
    server.2=0.0.0.0:2888:3888:participant;0.0.0.0:2181
    server.3=172.16.24.109:2888:3888:participant;0.0.0.0:2181

Node 3:

    #cat zoo_ch_9000.cfg.dynamic
    server.1=172.16.24.2:2888:3888:participant;0.0.0.0:2181
    server.2=172.16.24.13:2888:3888:participant;0.0.0.0:2181
    server.3=0.0.0.0:2888:3888:participant;0.0.0.0:2181

With this configuration the cluster works right after installation, but once myid=1 went down and was restarted, it could never rejoin:

    2020-04-15 16:03:24,420 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@462] - Have smaller server identifier, so dropping the connection: (3, 1)
    2020-04-15 16:03:24,622 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@462] - Have smaller server identifier, so dropping the connection: (2, 1)
    2020-04-15 16:03:24,623 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@462] - Have smaller server identifier, so dropping the connection: (3, 1)
    2020-04-15 16:03:24,623 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@966] - Notification time out: 400
    2020-04-15 16:03:25,024 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@462] - Have smaller server identifier, so dropping the connection: (2, 1)
    2020-04-15 16:03:25,025 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@462] - Have smaller server identifier, so dropping the connection: (3, 1)
    2020-04-15 16:03:25,025 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@966] - Notification time out: 800

This looks like a bug:
https://issues.apache.org/jira/browse/ZOOKEEPER-2938
Starting 3.6.0 with the configuration from 3.4.14 still shows the problem, which suggests it may not be a configuration issue.
After testing this several times, the cause really is 0.0.0.0. In fact, I had never used 0.0.0.0 before joining Yunzhanghu (云账户). The servers at Mafengwo (马蜂窝) were also dual-NIC, and neither the ops team's Kafka nor the big data team's Kafka used 0.0.0.0. I first saw this usage here and inherited the configuration on the principle of "there may be pitfalls, so stay consistent with production". I still do not fully understand this setup, although a Baidu search suggested that on ECS or Docker things go wrong if it is not configured this way.
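
To catch the Problem 2 configuration before starting a node, a small check along these lines could flag any `server.N` line whose peer address is `0.0.0.0`. `find_wildcard_peers` is a hypothetical helper. The sample file reproduces the `server.1` variant quoted above and is written to a temporary path only for illustration.

```shell
#!/usr/bin/env bash
# find_wildcard_peers FILE: hypothetical helper that prints every server.N
# line whose peer address (the part right after '=') is 0.0.0.0.
# Exit status 0 means at least one such line was found, 1 means none.
find_wildcard_peers() {
  grep -E '^server\.[0-9]+=0\.0\.0\.0:' "$1"
}

# Sample input: the variant in which server.1 uses 0.0.0.0 for itself.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
server.1=0.0.0.0:2888:3888:participant;0.0.0.0:2181
server.2=172.16.24.13:2888:3888:participant;0.0.0.0:2181
server.3=172.16.24.109:2888:3888:participant;0.0.0.0:2181
EOF

find_wildcard_peers "$cfg"   # prints only the server.1 line
```

The pattern is anchored at the start of the line, so the `0.0.0.0:2181` client part after the semicolon is deliberately not flagged.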