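Preface: this article explains how Keepalived works and walks through configuring it in practice.
Keepalived: Principles and Configuration Practice
Update history
2020-05-08 - Added an approach to resolving virtual_router_id conflicts and the use of unicast mode
2020-03-20 - Added Keepalived active-active practice
2019-09-03 - Split the Keepalived part out of the LVS-Keepalived article
2019-08-23 - Updated the principles and configuration practice for the LVS/NAT, LVS/DR, and LVS/TUN modes
2018-12-03 - Streamlined and updated the configuration steps
2018-07-31 - First draft
Original article - https://wsgzao.github.io/post/keepalived/
Further reading
LVS - http://www.linuxvirtualserver.org/zh/index.html
Keepalived - http://www.keepalived.org/
ReadMe: reference articles
Keepalived - http://www.keepalived.org/doc/
The Keepalived Solution - http://www.linuxvirtualserver.org/docs/ha/keepalived.html
Official LVS and Keepalived manual in Chinese (PDF) - https://pan.baidu.com/s/1s0P6nUt8WF6o_N3wdE3uKg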
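Terminology
The following terms come up when discussing the three LVS working modes.
LB (Load Balancer)
HA (High Availability)
Failover (switching over on failure)
Cluster
LVS (Linux Virtual Server)
DS (Director Server): the front-end load balancer node
RS (Real Server): a back-end server that does the actual work
VIP (Virtual IP): the virtual IP address exposed to the outside; it is the destination address of user requests
DIP (Director IP): the IP address the director uses to talk to internal hosts
RIP (Real Server IP): the IP address of a back-end server
CIP (Client IP): the IP address of the client making the request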
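Load balancing (LB)
Load balancing can be implemented in two ways: in hardware or in software.
Common hardware load balancers:
F5 Big-IP
Citrix Netscaler
Common software load balancers:
LVS (Linux Virtual Server)
HAProxy
Nginx
Characteristics of LVS:
It works at layer 4 of the network stack, handles heavy load well, and places few demands on server hardware apart from the network card;
It offers few configuration options, which is both a drawback and an advantage: with little to configure, there is far less room for human error;
Its scope is wide: besides web services it can balance other applications such as MySQL;
The LVS architecture relies on a virtual IP, so one extra IP address has to be requested from the IDC to serve as the VIP.
Characteristics of Nginx as a load balancer:
It works at layer 7, so it can route HTTP traffic by rules such as domain name or directory structure;
It is simple to install and configure, and easy to test;
It is stable under high load and can typically sustain more than ten thousand concurrent connections;
It can detect back-end failures through port checks, for example from the status code returned for a page or from a timeout, and it resubmits a failed request to another node; the drawback is that it cannot health-check by URL;
Its asynchronous request handling helps reduce the load on back-end nodes;
It supports only HTTP and email, which makes its scope much narrower;
It ships with three scheduling algorithms, round robin, weight, and ip_hash (which solves session persistence), and supports third-party algorithms such as fair and url_hash.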
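The three built-in policies can be illustrated with a minimal Python sketch. It is not Nginx's implementation: the function names are hypothetical, the weighted variant simply repeats each server `weight` times (Nginx uses a smoother interleaving), and MD5 is an arbitrary hash chosen for the example. The one detail taken from Nginx itself is that `ip_hash` keys on the first three octets of an IPv4 address, so clients in the same /24 land on the same server.

```python
import hashlib
from itertools import cycle


def round_robin(servers):
    """Yield servers one after another, forever."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to their integer weights.

    Naive expansion for illustration only (assumption, not Nginx's algorithm).
    """
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return cycle(expanded)


def ip_hash(client_ip, servers):
    """Pick a server from the first three octets of the client IPv4 address."""
    key = ".".join(client_ip.split(".")[:3])
    digest = hashlib.md5(key.encode()).digest()  # hash choice is an assumption
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]
```

Because `ip_hash` depends only on the client address, repeated requests from one client reach the same back end, which is why it addresses session persistence.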
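Characteristics of HAProxy:
It works at layer 7;
It supports session persistence, cookie-based routing, and so on;
It supports URL-based health checks, which helps a great deal in detecting back-end problems;
Supported load-balancing algorithms include Dynamic Round Robin, Weighted Source Hash, Weighted URL Hash, and Weighted Parameter Hash;
In terms of raw efficiency, HAProxy balances load faster than Nginx;
It can load-balance MySQL, health-checking and balancing the back-end DB nodes.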
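Introduction to Keepalived
Keepalived runs on top of LVS. It is hot-standby (HA) software whose main job is to isolate failed real servers and to fail over between load balancers, improving system availability.
How it works
Keepalived elects one of the standby servers as MASTER, based on the priority configured on each server. The MASTER is assigned a designated virtual IP through which external programs reach it. If that server fails (network loss, reboot, a crash of its keepalived process, and so on), Keepalived elects a new MASTER from the remaining backup machines, again by priority, and assigns it the same virtual IP so that it takes over the role of the previous MASTER.
Election policy
The election follows the VRRP protocol and is decided purely by priority: the node with the highest priority becomes MASTER. The VRRP priority field ranges from 0 to 255, of which 1 to 254 can be configured. An election is triggered in the following cases:
When keepalived starts
When the master fails (network loss, reboot, or a crash of keepalived on that machine; a crash of some other application on the machine does not count)
When a new backup server joins and has the highest priority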
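The election rule above can be sketched in a few lines of Python. This is a simplified model, not Keepalived's code: the `Node` class and `elect_master` function are hypothetical names, preemption settings such as nopreempt are ignored, and the tie-break on the higher IP address follows the VRRP specification rather than anything stated in this article.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Node:
    name: str
    priority: int      # configured VRRP priority, 1..254
    ip: str            # primary address used to send advertisements
    alive: bool = True


def elect_master(nodes):
    """Return the live node with the highest priority, or None if all are down.

    Ties on priority are broken by the numerically higher IP address.
    """
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda n: (n.priority, ip_address(n.ip)))


if __name__ == "__main__":
    cluster = [Node("lb1", 100, "10.0.0.1"), Node("lb2", 90, "10.0.0.2")]
    print(elect_master(cluster).name)
    # Simulate lb1 failing: the election now picks the remaining node.
    cluster[0] = Node("lb1", 100, "10.0.0.1", alive=False)
    print(elect_master(cluster).name)
```

Running the election again after every membership change mirrors the three trigger cases listed above: startup, master failure, and a higher-priority node joining.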
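The Keepalived configuration file explained
Keepalived runs on top of LVS. Its main functions are isolating failed RealServers and failing over between Directors (load balancers).
Keepalived is an extension project of LVS, so the two are highly compatible
It health-checks RealServers and isolates failed machines or services
It fails over between load balancers
Global configuration
The global configuration has two sub-sections:
Global definitions (global definition)
Static address and route configuration (static ipaddress/routes)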
    global_defs {
        notification_email {
            acassen@firewall.loc
            failover@firewall.loc
            sysadmin@firewall.loc
        }
        notification_email_from Alexandre.Cassen@firewall.loc
        smtp_server 192.168.200.1
        smtp_connect_timeout 30
        router_id LVS_DEVEL
    }

notification_email: the addresses that receive email when keepalived performs an action such as a failover; several addresses are allowed, one per line
notification_email_from admin@example.com: the sender address of the notification email
smtp_server 127.0.0.1: the SMTP server used to send the email; a local sendmail can be used here
smtp_connect_timeout 30: timeout for connecting to the SMTP server
router_id node1: identifier of this machine, usually the hostname

    static_ipaddress {
        192.168.1.1/24 brd + dev eth0 scope global
        192.168.1.2/24 brd + dev eth1 scope global
    }
    static_routes {
        src $SRC_IP to $DST_IP dev $SRC_DEVICE
        src $SRC_IP to $DST_IP via $GW dev $SRC_DEVICE
    }

These entries use the same syntax as the system commands for configuring addresses and routes. For example, 192.168.1.1/24 brd + dev eth0 scope global is equivalent to:

    ip addr add 192.168.1.1/24 brd + dev eth0 scope global

which assigns the address to eth0; routes work the same way. This section normally does not need to be configured. It sets the server's real IP addresses and routes, which may be needed in a complex environment, but usually you would configure them directly with vi /etc/sysconfig/network-scripts/ifcfg-eth1 instead. Remember that these are not VIPs; do not confuse the two.
VRRPD configuration
It consists of three kinds of blocks:
VRRP synchronization group (synchronization group)
VRRP instance (VRRP Instance)
VRRP script
    # the name after vrrp_sync_group is user-defined, e.g. lvs_httpd, httpd
    vrrp_sync_group VG_1 {
        group {
            http
            mysql
        }
        notify_master /path/to/to_master.sh
        notify_backup /path_to/to_backup.sh
        notify_fault "/path/fault.sh VG_1"
        notify /path/to/notify.sh
        smtp_alert
    }

http and mysql are instance names and must match the vrrp_instance names defined below
notify_master /path/to/to_master.sh: script to run on the transition to the MASTER state
notify_backup /path_to/to_backup.sh: script to run on the transition to the BACKUP state
notify_fault "/path/fault.sh VG_1": script to run when keepalived enters the FAULT state
notify /path/to/notify.sh: script to run on any state change
smtp_alert: send an email to the addresses defined in global_defs on a state change

    # the name after vrrp_instance is user-defined, e.g. lvs_httpd, httpd
    vrrp_instance http {
        state MASTER
        interface eth0
        dont_track_primary
        track_interface {
            eth0
            eth1
        }
        mcast_src_ip <IPADDR>
        garp_master_delay 10
        virtual_router_id 51
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1234
        }
        virtual_ipaddress {
            192.168.200.17/24 dev eth1
            192.168.200.18/24 dev eth2 label eth2:1
        }
        virtual_routes {
            src 192.168.100.1 to 192.168.109.0/24 via 192.168.200.254 dev eth1
            192.168.110.0/24 via 192.168.200.254 dev eth1
            192.168.111.0/24 dev eth2
            192.168.112.0/24 via 192.168.100.254
        }
        # nopreempt only applies when state is BACKUP; listed here to show the syntax
        nopreempt
        preempt_delay 300
        debug
    }
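state: the initial state of the instance, i.e. the state the server starts in once configured. It is not final: the role is still decided by an election on priority. If this node is set to MASTER but its priority is lower than another node's, then when it advertises its priority the other node sees that its own is higher and preempts the MASTER role
interface: the network interface the instance binds to; a VIP has to be added on an existing interface
dont_track_primary: ignore faults on the VRRP interface
track_interface: additional interfaces to monitor; if any of them fails, the instance enters the FAULT state. For example, when Nginx is the load balancer the internal network must be working; if it fails the balancer cannot operate, so both the internal and external interfaces need to be health-checked
mcast_src_ip: the source IP address for multicast packets, i.e. the address from which VRRP advertisements are sent. This matters a great deal: choose a stable interface, because it plays the same role as the heartbeat port in heartbeat. If unset, the IP of the bound interface (the one named by interface) is used
garp_master_delay: delay before sending gratuitous ARP after the transition to MASTER, 5s by default
virtual_router_id: the VRID. This is very important: nodes with the same VRID form one group, and the VRID determines the virtual router's MAC address (00-00-5E-00-01-{VRID} in the VRRP specification)
priority 100: the priority of this node; the node with the highest priority becomes master
advert_int: the interval between advertisements from MASTER to BACKUP, in seconds, 1s by default
virtual_ipaddress: the VIPs. They are added and removed as the state changes: added when the node becomes master and removed when it becomes backup. This is driven by priority and has little to do with the configured state value. Several addresses can be listed
virtual_routes: works like virtual_ipaddress, except that routes are added and removed
lvs_sync_daemon_interface: the interface the LVS syncd binds to, similar to the heartbeat interface in HA setups
authentication: the authentication settings
auth_type: the authentication method, either PASS or AH
auth_pass: the authentication password
nopreempt: do not preempt the MASTER role. It can only be set on a node whose state is BACKUP, and that node's priority must be higher than the others'. For example, when the master hands scheduling over to the backup server because of a fault and is later repaired, without nopreempt it takes scheduling back, which can easily interrupt traffic
preempt_delay: preemption delay in seconds, i.e. how long to wait before competing for master
debug: debug level
notify_master: same meaning as in the sync group; it can be set per instance, for example so that different instances notify different people: the http instance mails the site administrator and the mysql instance mails the DBA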
    vrrp_script check_running {
        script "/usr/local/bin/check_running"
        interval 10    # run the script every 10 seconds
        weight 10      # priority adjustment tied to the script result
    }

    vrrp_instance http {
        state BACKUP
        smtp_alert
        interface eth0
        virtual_router_id 101
        priority 90
        advert_int 3
        authentication {
            auth_type PASS
            auth_pass whatever
        }
        virtual_ipaddress {
            1.1.1.1
        }
        track_script {
            check_running
        }
    }
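Note: a VRRP script (vrrp_script) block sits at the same level as a VRRP instance (vrrp_instance) block. Keepalived runs the script periodically, evaluates its result, and adjusts the priority of the vrrp_instance dynamically. An exit code of 0 means the check succeeded; any non-zero value means it failed.
If the script exits 0 and weight is greater than 0, the priority is raised by weight.
If the script exits non-zero and weight is less than 0, the priority is lowered by the absolute value of weight.
In every other case with a non-zero weight, the priority stays at the priority value in the configuration file. With weight 0 the priority is never adjusted; a failing script puts the instance into the FAULT state instead.
Points to note:
1) The priority does not keep rising or falling: the adjustment for a script is applied once, not on every run.
2) You can write several check scripts and give each its own weight.
3) Whether raised or lowered, the final priority stays within [1, 254]; it never drops to 0 or below and never reaches 255 or above.
This makes it possible to watch the state of a business process with a script and adjust the priority dynamically to trigger a master/backup switchover.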
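The weight rules can be expressed as a small Python function. It is a sketch of the arithmetic only, with a hypothetical function name, and it deliberately leaves out the weight 0 case, where a failing script moves the instance to FAULT rather than changing the priority.

```python
def effective_priority(base, checks):
    """Compute the advertised priority from the configured one.

    base   -- the priority value from the vrrp_instance block
    checks -- iterable of (weight, succeeded) pairs, one per tracked script
    """
    total = base
    for weight, succeeded in checks:
        if weight > 0 and succeeded:
            total += weight          # positive weight rewards success
        elif weight < 0 and not succeeded:
            total += weight          # negative weight penalises failure
        # every other combination leaves the priority unchanged
    return max(1, min(254, total))   # clamp to the range [1, 254]


if __name__ == "__main__":
    # The example above: priority 90, one script with weight 10.
    print(effective_priority(90, [(10, True)]))   # script succeeds
    print(effective_priority(90, [(10, False)]))  # script fails
```

With the example configuration above, the node advertises 100 while check_running succeeds and falls back to 90 when it fails, so a peer configured with a priority between the two values takes over only during a failure.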
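virtual_server configuration
Keepalived supports three forms of virtual server definition:
virtual_server IP port
virtual_server fwmark int
virtual_server group string
Taking the first, most common form as an example:
virtual_server 192.168.1.2 80: defines a virtual server as VIP:Vport
delay_loop 3: the service polling delay, i.e. the interval between health-check rounds
lb_algo rr|wrr|lc|wlc|lblc|sh|dh: the LVS scheduling algorithm
lb_kind NAT|DR|TUN: the LVS cluster mode
persistence_timeout 120: session persistence time in seconds; a client is sent to the same real server for 120 seconds and is rescheduled after that
persistence_granularity: the granularity of LVS session persistence, the -M option of ipvsadm; the default is 0xffffffff, i.e. persistence is applied per client
protocol TCP: the layer-4 protocol of the virtual server, TCP or UDP
ha_suspend: suspend healthchecker's activity
virtualhost: the virtual host, i.e. the Host header, used when HTTP_GET performs its health check
sorry_server: a backup server; when every real server is unavailable, all requests are temporarily sent here
real_server: defines a back-end node and its settings such as weight; add one real_server block per back-end machine
weight 1: the weight of the node; 0 means disabled (no requests are forwarded to it until it recovers); the default is 1
inhibit_on_failure: when the node fails, set its weight to 0 instead of removing it from IPVS
notify_up: script to run when the health check finds the server up
notify_down: script to run when the health check finds the server down
Note on the Keepalived health-check mechanism: the available checkers are HTTP_GET, SSL_GET, TCP_CHECK, SMTP_CHECK, and MISC_CHECK, as shown below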
    HTTP_GET|SSL_GET {
        url {
            path /
            digest <STRING>
            status_code 200
        }
        connect_port 80
        bindto <IPADD>
        connect_timeout 3
        nb_get_retry 3
        delay_before_retry 2
    }

    TCP_CHECK {
        connect_port 80
        bindto <IPADD>
        connect_timeout 3
        nb_get_retry 3
        delay_before_retry 2
    }

    SMTP_CHECK {
        host {
            connect_ip <IP ADDRESS>
            connect_port <PORT>
            bindto <IP ADDRESS>
        }
        connect_timeout <INTEGER>
        retry <INTEGER>
        delay_before_retry <INTEGER>
        helo_name <STRING>|<QUOTED-STRING>
    }

    MISC_CHECK {
        misc_path <STRING>|<QUOTED-STRING>
        misc_timeout <INT>
        misc_dynamic
    }
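To show what the connect_timeout, retry, and delay_before_retry settings mean in practice, here is a minimal Python sketch of a TCP-style check. It is an illustration under stated assumptions, not Keepalived's checker: the function name and parameter names are hypothetical and merely echo the configuration keywords.

```python
import socket
import time


def tcp_check(host, port, connect_timeout=3.0, retries=3, delay_before_retry=2.0):
    """Return True if a TCP connection to host:port succeeds.

    The first attempt is followed by up to `retries` further attempts,
    waiting `delay_before_retry` seconds between them.
    """
    for attempt in range(retries + 1):
        try:
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True          # handshake completed: server is up
        except OSError:
            if attempt < retries:
                time.sleep(delay_before_retry)
    return False                     # every attempt failed: server is down
```

A director would remove the real server from the pool (or set its weight to 0 with inhibit_on_failure) only after the function returns False, which is why a few retries help avoid flapping on a single dropped packet.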
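That covers the Keepalived configuration options. There are many of them, but in most cases the majority can be left at their defaults. The default configuration file is shown below for comparison.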
    [root@sg-gop-10-65-32-140 wangao]
    ! Configuration File for keepalived

    global_defs {
       notification_email {
         acassen@firewall.loc
         failover@firewall.loc
         sysadmin@firewall.loc
       }
       notification_email_from Alexandre.Cassen@firewall.loc
       smtp_server 192.168.200.1
       smtp_connect_timeout 30
       router_id LVS_DEVEL
       vrrp_skip_check_adv_addr
       vrrp_strict
       vrrp_garp_interval 0
       vrrp_gna_interval 0
    }

    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            192.168.200.16
            192.168.200.17
            192.168.200.18
        }
    }

    virtual_server 192.168.200.100 443 {
        delay_loop 6
        lb_algo rr
        lb_kind NAT
        persistence_timeout 50
        protocol TCP

        real_server 192.168.201.100 443 {
            weight 1
            SSL_GET {
                url {
                  path /
                  digest ff20ad2481f97b1754ef3e12ecd3a9cc
                }
                url {
                  path /mrtg/
                  digest 9b3a0c85a887a256d6939da88aabd8cd
                }
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }
    }

    virtual_server 10.10.10.2 1358 {
        delay_loop 6
        lb_algo rr
        lb_kind NAT
        persistence_timeout 50
        protocol TCP

        sorry_server 192.168.200.200 1358

        real_server 192.168.200.2 1358 {
            weight 1
            HTTP_GET {
                url {
                  path /testurl/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                url {
                  path /testurl2/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                url {
                  path /testurl3/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }

        real_server 192.168.200.3 1358 {
            weight 1
            HTTP_GET {
                url {
                  path /testurl/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334c
                }
                url {
                  path /testurl2/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334c
                }
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }
    }

    virtual_server 10.10.10.3 1358 {
        delay_loop 3
        lb_algo rr
        lb_kind NAT
        persistence_timeout 50
        protocol TCP

        real_server 192.168.200.4 1358 {
            weight 1
            HTTP_GET {
                url {
                  path /testurl/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                url {
                  path /testurl2/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                url {
                  path /testurl3/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }

        real_server 192.168.200.5 1358 {
            weight 1
            HTTP_GET {
                url {
                  path /testurl/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                url {
                  path /testurl2/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                url {
                  path /testurl3/test.jsp
                  digest 640205b7b0fc66c1ea91c463fac6334d
                }
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }
    }
The simplest Keepalived HA configuration example

    # install keepalived
    yum install keepalived -y

    # iptables rules that let VRRP traffic through
    -A INPUT -p vrrp -j ACCEPT
    -A INPUT -p igmp -j ACCEPT
    -A INPUT -d 224.0.0.18 -j ACCEPT

    # edit the configuration file
    vi /etc/keepalived/keepalived.conf

    vrrp_sync_group VI_GOP_NC1_HA {
        group {
            VI_GOP_NC1_HA_PRI
        }
    }

    vrrp_instance VI_GOP_NC1_HA_PRI {
        state BACKUP
        interface bond0
        virtual_router_id 139
        priority 100
        advert_int 1
        nopreempt
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.65.33.139/23 dev bond0
        }
    }

    # start the service
    service keepalived start
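Keepalived active-active practice
The simplest active-active setup only requires changing state and priority
Advantage: the configuration file is simple
Disadvantages:
When the master recovers it automatically fails back, which affects business traffic
The two nodes' configurations are not identical, which is unfriendly to automated operations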
    # node1
    vrrp_instance VI_1 {
        state MASTER
        interface bond0
        virtual_router_id 32
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.71.17.32/23 dev bond0
        }
    }

    vrrp_instance VI_2 {
        state BACKUP
        interface bond0
        virtual_router_id 33
        priority 98
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.71.17.33/23 dev bond0
        }
    }

    # node2
    vrrp_instance VI_1 {
        state BACKUP
        interface bond0
        virtual_router_id 32
        priority 98
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.71.17.32/23 dev bond0
        }
    }

    vrrp_instance VI_2 {
        state MASTER
        interface bond0
        virtual_router_id 33
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.71.17.33/23 dev bond0
        }
    }
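A more sensible Keepalived active-active setup
Advantages:
Adding nopreempt prevents automatic failback
Adding track_script allows switchovers to be controlled manually
The nodes' configurations are identical, which makes automated operations easier
Disadvantage: the configuration file is more complex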
    # node1
    vrrp_script maint-10.71.17.32 {
        script "/bin/bash -c '[[ -e /etc/keepalived/10.71.17.32 ]]' && exit 1 || exit 0"
        interval 2
        fall 2
        rise 2
    }

    vrrp_script maint-10.71.17.33 {
        script "/bin/bash -c '[[ -e /etc/keepalived/10.71.17.33 ]]' && exit 1 || exit 0"
        interval 2
        fall 2
        rise 2
    }

    vrrp_instance VI_1 {
        state BACKUP
        interface bond0
        virtual_router_id 32
        priority 100
        advert_int 1
        nopreempt
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.71.17.32/23 dev bond0
        }
        track_script {
            maint-10.71.17.32
        }
    }

    vrrp_instance VI_2 {
        state BACKUP
        interface bond0
        virtual_router_id 33
        priority 100
        advert_int 1
        nopreempt
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            10.71.17.33/23 dev bond0
        }
        track_script {
            maint-10.71.17.33
        }
    }

    # node2
    # identical, line for line, to the node1 configuration above
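To combine this with custom monitoring scripts, see the article on Redis master-slave replication configuration practice
A simple Keepalived email alert example
Write the sendmail.py email script
Configure notify_backup in keepalived.conf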
    import sys
    import socket
    import smtplib

    # SMTP settings; replace the xxx placeholders with real values
    EMAIL_CONFIG = {
        'EMAIL_HOST': 'xxx',
        'EMAIL_HOST_USER': 'xxx',
        'EMAIL_RECEIVER': 'xxx'
    }


    def _get_private_ip():
        # Connecting a UDP socket sends no packet; it only makes the kernel
        # pick the local address that would be used to reach 10.255.255.255.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.connect(('10.255.255.255', 1))
            return sock.getsockname()[0]
        except socket.error:
            return '127.0.0.1'
        finally:
            sock.close()


    def send_email():
        # sys.argv[1] is the new state passed in by keepalived.conf
        ip = _get_private_ip()
        hostname = socket.gethostname()
        message = 'Subject: Keepalived Failover Alert %s \n\nHOSTNAME %s on LANIP %s HA status has changed to %s' % (
            sys.argv[1], hostname, ip, sys.argv[1])
        server = smtplib.SMTP(EMAIL_CONFIG["EMAIL_HOST"])
        server.sendmail(EMAIL_CONFIG['EMAIL_HOST_USER'], EMAIL_CONFIG['EMAIL_RECEIVER'], message)
        server.quit()


    send_email()
If a vrrp_sync_group is defined, add the notify settings there so they are controlled per group; otherwise set them individually in each vrrp_instance
    # notify scripts and alerts are optional
    #
    # filenames of scripts to run on transitions can be unquoted (if
    # just filename) or quoted (if it has parameters)
    # The username and groupname specify the user and group
    # under which the scripts should be run. If username is
    # specified, the group defaults to the group of the user.
    # If username is not specified, they default to the
    # global script_user and script_group
    # to MASTER transition
    notify_master /path/to_master.sh [username [groupname]]
    # to BACKUP transition
    notify_backup /path/to_backup.sh [username [groupname]]
    # FAULT transition
    notify_fault "/path/fault.sh VG_1" [username [groupname]]

    vrrp_sync_group NC-CLOUD-LOADTEST {
        group {
            NC-CLOUD-LOADTEST-PUB
            NC-CLOUD-LOADTEST-PRI
        }
        notify_master "/bin/python /etc/keepalived/sendmail.py master"
        notify_backup "/bin/python /etc/keepalived/sendmail.py backup"
    }
Keepalived Notification and Tracking Scripts
The official Keepalived documentation does not give a worked example. After improving the code above, the result is:
Active-active operation, with manual control over which node each instance runs on without interrupting LVS
No alert email when the status has not actually changed
Keepalived is a Linux implementation of VRRP (Virtual Router Redundancy Protocol) that makes IPs highly available. Keepalived check and notify scripts can be used to check anything you want, to ensure the Master is on the right node, and to take action when the state changes.
notify scripts and alerts are optional
About Keepalived Notification and Tracking Scripts
A check script has two kinds of return value:
0 for everything is fine
1 or other than 0 means something went wrong.
For example:
    vrrp_script maint-xxx {
        script "/bin/bash -c '[[ -e /etc/keepalived/xxx ]]' && exit 1 || exit 0"
        interval 2
        fall 2
        rise 2
    }
This script defines a file check that tests whether the file xxx exists. The check runs every 2 seconds; it takes two consecutive failures to mark the check KO and two consecutive successes to mark it OK.
The check script is referenced from a vrrp_instance through track_script, as in the keepalived.conf below. If the script returns a non-zero code twice in a row, the VRRP instance changes its state to FAULT; once it returns 0 twice in a row, the instance returns to normal operation.
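The fall and rise counters behave like a small state machine, sketched below in Python. The class name is hypothetical and the script is assumed to start in the healthy state; the sketch only models how consecutive results flip the status, not how Keepalived schedules the script.

```python
class TrackedScript:
    """Model of the rise/fall hysteresis applied to a check script."""

    def __init__(self, rise=2, fall=2):
        self.rise = rise        # consecutive successes needed to become OK
        self.fall = fall        # consecutive failures needed to become KO
        self.healthy = True     # assumption: start out healthy
        self._streak = 0        # consecutive results contradicting the state

    def record(self, exit_code):
        """Feed one script exit code and return the resulting health status."""
        ok = exit_code == 0
        if ok == self.healthy:
            self._streak = 0    # result agrees with current state: reset
        else:
            self._streak += 1
            needed = self.rise if ok else self.fall
            if self._streak >= needed:
                self.healthy = ok
                self._streak = 0
        return self.healthy


if __name__ == "__main__":
    script = TrackedScript(rise=2, fall=2)
    for code in [1, 0, 1, 1, 0, 0]:
        print(code, script.record(code))
```

A single failed run surrounded by successes never changes the status, which is the point of requiring two results in a row.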
keepalived.conf
    # xxx marks redacted values; in a real file every group, script,
    # and instance needs its own distinct name

    vrrp_sync_group VI_GROUP_xxx {
        group {
            VI_PRI_xxx
            VI_PUB_xxx
        }
        notify /etc/keepalived/notify.sh
    }

    vrrp_sync_group VI_GROUP_xxx {
        group {
            VI_PRI_xxx
            VI_PUB_xxx
        }
        notify /etc/keepalived/notify.sh
    }

    vrrp_script maint-xxx {
        script "/bin/bash -c '[[ -e /etc/keepalived/xxx ]]' && exit 1 || exit 0"
        interval 10
        fall 6
        rise 2
    }

    vrrp_script maint-xxx {
        script "/bin/bash -c '[[ -e /etc/keepalived/xxx ]]' && exit 1 || exit 0"
        interval 10
        fall 6
        rise 2
    }

    vrrp_script maint-xxx {
        script "/bin/bash -c '[[ -e /etc/keepalived/xxx ]]' && exit 1 || exit 0"
        interval 10
        fall 6
        rise 2
    }

    vrrp_script maint-xxx {
        script "/bin/bash -c '[[ -e /etc/keepalived/xxx ]]' && exit 1 || exit 0"
        interval 10
        fall 6
        rise 2
    }

    vrrp_instance VI_PRI_xxx {
        state BACKUP
        interface bond0
        virtual_router_id 138
        priority 100
        advert_int 1
        nopreempt
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            xxx/23 dev bond0
        }
        track_script {
            maint-xxx
        }
    }

    vrrp_instance VI_PRI_xxx {
        state BACKUP
        interface bond0
        virtual_router_id 139
        priority 100
        advert_int 1
        nopreempt
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            xxx/23 dev bond0
        }
        track_script {
            maint-xxx
        }
    }

    vrrp_instance VI_PUB_xxx {
        state BACKUP
        interface bond1
        virtual_router_id 101
        priority 100
        advert_int 1
        nopreempt
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            xxx/26 dev bond1
        }
        track_script {
            maint-xxx
        }
    }

    vrrp_instance VI_PUB_xxx {
        state BACKUP
        interface bond1
        virtual_router_id 102
        priority 100
        advert_int 1
        nopreempt
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        virtual_ipaddress {
            xxx/26 dev bond1
        }
        track_script {
            maint-xxx
        }
    }
notify.sh
Keepalived takes some action depending on the VRRP state.

    vrrp_sync_group VI_GROUP_xxx {
        group {
            VI_PRI_xxx
            VI_PUB_xxx
        }
        notify /etc/keepalived/notify.sh
    }
The script is called after any state change with the following parameters:
$1 = "GROUP" or "INSTANCE"
$2 = name of group or instance
$3 = target state of transition ("MASTER", "BACKUP", "FAULT")
    #!/bin/bash
    # Arguments passed by keepalived
    TYPE=$1     # "GROUP" or "INSTANCE"
    NAME=$2     # name of the group or instance
    STATE=$3    # target state of the transition
    # File that remembers the last state we alerted on
    FILE="/etc/keepalived/${NAME}"

    if [ ! -f "${FILE}" ]; then
        touch "${FILE}"
    fi
    ORI_STATE=$(cat "${FILE}")

    # Quoted so the comparison also works while the file is still empty
    if [ "${STATE}" == "${ORI_STATE}" ]; then
        # State unchanged: send no email
        exit 0
    else
        case $STATE in
            "MASTER")
                /bin/python /etc/keepalived/sendmail.py ${STATE} ${TYPE} ${NAME}
                echo "${STATE}" > "${FILE}"
                exit 0
                ;;
            "BACKUP")
                /bin/python /etc/keepalived/sendmail.py ${STATE} ${TYPE} ${NAME}
                echo "${STATE}" > "${FILE}"
                exit 0
                ;;
            "FAULT")
                /bin/python /etc/keepalived/sendmail.py ${STATE} ${TYPE} ${NAME}
                echo "${STATE}" > "${FILE}"
                exit 0
                ;;
            *)
                echo "unknown state"
                exit 1
                ;;
        esac
    fi
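The core idea of notify.sh, remembering the last reported state in a file and alerting only when it changes, can also be written in Python. This is a sketch under assumptions: `should_alert` is a hypothetical helper, and the state directory is passed in as a parameter so the demo can use a temporary directory instead of /etc/keepalived.

```python
import tempfile
from pathlib import Path


def should_alert(state_dir, name, new_state):
    """Return True and record new_state if it differs from the stored state."""
    state_file = Path(state_dir) / name
    previous = state_file.read_text().strip() if state_file.exists() else ""
    if previous == new_state:
        return False                      # unchanged: suppress the alert
    state_file.write_text(new_state + "\n")
    return True                           # changed: caller sends the email


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for state in ["MASTER", "MASTER", "BACKUP"]:
            print(state, should_alert(tmp, "VI_GROUP_demo", state))
```

Keeping the decision separate from the mail-sending step makes the de-duplication easy to test without an SMTP server.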
sendmail.py
    import sys
    import socket
    import smtplib

    # SMTP settings; replace the xxx placeholders with real values
    EMAIL_CONFIG = {
        'EMAIL_HOST': 'xxx',
        'EMAIL_HOST_USER': 'xxx',
        'EMAIL_RECEIVER': 'xxx'
    }


    def _get_private_ip():
        # Connecting a UDP socket sends no packet; it only makes the kernel
        # pick the local address that would be used to reach 10.255.255.255.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.connect(('10.255.255.255', 1))
            return sock.getsockname()[0]
        except socket.error:
            return '127.0.0.1'
        finally:
            sock.close()


    def send_email():
        # Called by notify.sh as: sendmail.py STATE TYPE NAME
        ip = _get_private_ip()
        hostname = socket.gethostname()
        message = 'Subject: Keepalived Failover Alert %s \n\nHOSTNAME %s on LANIP %s %s %s status has changed to %s' % (
            sys.argv[1], hostname, ip, sys.argv[2], sys.argv[3], sys.argv[1])
        server = smtplib.SMTP(EMAIL_CONFIG["EMAIL_HOST"])
        server.sendmail(EMAIL_CONFIG['EMAIL_HOST_USER'], EMAIL_CONFIG['EMAIL_RECEIVER'], message)
        server.quit()


    send_email()
Common Keepalived problems
virtual_router_id conflicts
    # the keepalived error log shows the following
    tailf /var/log/messages
    May 7 23:25:18 xxx Keepalived_vrrp[90851]: bogus VRRP packet received on bond1 !!!
    May 7 23:25:18 xxx Keepalived_vrrp[90851]: VRRP_Instance(VI_PUB_xxx) ignoring received advertisment...
    May 7 23:25:19 xxx Keepalived_vrrp[90851]: (VI_PUB_xxx): ip address associated with VRID 101 not present in MASTER advert : xxx

    # checking configuration files one by one is too inefficient
    grep 'virtual_router_id' /etc/keepalived/keepalived.conf
    virtual_router_id 148
    virtual_router_id 149
    virtual_router_id 101
    virtual_router_id 104

    # when VRRP uses multicast, capture and analyse the packets with tcpdump
    tcpdump -i bond1 -nn 'vrrp'
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on bond1, link-type EN10MB (Ethernet), capture size 262144 bytes
    00:16:44.919824 IP xxx > 224.0.0.18: VRRPv2, Advertisement, vrid 105, prio 100, authtype simple, intvl 1s, length 20
    00:16:44.995030 IP xxx > 224.0.0.18: VRRPv2, Advertisement, vrid 101, prio 100, authtype simple, intvl 1s, length 20
    00:16:44.995046 IP xxx > 224.0.0.18: VRRPv2, Advertisement, vrid 104, prio 100, authtype simple, intvl 1s, length 20
    00:16:44.996107 IP xxx > 224.0.0.18: VRRPv2, Advertisement, vrid 123, prio 100, authtype simple, intvl 1s, length 20
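Reading a long capture by eye is error-prone, so the tcpdump output can be summarised with a short Python script. The sketch below assumes the line format shown in the capture above; the function names are hypothetical. It groups advertisement senders by VRID, flags any VRID announced by more than one address, and lists VRIDs not seen on the segment.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 00:16:44.995030 IP 10.0.0.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 101, prio 100, ...
ADVERT_RE = re.compile(
    r"IP (\S+) > 224\.0\.0\.18: VRRPv\d, Advertisement, vrid (\d+), prio (\d+)"
)


def senders_by_vrid(lines):
    """Map each VRID seen in the capture to the set of sender addresses."""
    seen = defaultdict(set)
    for line in lines:
        match = ADVERT_RE.search(line)
        if match:
            sender, vrid, _prio = match.groups()
            seen[int(vrid)].add(sender)
    return dict(seen)


def conflicting_vrids(lines):
    """Return the VRIDs advertised by more than one sender."""
    return {v: s for v, s in senders_by_vrid(lines).items() if len(s) > 1}


def unused_vrids(lines):
    """Return the VRIDs in 1..255 that never appear in the capture."""
    used = set(senders_by_vrid(lines))
    return [v for v in range(1, 256) if v not in used]
```

Several senders for one VRID can also appear briefly during a normal failover, so a hit from `conflicting_vrids` is a prompt to investigate rather than proof of a conflict. Picking a new virtual_router_id from `unused_vrids` avoids clashing with clusters already on the segment.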
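Keepalived unicast mode
In multicast mode, keepalived sends all of its advertisements to the multicast address 224.0.0.18. This generates a lot of useless traffic and can cause interference and conflicts, so it is worth switching from multicast to unicast. Unicast is a safe way to avoid virtual router ID conflicts when a LAN hosts a large number of keepalived instances.
Unicast mode requires turning off vrrp_strict, the option that enforces strict compliance with the VRRP protocol
Unicast mode requires adding the unicast source and destination addresses to the VIP instance block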
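    global_defs {
        # disable strict VRRP compliance; otherwise keepalived refuses to
        # start because the advertisements are not multicast
        #vrrp_strict
    }

    # inside the vrrp_instance block; swap the two addresses on the peer node
    unicast_src_ip 172.20.27.10    # unicast source address
    unicast_peer {
        172.20.27.11               # unicast destination address
    }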
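Since the source and peer addresses must be swapped on each node, generating the fragment from one shared node list helps avoid copy-and-paste mistakes. The Python sketch below uses a hypothetical helper name and assumes each node is identified by a single IPv4 address.

```python
def unicast_block(local_ip, all_nodes):
    """Render the unicast settings for one node of the cluster.

    local_ip  -- address of the node the fragment is generated for
    all_nodes -- addresses of every node, including local_ip
    """
    peers = [ip for ip in all_nodes if ip != local_ip]
    lines = [f"unicast_src_ip {local_ip}", "unicast_peer {"]
    lines += [f"    {ip}" for ip in peers]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = ["172.20.27.10", "172.20.27.11"]
    for node in nodes:
        print(f"# fragment for {node}")
        print(unicast_block(node, nodes))
```

The output is only the two unicast directives; it still has to be placed inside the relevant vrrp_instance block.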
    # See description of global vrrp_strict
    # If strict_mode is not specified, it takes the value of vrrp_strict
    # If strict_mode without a parameter is specified, it defaults to on
    strict_mode [on|off|true|false|yes|no]

    # default IP for binding vrrpd is the primary IP
    # on interface. If you want to hide the location of vrrpd,
    # use this IP as src_addr for multicast or unicast vrrp
    # packets. (since it's multicast, vrrpd will get the reply
    # packet no matter what src_addr is used).
    # optional
    mcast_src_ip <IPADDR>
    unicast_src_ip <IPADDR>

    # Do not send VRRP adverts over a VRRP multicast group.
    # Instead it sends adverts to the following list of
    # ip addresses using unicast. It can be cool to use
    # the VRRP FSM and features in a networking
    # environment where multicast is not supported!
    # IP addresses specified can be IPv4 as well as IPv6.
    unicast_peer {
        <IPADDR>
        ...
    }
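LVS and Keepalived series
LVS and Keepalived: Principles and Configuration Practice
LVS: Principles and Configuration Practice
Keepalived: Principles and Configuration Practice
LVS-NAT: Principles and Configuration Practice
LVS-DR: Principles and Configuration Practice
LVS-TUN: Principles and Configuration Practice
Reference documents
Keepalived Configuration Manual Page
Keepalived User Guide
Keepalived in Practice
A Powerful Tool for Highly Available Clusters: Keepalived Explained
LVS Unleashed: High-Availability Clusters with Keepalived and LVS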