Preface

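Through my work I came to use Ansible and actively pushed for its adoption in enterprise production environments, and I single-handedly designed and delivered a project called "Ansible-based Automated Host Configuration Management". Before that I had also worked with Puppet and SaltStack. This article does not debate the merits of open-source versus in-house solutions; the focus is on sharing the hands-on project experience I have accumulated with Ansible. If you run into any problems, feel free to leave a comment or reach out some other way, and I will do my best to give a useful reply.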

Ansible is Simple IT Automation

Changelog

2020-06-21 - Added Mitogen for Ansible
2020-06-01 - Added open-source automation projects built on Ansible
2020-01-22 - Added Ansible reference articles
2018-05-15 - First draft

Original article - https://wsgzao.github.io/post/ansible/

Further reading

ansible - https://docs.ansible.com/


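A Standard Learning Path for Ansible

Books on Ansible are becoming more common, but because Ansible releases frequently while its learning curve is gentle, I suggest treating the official documentation as the primary resource and books as a supplement.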

Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.

Ansible’s main goals are simplicity and ease-of-use. It also has a strong focus on security and reliability, featuring a minimum of moving parts, usage of OpenSSH for transport (with other transports and pull modes as alternatives), and a language that is designed around auditability by humans–even those not familiar with the program.

We believe simplicity is relevant to all sizes of environments, so we design for busy users of all types: developers, sysadmins, release engineers, IT managers, and everyone in between. Ansible is appropriate for managing all environments, from small setups with a handful of instances to enterprise environments with many thousands of instances.

Ansible manages machines in an agent-less manner. There is never a question of how to upgrade remote daemons or the problem of not being able to manage systems because daemons are uninstalled. Because OpenSSH is one of the most peer-reviewed open source components, security exposure is greatly reduced. Ansible is decentralized–it relies on your existing OS credentials to control access to remote machines. If needed, Ansible can easily connect with Kerberos, LDAP, and other centralized authentication management systems.

This documentation covers the current released version of Ansible and also some development version features. For recent features, we note in each section the version of Ansible where the feature was added.

Ansible releases a new major release of Ansible approximately every two months. The core application evolves somewhat conservatively, valuing simplicity in language design and setup. However, the community around new modules and plugins being developed and contributed moves very quickly, adding many new modules in each release.

Ansible Lightbulb has been succeeded by Red Hat Ansible Automation Platform Workshops.

The Ansible Lightbulb project is an effort to provide a content toolkit and educational reference for effectively communicating and teaching Ansible topics.

Ansible Lightbulb - https://github.com/ansible/lightbulb

Red Hat Ansible Automation Platform Workshops - https://ansible.github.io/workshops/

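Ansible Documentation is the official documentation. My advice: do not be intimidated by the English; look things up and type the commands yourself until they make sense.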

Ansible Documentation - http://docs.ansible.com/ansible/latest/index.html

If you plan to use roles, I recommend reading Ansible Best Practices.

Ansible Best Practices

inventories/
    production/
        hosts                 # inventory file for production servers
        group_vars/
            group1.yml        # here we assign variables to particular groups
            group2.yml
        host_vars/
            hostname1.yml     # here we assign variables to particular systems
            hostname2.yml

    staging/
        hosts                 # inventory file for staging environment
        group_vars/
            group1.yml        # here we assign variables to particular groups
            group2.yml
        host_vars/
            stagehost1.yml    # here we assign variables to particular systems
            stagehost2.yml

library/                      # if any custom modules, put them here (optional)
module_utils/                 # if any custom module_utils to support modules, put them here (optional)
filter_plugins/               # if any custom filter plugins, put them here (optional)

site.yml                      # master playbook
webservers.yml                # playbook for webserver tier
dbservers.yml                 # playbook for dbserver tier

files/                        # here we assign files for simple plays
plays/                        # here we assign plays as the entrance
tasks/                        # here we assign tasks for plays to call

roles/
    common/                   # this hierarchy represents a "role"
        tasks/
            main.yml          # <-- tasks file can include smaller files if warranted
        handlers/
            main.yml          # <-- handlers file
        templates/            # <-- files for use with the template resource
            ntp.conf.j2       # <------- templates end in .j2
        files/
            bar.txt           # <-- files for use with the copy resource
            foo.sh            # <-- script files for use with the script resource
        vars/
            main.yml          # <-- variables associated with this role
        defaults/
            main.yml          # <-- default lower priority variables for this role
        meta/
            main.yml          # <-- role dependencies
        library/              # roles can also include custom modules
        module_utils/         # roles can also include custom module_utils
        lookup_plugins/       # or other types of plugins, like lookup in this case

    webtier/                  # same kind of structure as "common" was above, done for the webtier role
    monitoring/               # ""
    fooapp/                   # ""
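
A layout like this is quick to scaffold by hand. The sketch below creates a trimmed-down version of the tree (two inventories, the skeleton of the `common` role, and the top-level playbooks). The `project_root` variable and the choice of a temporary directory as the default target are assumptions made for illustration, not part of the Best Practices guide.

```shell
#!/usr/bin/env bash
# Scaffold a trimmed-down version of the recommended layout.
# ASSUMPTION: project_root defaults to a fresh temp dir; pass a path as $1 to override.
set -e

project_root="${1:-$(mktemp -d)}"

# One inventory tree per environment.
for env in production staging; do
    mkdir -p "$project_root/inventories/$env/group_vars" \
             "$project_root/inventories/$env/host_vars"
    touch "$project_root/inventories/$env/hosts"
done

# Skeleton of the "common" role.
for dir in tasks handlers templates files vars defaults meta; do
    mkdir -p "$project_root/roles/common/$dir"
done
# Only these role directories are loaded through a main.yml entry point.
for dir in tasks handlers vars defaults meta; do
    touch "$project_root/roles/common/$dir/main.yml"
done

# Top-level playbooks.
touch "$project_root/site.yml" "$project_root/webservers.yml" "$project_root/dbservers.yml"

echo "$project_root"
```

For the role part alone, `ansible-galaxy init` generates a similar skeleton, so a hand-written script like this is mostly useful for the inventory and playbook scaffolding around it.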

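Plugins That Speed Up Ansible

As is well known, Ansible is an automation and configuration management tool built on SSH (other connection plugins such as winrm also exist). It is simple to use, and its agentless design is an advantage in many scenarios. That same design, however, makes it slower than client/server tools, a point that users of those tools like to mock. In response, the Ansible project has introduced a number of options for improving execution speed.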

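Parameter / category | Description
fact cache | Cache facts after they are first gathered, in memory, in Redis, or in files.
gather_subset | Selectively gather facts such as network or hardware instead of everything.
control_path | Enable persistent SSH sockets so that SSH connections are reused.
pipelining | Enable SSH pipelining: the managed host reads the rendered script from a pipe instead of from a temporary file created on that host.
forks | Raise the number of hosts processed in parallel.
serial | Run the hosts in play_hosts ① in successive batches.
strategy | Defaults to linear: each task must finish on every host before the next task starts. Setting free lets each host move on without waiting for the others (the output looks messier). There is also a host_pinned option, whose purpose I am not sure about.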
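
To make the table concrete, here is a minimal sketch that writes a sample `ansible.cfg` enabling most of these options. Every value (fork count, cache path, timeouts) is a placeholder chosen for illustration, and `cfg_file` is a hypothetical variable name. `serial` is missing on purpose: it is a play-level keyword set in the playbook, not an `ansible.cfg` option.

```shell
# Write a sample ansible.cfg that turns on the options from the table.
# ASSUMPTION: every value below (paths, timeouts, fork count) is a placeholder.
cfg_file="$(mktemp)"

cat > "$cfg_file" <<'EOF'
[defaults]
; fact cache: keep gathered facts on disk for one hour
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_fact_cache
fact_caching_timeout = 3600
; gather_subset: gather only the minimal and network facts
gather_subset = network
; forks: number of hosts handled in parallel
forks = 50
; strategy: linear (default), free, or host_pinned
strategy = linear

[ssh_connection]
; control_path: reuse SSH connections through a persistent socket
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
control_path = %(directory)s/%%h-%%r
; pipelining: send the module over the existing SSH session
pipelining = True
EOF

echo "wrote $cfg_file"
```

Pipelining conflicts with `requiretty` in the sudoers file on managed hosts, so that setting has to be disabled there before turning it on.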

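I stumbled across Mitogen for Ansible, a strategy plugin that has now reached version 0.2.9. Its documentation claims a 1.25x to 7x speedup, which is striking.

It replaces Ansible's default embedded and pure-Python shell invocation with efficient remote procedure calls. It does not make the modules themselves run faster; it only gets each module executed and its result returned as quickly as possible ② (before a module runs there is a series of steps such as connecting, sending data, and transferring the rendered script), which improves overall efficiency. Its documentation lists the following characteristics: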

Expect a 1.25x - 7x speedup and a CPU usage reduction of at least 2x, depending on network conditions, modules executed, and time already spent by targets on useful work. Mitogen cannot improve a module once it is executing, it can only ensure the module executes as quickly as possible.

  • One connection is used per target, in addition to one sudo invocation per user account. This is much better than SSH multiplexing combined with pipelining, as significant state can be maintained in RAM between steps, and system logs aren’t spammed with repeat authentication events.

  • A single network roundtrip is used to execute a step whose code already exists in RAM on the target. Eliminating multiplexed SSH channel creation saves 4 ms runtime per 1 ms of network latency for every playbook step.

  • Processes are aggressively reused, avoiding the cost of invoking Python and recompiling imports, saving 300-800 ms for every playbook step.

  • Code is ephemerally cached in RAM, reducing bandwidth usage by an order of magnitude compared to SSH pipelining, with around 5x fewer frames traversing the network in a typical run.

  • Fewer writes to the target filesystem occur. In typical configurations, Ansible repeatedly rewrites and extracts ZIP files to multiple temporary directories on the target. Security issues relating to temporary files in cross-account scenarios are entirely avoided.

The effect is most potent on playbooks that execute many short-lived actions, where Ansible’s overhead dominates the cost of the operation, for example when executing large with_items loops to run simple commands or write files.

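Roughly: each host uses a single connection for the whole run (by default a connection is reopened for every task or loop iteration); the rendered code is held in memory; the time spent creating multiplexed SSH channels is saved; less bandwidth goes to transferring temporary files; and processes and code are reused, which avoids recompilation costs.

For how it works internally, see the explanation on the official site; I admit I did not fully follow it.

① play_hosts is a built-in variable holding the list of hosts in the play that is currently running.

② "As quickly as possible" refers to the phase before the module starts running.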

  1. Download and extract mitogen-0.2.9.tar.gz
  2. Modify ansible.cfg
[defaults]
strategy_plugins = /path/to/mitogen-0.2.9/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

The strategy key is optional. If omitted, the ANSIBLE_STRATEGY=mitogen_linear environment variable can be set on a per-run basis. Like mitogen_linear, the mitogen_free and mitogen_host_pinned strategies exist to mimic the free and host_pinned strategies.

https://networkgenomics.com/ansible/

https://mitogen.networkgenomics.com/ansible_detailed.html
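
The per-run variant mentioned above can be sketched as a small wrapper that falls back to the stock `linear` strategy when no Mitogen checkout is present. `choose_strategy` and `MITOGEN_DIR` are hypothetical names introduced for this example, and the `/path/to/mitogen-0.2.9` default is the same placeholder used in the configuration above.

```shell
# Pick a strategy name depending on whether a Mitogen checkout is present.
# ASSUMPTION: choose_strategy and MITOGEN_DIR are hypothetical, for illustration only.
choose_strategy() {
    local mitogen_dir="$1"
    if [ -d "$mitogen_dir/ansible_mitogen/plugins/strategy" ]; then
        echo "mitogen_linear"
    else
        echo "linear"
    fi
}

MITOGEN_DIR="${MITOGEN_DIR:-/path/to/mitogen-0.2.9}"

# Environment equivalents of the strategy_plugins and strategy keys in ansible.cfg.
export ANSIBLE_STRATEGY_PLUGINS="$MITOGEN_DIR/ansible_mitogen/plugins/strategy"
export ANSIBLE_STRATEGY="$(choose_strategy "$MITOGEN_DIR")"

echo "strategy for this run: $ANSIBLE_STRATEGY"
```

After sourcing this, a normal `ansible-playbook` invocation in the same shell picks up the chosen strategy without any change to `ansible.cfg`.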

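Open-Source Projects Built on Ansible

The first is the official Ansible project; the others are open-source operations platforms built around Ansible. All are worth studying and using as references.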

Ansible - https://github.com/ansible/ansible

Jumpserver - http://www.jumpserver.org/

OpsManage - https://github.com/welliamcao/OpsManage

spug - https://github.com/openspug/spug

BigOps - http://www.bigops.com/

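Ansible in Practice

The following comes from the "Ansible-based Automated Host Configuration Management" project. With Ansible we can currently meet every baseline requirement of our production environment, so I believe it has some reference value.

Deploying Ansible

Because the production environment's internal and external networks are physically isolated, every installation is done offline.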

# Install Packages
yum install gcc zlib zlib-devel openssl-devel -y

# Install Python
tar xf Python-2.7.14.tgz
cd Python-2.7.14
./configure
make
make install
cd ..

# renew python env
exit

# ImportError: No module named six.moves
tar xf six-1.11.0.tar.gz
cd six-1.11.0
python setup.py install
cd ..

# ImportError: No module named packaging.version
tar xf packaging-17.1.tar.gz
cd packaging-17.1
python setup.py install
cd ..

# ImportError: No module named pyparsing
tar xf pyparsing-2.2.0.tar.gz
cd pyparsing-2.2.0
python setup.py install
cd ..

# ImportError: No module named appdirs
tar xf appdirs-1.4.3.tar.gz
cd appdirs-1.4.3
python setup.py install
cd ..

# Install Setuptools
unzip setuptools-38.5.2.zip
cd setuptools-38.5.2
python setup.py install
cd ..

# Install pip
tar xf pip-9.0.1.tar.gz
cd pip-9.0.1
python setup.py install
cd ..

# Download packages with pip for offline use
# pip download -d DIR -r requirements.txt
pip download -d ~/ansible/ ansible

# Install offline with pip
# pip install --no-index --find-links=DIR -r requirements.txt
pip install --no-index --find-links=pip-ansible-2.3.3/ -r requirements.txt
pip install --no-index --find-links=pip-ansible-2.5.0/ -r requirements.txt -U

# Install pipenv offline with pip
pip install --no-index --find-links=pip-pipenv/ pipenv

# Create a virtual environment with pipenv
mkdir win_ansible
cd win_ansible
pipenv shell
pip install --no-index --find-links=pip-ansible-2.5.2/ -r requirements.txt
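
The repeated unpack-and-install steps for the source archives can be folded into a loop. The sketch below keeps the same archive names and order as above (setuptools before pip). `run`, `install_sdist`, and `DRY_RUN` are hypothetical helpers introduced here; with `DRY_RUN=1`, the default in this sketch, the commands are only printed.

```shell
# Install the source archives listed above, in the same order.
# ASSUMPTION: run, install_sdist and DRY_RUN are hypothetical helpers for this sketch.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unpack one archive and run its setup.py from inside the unpacked directory.
install_sdist() {
    local archive="$1" dir
    case "$archive" in
        *.tar.gz) dir="${archive%.tar.gz}"; run tar xf "$archive" ;;
        *.zip)    dir="${archive%.zip}";    run unzip -q "$archive" ;;
        *) echo "unsupported archive: $archive" >&2; return 1 ;;
    esac
    run sh -c "cd '$dir' && python setup.py install"
}

for archive in six-1.11.0.tar.gz packaging-17.1.tar.gz pyparsing-2.2.0.tar.gz \
               appdirs-1.4.3.tar.gz setuptools-38.5.2.zip pip-9.0.1.tar.gz; do
    install_sdist "$archive"
done
```

Running it once in dry-run mode is a cheap way to check the order before setting `DRY_RUN=0` on the isolated host.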

ansible.cfg Explained

ansible.cfg does not change what a run produces, but sensible settings noticeably improve efficiency.

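# Config file locations (in order of precedence)
./ansible.cfg
/etc/ansible/ansible.cfg

# Config file contents
# Comments sit on their own lines: in ansible.cfg an inline comment must start with ';', not '#'
[defaults]
#inventory = /etc/ansible/hosts
#log_path = /var/log/ansible.log
# Number of parallel processes
forks = 100
# Do not check SSH host keys
host_key_checking = False
# Do not display skipped hosts
display_skipped_hosts = False
# Do not create .retry files after a failed run
retry_files_enabled = False
# Cache setup (facts) for 1 day to speed up runs
gathering = smart
fact_caching_timeout = 86400
fact_caching = jsonfile
fact_caching_connection = cachedir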
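
The two paths listed above are part of a longer search order. Ansible checks the `ANSIBLE_CONFIG` environment variable, then `./ansible.cfg`, then `~/.ansible.cfg`, then `/etc/ansible/ansible.cfg`, and uses the first one it finds. The sketch below mimics that lookup in shell; `find_ansible_cfg` is a hypothetical helper written for this article, not an Ansible command.

```shell
# Print the ansible.cfg that would win, following Ansible's search order:
# $ANSIBLE_CONFIG, ./ansible.cfg, ~/.ansible.cfg, /etc/ansible/ansible.cfg.
# ASSUMPTION: find_ansible_cfg is a hypothetical helper, not part of Ansible.
find_ansible_cfg() {
    local candidate
    for candidate in "${ANSIBLE_CONFIG:-}" ./ansible.cfg "$HOME/.ansible.cfg" /etc/ansible/ansible.cfg; do
        # Skip the empty entry produced when ANSIBLE_CONFIG is unset.
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no ansible.cfg found; built-in defaults apply" >&2
    return 1
}

find_ansible_cfg || true
```

On a host with Ansible installed, `ansible --version` reports the config file actually in use, which is the authoritative check.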

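Linux

  • Server OS: RHEL 6/7 (Windows cannot act as the control node)
  • Server Python version: 2.7.14 (in testing, no further adjustment was needed after installation)
  • Ansible version: 2.3.3.0 (in testing, 2.4 and later no longer support RHEL 5.5; clients need simplejson)
  • Managed targets: currently mainly RHEL 5/6/7 (Windows is managed with a newer Ansible version)
  • Baseline standard: see the internal document 《主机岗配置基线 v1.1.xlsx》 (host configuration baseline)

Server

  • OS version: RHEL 6/7
  • Python version: 2.7.14
  • Installation method: offline pip install of the dependency packages

Client

  • OS version: RHEL 5/6/7
  • No adjustment is needed unless the OS was installed in minimal mode
  • RHEL 5.5 requires simplejson

Core usage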

# Check that Ansible can reach the hosts
ansible-playbook -i hosts playbooks/ping.yml -v
# With the inventory configured, run this to create users and set up SSH trust
ansible-playbook -i hosts playbooks/user/default.yml -v
# Configure time sync / processes and services / baseline files
ansible-playbook -i hosts playbooks/baseline/cfgset.yml -v
ansible-playbook -i hosts playbooks/baseline/cfgset.yml -v --tags="repo"
ansible-playbook -i hosts playbooks/baseline/cfgset.yml -v --skip-tags="ntp,repo"
# Update system packages and patches
ansible-playbook -i hosts playbooks/baseline/pakset.yml -v
# Change user passwords
ansible-playbook -i hosts_changepw playbooks/user/changepw.yml -v -e "@userpass.json"
# Back up configuration; custom date naming is supported, the default is "%Y%m%d"
ansible-playbook -i hosts backup/backup.yml -v
# Restore configuration by date directory, for all hosts or a subset
ansible-playbook -i hosts backup/restore.yml -v -e "var_backup_date=20180305"

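Windows

  • Server OS: RHEL 6/7 (Windows cannot act as the control node)
  • Server Python version: 2.7.14 (in testing, no further adjustment was needed after installation)
  • Ansible version: 2.5.0 (native Windows module support keeps improving, so Ansible has to be kept up to date)
  • Managed targets: currently mainly Windows 7/2008/2012 (XP/2003 are not supported)
  • Baseline standard: see 《Windows 安全基线》 (Windows security baseline)

Server

  • OS version: RHEL 6/7
  • Python version: 2.7.14
  • Installation method: offline pip install of the dependency packages (pipenv is currently used to switch between the Linux and Windows environments)

Client

  • OS version: Windows 7/2008/2012
  • WinRM (Windows 7/2008 must be upgraded to PowerShell 3.0)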
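
On the control node, Windows hosts are reached through the WinRM connection plugin, which needs the `pywinrm` Python package and a few inventory variables. The sketch below writes such variables for a host group. The group name `windows`, the account name, and the port and transport choices are placeholders for illustration, not values taken from the project; credentials are deliberately left out and would normally come from a vault.

```shell
# Write WinRM connection settings for a Windows host group.
# ASSUMPTION: the group name "windows", the account name, and the port/transport
# choices are placeholders; adjust them to the real environment.
inventory_dir="$(mktemp -d)"
mkdir -p "$inventory_dir/group_vars"

cat > "$inventory_dir/group_vars/windows.yml" <<'EOF'
ansible_connection: winrm
ansible_port: 5986
ansible_winrm_transport: ntlm
ansible_winrm_server_cert_validation: ignore
ansible_user: Administrator
EOF

echo "wrote $inventory_dir/group_vars/windows.yml"
```

Ignoring certificate validation is only reasonable on an isolated network with self-signed certificates; with a proper CA in place that line should be dropped.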

Core usage

# Check that Ansible can reach the hosts
ansible-playbook -i hosts win_playbooks/ping.yml -v
# With the inventory configured, run this to create users and set up trust
ansible-playbook -i hosts win_playbooks/user/default.yml -v
# Configure time sync / processes and services / baseline files
ansible-playbook -i hosts win_playbooks/baseline/cfgset.yml -v
ansible-playbook -i hosts win_playbooks/baseline/cfgset.yml -v --tags="wsus"
ansible-playbook -i hosts win_playbooks/baseline/cfgset.yml -v --skip-tags="ntp,wsus"
# Update system packages and patches
ansible-playbook -i hosts win_playbooks/baseline/pakset.yml -v
# Change user passwords
ansible-playbook -i win_hosts_changepw win_playbooks/user/changepw.yml -v -e "@userpass.json"
# Back up configuration; custom date naming is supported, the default is "%Y%m%d"
ansible-playbook -i win_hosts win_backup/backup.yml -v
# Restore configuration by date directory, for all hosts or a subset
ansible-playbook -i win_hosts win_backup/restore.yml -v -e "var_backup_date=20180305"

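Conclusion

I am sorry that I cannot share everything for now, but that does not stand in the way of technical exchange; I will gradually publish whatever code is valuable and can be made public.

  1. Follow the what/why/how approach: understand what problems Ansible solves, why to choose Ansible, and how to use it to solve them.
  2. A low learning cost does not mean no difficulty. Follow the official documentation and practice actively; when the official site has no answer, make good use of Google.
  3. Ansible in pure command-line mode solves only part of the problem; many more requirements have to be met by an automation platform built on Ansible. Embrace open-source technology and do not stand still.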

