Deploying a K8s Cluster with Ansible



# On every node: switch to root and allow root SSH login
sudo -i

password="root2020"
echo "root:$password" | sudo chpasswd

echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config
sudo service ssh restart


# Update OS packages
apt-get update && apt-get upgrade -y && apt-get dist-upgrade -y

# Install python and pip
apt-get install -y python2.7 python-pip

# Install ansible via pip (the Aliyun mirror speeds this up in mainland China)
pip install pip --upgrade -i https://mirrors.aliyun.com/pypi/simple/
pip install ansible==2.6.18 netaddr==0.7.19 -i https://mirrors.aliyun.com/pypi/simple/


# Generate an SSH key on the deploy node and copy it to every node ($ip = each node's IP)
ssh-keygen

ssh-copy-id $ip


# Download the easzup helper script, using kubeasz release 2.2.1 as an example
export release=2.2.1
curl -C- -fLO --retry 3 https://github.com/easzlab/kubeasz/releases/download/${release}/easzup
chmod +x ./easzup
# Use the helper script to download everything (kubeasz code, binaries, offline images)
./easzup -D
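
As a quick sanity check (a minimal sketch, assuming the default layout that easzup -D creates), confirm the files landed under /etc/ansible:

ls /etc/ansible          # kubeasz code and playbooks
ls /etc/ansible/bin      # k8s / etcd / CNI binaries
ls /etc/ansible/down     # offline add-on images and system packages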





# Step-by-step installation
ansible-playbook 01.prepare.yml
ansible-playbook 02.etcd.yml
ansible-playbook 03.docker.yml
ansible-playbook 04.kube-master.yml
ansible-playbook 05.kube-node.yml
ansible-playbook 06.network.yml
ansible-playbook 07.cluster-addon.yml
# All-in-one installation
#ansible-playbook 90.setup.yml

If you see "kubectl: command not found", log out and SSH back in so the updated environment variables take effect.
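
Alternatively, instead of re-logging in, you can load the PATH in the current shell (a minimal sketch, assuming the default bin_dir /opt/kube/bin configured in the hosts file):

export PATH=$PATH:/opt/kube/bin    # or: source /etc/profile
kubectl version --client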

kubectl version         # verify the cluster version
kubectl get node        # verify nodes are Ready
kubectl get pod -A      # verify pod status; the network plugin, coredns, metrics-server, etc. are installed by default
kubectl get svc -A      # verify cluster services

Restart all components

systemctl restart etcd && systemctl status etcd

systemctl restart flanneld && systemctl status flanneld

systemctl restart docker && systemctl status docker

systemctl stop nginx && systemctl start nginx && systemctl status nginx

systemctl restart keepalived && systemctl status keepalived

systemctl restart kube-apiserver && systemctl status kube-apiserver

systemctl restart kube-controller-manager && systemctl status kube-controller-manager

systemctl restart kube-scheduler && systemctl status kube-scheduler

systemctl restart kubelet && systemctl status kubelet

systemctl restart kube-proxy && systemctl status kube-proxy

Cleanup

Feel free to experiment with the K8s dev/test environment created above. When you run into errors, try to resolve them by checking logs, searching online, or filing issues; you can also simply tear down the cluster and recreate it.

On the host, clean up as follows:

# Destroy the cluster
docker exec -it kubeasz easzctl destroy
# Clean up running containers
./easzup -C
# Clean up container images
docker system prune -a
# Stop the docker service
systemctl stop docker
# Remove docker files
umount /var/run/docker/netns/default
umount /var/lib/docker/overlay
rm -rf /var/lib/docker /var/run/docker
After the cleanup above succeeds, it is recommended to reboot the node so that leftover virtual NICs, routes, and similar state are fully cleared.
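
If you prefer not to reboot, a rough sketch of checking for leftovers by hand (interface names such as flannel.1 and cni0 are typical but depend on the network plugin you used):

ip link show | grep -E 'flannel|cni|docker|kube'   # list leftover virtual NICs
# ip link delete flannel.1                         # example: remove one explicitly if present
ip route                                           # check for stale pod-network routes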

Miscellaneous


master:

sudo ln -s /etc/kubernetes/kubelet.kubeconfig /etc/kubernetes/kubelet.conf

ln -sf /etc/kubernetes/kube-controller-manager.kubeconfig /etc/kubernetes/bootstrap-kubelet.conf

ln -s /etc/kubernetes/kube-controller-manager.kubeconfig /etc/kubernetes/admin.conf
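
A quick way to confirm the symlinks point where you expect (verification sketch only):

ls -l /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/admin.conf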


###  stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory 


nano /etc/systemd/system/kubelet.service.d/10-kubeadm.conf 

The file /etc/kubernetes/kubelet.conf referenced there is missing, so add a symlink:

sudo ln -s /etc/kubernetes/kubelet.kubeconfig /etc/kubernetes/kubelet.conf


systemctl status kubelet.service
journalctl -xefu kubelet 


sudo ln -s /etc/kubernetes/kubelet.kubeconfig /etc/kubernetes/bootstrap-kubelet.conf

systemctl status kubelet.service|grep Active

rm /usr/bin/kubelet

cp bin/kubelet /usr/bin/


/opt/kube/bin/kubectl get node 192.168.70.151

cp bin/kubectl /opt/kube/bin/kubectl

scp bin/kubectl 192.168.70.152:/opt/kube/bin/kubectl

nano  roles/kube-master/defaults/main.yml 


cat /etc/kubernetes/ssl/basic-auth.csv


CentOS / RedHat 7.5

Node requirements for a high-availability cluster are as follows:

Role          Count  Description
deploy node   1      runs the ansible/easzctl scripts; usually shares a master node
etcd node     3      the etcd cluster needs an odd number of members (1, 3, 5, 7, ...); usually shares the master nodes
master node   2      an HA cluster needs at least 2 master nodes
node          3      runs application workloads; increase machine specs / node count as needed
  • System upgrade

All nodes

# All scripts in this document are run as root by default
yum update -y
# Install python and git
yum install python git -y
  • Linux kernel upgrade
# Import the ELRepo public key
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install ELRepo
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# Load the elrepo-kernel repo metadata
yum --disablerepo=\* --enablerepo=elrepo-kernel repolist
# List the available kernel packages
yum --disablerepo=\* --enablerepo=elrepo-kernel list kernel*
# Install the long-term-support kernel
yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-lt.x86_64
# Remove the old kernel tool packages
yum remove kernel-tools-libs.x86_64 kernel-tools.x86_64 -y
# Install the new kernel tool packages
yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-lt-tools.x86_64

# Check the default boot order
awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg
CentOS Linux (4.4.183-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-327.10.1.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-c52097a1078c403da03b8eddeac5080b) 7 (Core)
# Boot entries are numbered from 0, and the new kernel is inserted at the top (the 4.4 kernel is at index 0, the old 3.10 kernel at 1), so select 0.
grub2-set-default 0
# Reboot and verify
reboot
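
After the reboot, confirm the new kernel is active (assuming the kernel-lt version installed above):

uname -r    # should report the 4.4.x elrepo kernel, e.g. 4.4.183-1.el7.elrepo.x86_64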
  • Copy the SSH key to multiple servers

Generate an SSH key: ssh-keygen


password=" "
for i in 170 172 174
do
expect << EOF
spawn ssh-copy-id 192.168.20.$i
expect "(yes/no)?" {send "yes\r"}
expect "password:" {send "$password\r"}
expect "#" {send "exit\r"}
alias s$i="ssh '192.168.20.$i'"
EOF
done
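
Once the loop finishes, passwordless login can be verified with a quick check like:

for i in 170 172 174; do ssh 192.168.20.$i hostname; done   # should print each hostname without a password prompt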

Node (worker) hosts

yum install haproxy
  • Download the offline installation package

The easzup script is recommended for downloading the files needed for 4.0/4.1/4.2; once it completes, all files (kubeasz code, binaries, offline images) are laid out under the /etc/ansible directory.


export release=2.2.1
curl -C- -fLO --retry 3 https://github.com/easzlab/kubeasz/releases/download/${release}/easzup
chmod +x ./easzup

# For example, using k8s v1.18.2 and docker 19.03.5
./easzup -D -d 19.03.5 -k v1.18.2
# Download the offline system packages
./easzup -P

After this completes, all files are laid out under /etc/ansible; copy that directory in its entirety to any offline machine and you can install the cluster there. The offline files include:

  • kubeasz project code –> /etc/ansible
  • kubernetes cluster component binaries –> /etc/ansible/bin
  • other cluster component binaries (etcd/CNI, etc.) –> /etc/ansible/bin
  • base OS dependency packages (haproxy/ipvsadm/ipset/socat, etc.) –> /etc/ansible/down/packages
  • base cluster add-on images (coredns/dashboard/metrics-server, etc.) –> /etc/ansible/down

The offline files do not include:

  • the ansible installation on the deploy host; you can, however, run the ansible playbooks via the kubeasz container
  • images for other kubernetes add-ons

Offline installation: once the download above is complete, copy the entire /etc/ansible directory to the same path on the target offline server, then run the following on the offline server:
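
For the copy itself, something like the following works (a sketch; <offline-host> is a placeholder for the target server):

rsync -a /etc/ansible/ root@<offline-host>:/etc/ansible/
# or, without rsync:
# scp -r /etc/ansible root@<offline-host>:/etc/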

# Install docker offline; this checks the local files and normally reports that everything has already been downloaded
./easzup -D

# Start the kubeasz container
#./easzup -S

# Set parameters to enable offline installation
sed -i 's/^INSTALL_SOURCE.*$/INSTALL_SOURCE: "offline"/g' /etc/ansible/roles/chrony/defaults/main.yml
sed -i 's/^INSTALL_SOURCE.*$/INSTALL_SOURCE: "offline"/g' /etc/ansible/roles/ex-lb/defaults/main.yml
sed -i 's/^INSTALL_SOURCE.*$/INSTALL_SOURCE: "offline"/g' /etc/ansible/roles/kube-node/defaults/main.yml
sed -i 's/^INSTALL_SOURCE.*$/INSTALL_SOURCE: "offline"/g' /etc/ansible/roles/prepare/defaults/main.yml
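
You can confirm the switch took effect with a quick grep (verification sketch only):

grep -R "^INSTALL_SOURCE" /etc/ansible/roles/*/defaults/main.yml   # each role should now show "offline"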

# Run the installation inside the container; see https://github.com/easzlab/kubeasz/blob/master/docs/setup/quickStart.md
#docker exec -it kubeasz easzctl start-aio

  • Install ansible on the deploy (master) node
yum install  epel-release -y
yum install  python-pip  -y
# Install ansible with pip (use the Aliyun mirror if downloads are slow in mainland China)
pip install pip --upgrade -i https://mirrors.aliyun.com/pypi/simple/
pip install ansible==2.6.18 netaddr==0.7.19 -i https://mirrors.aliyun.com/pypi/simple/

  • Configure cluster parameters
cd /etc/ansible && cp example/hosts.multi-node hosts

Verify the ansible installation:

ansible all -m ping 

Normally every node returns SUCCESS.
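
Typical output looks roughly like the following (the exact format can vary slightly between ansible versions):

192.168.20.211 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}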

  • Installation

    ansible-playbook 90.setup.yml

  • Other commands

kubectl cluster-info  # show cluster info
kubectl get nodes -o wide # list nodes
kubectl get pods --all-namespaces # list pods
kubectl describe pods coredns-65dbdb44db-w2lb9 -n kube-system  # show details of a specific pod

  • Install helm
wget https://get.helm.sh/helm-v3.4.1-linux-amd64.tar.gz
tar -xf helm-v3.4.1-linux-amd64.tar.gz

mv ./linux-amd64/helm /usr/bin

helm repo add stable https://charts.helm.sh/stable
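
Optionally refresh the chart index and confirm the client works (verification sketch):

helm repo update
helm version
helm repo list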

  • prometheus
cd /etc/ansible/manifests/prometheus

rm -rf prometheus grafana

helm fetch --untar stable/prometheus

helm fetch --untar stable/grafana

helm install  monitor  --namespace monitoring  -f prom-settings.yaml  -f prom-alertsmanager.yaml  -f prom-alertrules.yaml  prometheus

helm install  grafana   --namespace monitoring    -f grafana-settings.yaml    -f grafana-dashboards.yaml    grafana

kubectl get pod,svc -n monitoring 
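
To reach the Grafana UI locally, a port-forward sketch (this assumes the chart's default service name grafana and service port 80; adjust to whatever your values files configure):

kubectl port-forward -n monitoring svc/grafana 3000:80
# then open http://localhost:3000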

  • ansible hosts
# 'etcd' cluster should have odd member(s) (1,3,5,...)
# variable 'NODE_NAME' is the distinct name of a member in 'etcd' cluster
[etcd]
192.168.20.211 NODE_NAME=etcd1
192.168.20.212 NODE_NAME=etcd2
192.168.20.213 NODE_NAME=etcd3

# master node(s)
[kube-master]
192.168.20.211
192.168.20.212

# work node(s)
[kube-node]
192.168.20.213
192.168.20.214
192.168.20.215

# [optional] harbor server, a private docker registry
# 'NEW_INSTALL': 'yes' to install a harbor server; 'no' to integrate with an existing one
# 'SELF_SIGNED_CERT': if 'no', you need to put certificate files named harbor.pem and harbor-key.pem in the 'down' directory
[harbor]
#192.168.20.218 HARBOR_DOMAIN="harbor.yourdomain.com" NEW_INSTALL=no SELF_SIGNED_CERT=yes

# [optional] loadbalance for accessing k8s from outside
[ex-lb]
#192.168.20.216 LB_ROLE=backup EX_APISERVER_VIP=192.168.20.250 EX_APISERVER_PORT=8443
192.168.20.216 LB_ROLE=master EX_APISERVER_VIP=192.168.20.250 EX_APISERVER_PORT=8443

# [optional] ntp server for the cluster
[chrony]
192.168.20.211

[all:vars]
# --------- Main Variables ---------------
# Cluster container-runtime supported: docker, containerd
CONTAINER_RUNTIME="docker"

# Network plugins supported: calico, flannel, kube-router, cilium, kube-ovn
CLUSTER_NETWORK="flannel"

# Service proxy mode of kube-proxy: 'iptables' or 'ipvs'
PROXY_MODE="ipvs"

# K8S Service CIDR, not overlap with node(host) networking
SERVICE_CIDR="10.68.0.0/16"

# Cluster CIDR (Pod CIDR), not overlap with node(host) networking
CLUSTER_CIDR="172.20.0.0/16"

# NodePort Range
NODE_PORT_RANGE="20000-40000"

# Cluster DNS Domain
CLUSTER_DNS_DOMAIN="k8s.em."

# -------- Additional Variables (don't change the default value right now) ---
# Binaries Directory
bin_dir="/opt/kube/bin"

# CA and other components cert/key Directory
ca_dir="/etc/kubernetes/ssl"

# Deploy Directory (kubeasz workspace)
base_dir="/etc/ansible"