How node exporter checks time synchronization

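A couple of days ago, the domestic NTP server pool suddenly became completely unreachable, and our k8s clusters all started raising alerts at once.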

The Prometheus rule that fired is:

name: HostClockNotSynchronising
expr: min_over_time(node_timex_sync_status[1m]) == 0 and node_timex_maxerror_seconds >= 16
for: 2m
labels:
  severity: warning
annotations:
  description: Clock not synchronising.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  summary: Host clock not synchronising (instance {{ $labels.instance }})
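To make the expression easier to read, here is a minimal sketch in Go of the condition it encodes for a single instance. The function name `conditionHolds` and its parameters are hypothetical and exist only for illustration; Prometheus itself evaluates the PromQL, and the alert only fires once the condition has held for the `for: 2m` duration.

```go
package main

import "fmt"

// conditionHolds mirrors the alert expression for one instance:
//   min_over_time(node_timex_sync_status[1m]) == 0
//   and node_timex_maxerror_seconds >= 16
// syncSamples are the node_timex_sync_status samples from the last minute
// (hypothetical input), maxErrorSeconds is the latest
// node_timex_maxerror_seconds sample.
func conditionHolds(syncSamples []float64, maxErrorSeconds float64) bool {
	if len(syncSamples) == 0 {
		return false // no samples in the window, so the expression yields nothing
	}
	lowest := syncSamples[0]
	for _, s := range syncSamples[1:] {
		if s < lowest {
			lowest = s
		}
	}
	return lowest == 0 && maxErrorSeconds >= 16
}

func main() {
	fmt.Println(conditionHolds([]float64{1, 1, 0, 0}, 16))  // unsynced and error at the threshold
	fmt.Println(conditionHolds([]float64{1, 1, 0, 0}, 0.5)) // unsynced but error still small
	fmt.Println(conditionHolds([]float64{1, 1, 1, 1}, 16))  // never reported unsynced
}
```

Both halves have to be true: a brief loss of sync with a small maximum error does not trigger the alert.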

These metrics are provided by node exporter, in timex.go.

The implementation calls adjtimex on Linux.

References: Go package, C function.

It is used to check whether the NTP daemon is staying in sync with its server.
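As a rough illustration of that call, the sketch below queries adjtimex in read-only mode through Go's standard `syscall` package (Linux only) and derives two values analogous to the metrics used in the rule. This is not node exporter's actual code: the helper names `syncStatus` and `maxErrorSeconds` are made up for this example, and the mapping shown (a `TIME_ERROR` return state means "not synchronized", `maxerror` is reported in microseconds) is my reading of adjtimex(2), so treat it as an assumption.

```go
package main

import (
	"fmt"
	"syscall"
)

// timeError is the TIME_ERROR return state of adjtimex(2):
// the kernel considers the clock unsynchronized.
const timeError = 5

// syncStatus maps the adjtimex return state to a 0/1 value
// (hypothetical helper, named for this example).
func syncStatus(state int) float64 {
	if state == timeError {
		return 0
	}
	return 1
}

// maxErrorSeconds converts the kernel's maxerror field,
// which is in microseconds, to seconds.
func maxErrorSeconds(maxErrorMicros int64) float64 {
	return float64(maxErrorMicros) / 1e6
}

func main() {
	// A zero Modes field means "read only": nothing is adjusted.
	var tx syscall.Timex
	state, err := syscall.Adjtimex(&tx)
	if err != nil {
		fmt.Println("adjtimex failed:", err)
		return
	}
	fmt.Printf("sync_status=%v maxerror_seconds=%v\n",
		syncStatus(state), maxErrorSeconds(int64(tx.Maxerror)))
}
```

Running this on an affected host should show whether the kernel itself reports the clock as unsynchronized, independent of Prometheus.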

On systemd-based systems, time synchronization is handled by systemd-timesyncd.

Solution: if it is not urgent, wait for the servers to recover; otherwise, switch to a different NTP server pool address yourself.

Article information

Author: Jia Jun Yeh
Link: https://xnum.github.io/2021/10/unix-timex/
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Taiwan license.

xnum's blog
