CEPH Failures and How to Handle Them

1. Slow OSD heartbeats

# ceph -s
health: HEALTH_WARN
       Slow OSD heartbeats on back (longest 6181.010ms)
       Slow OSD heartbeats on front (longest 5953.232ms)

OSDs ping each other to test connectivity. If the latency between two OSDs exceeds 1s, the latency is considered too high, which hurts data storage and access in the Ceph cluster. Latency between OSDs is measured both over the cluster network (between storage servers / back) and over the public network (storage servers to client machines / front). If the latency stays too high, the affected OSDs may be marked down, which can in turn lead to data loss in Ceph.

High latency between OSDs is usually a network problem. It may be caused by a storage server restarting its network, or by a faulty network cable. In the former case the latency gradually decreases and eventually returns to normal; in the latter case the problem persists. Check the detailed OSD latency information to find the host with high latency, then fix it.

# ceph health detail

[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 11846.602ms)
    Slow OSD heartbeats on back from osd.12 [] to osd.25 [] 11846.602 msec
    Slow OSD heartbeats on back from osd.8 [] to osd.17 [] 3617.281 msec
    Slow OSD heartbeats on back from osd.16 [] to osd.27 [] 2784.517 msec
    Slow OSD heartbeats on back from osd.21 [] to osd.17 [] 1678.064 msec
    Slow OSD heartbeats on back from osd.11 [] to osd.15 [] 1675.884 msec
    Slow OSD heartbeats on back from osd.20 [] to osd.13 [] 1073.790 msec
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 11427.677ms)
    Slow OSD heartbeats on front from osd.12 [] to osd.25 [] 11427.677 msec
    Slow OSD heartbeats on front from osd.8 [] to osd.17 [] 3787.868 msec
    Slow OSD heartbeats on front from osd.16 [] to osd.27 [] 3465.298 msec
    Slow OSD heartbeats on front from osd.11 [] to osd.15 [] 1469.591 msec
    Slow OSD heartbeats on front from osd.21 [] to osd.17 [] 1341.135 msec
    Slow OSD heartbeats on front from osd.20 [] to osd.13 [] 1224.235 msec
    Slow OSD heartbeats on front from osd.5 [] to osd.16 [] 1101.175 msec

From the output above, one host showed high OSD latency to all the other hosts. Unplugging that host's fiber cable, wiping it clean, and reseating it resolved the problem.
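
Before touching hardware, the suspect host and link can be checked from the OS side; a minimal sketch, where the IP address and interface name are placeholders for the back-network address and NIC of the slow host:

# Measure latency to the suspect host over the cluster (back) network
ping -c 10 -i 0.2 192.168.1.25

# Look for RX/TX errors or drops that point to a bad cable or optic
ip -s link show ens1f0
ethtool ens1f0 | grep -E "Speed|Link detected"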

2. slow ops

# ceph -s
     21 slow ops, oldest one blocked for 29972 sec, mon.ceph1 has slow ops

First make sure the clocks on all storage servers are synchronized, then restart the monitor service on the affected host to clear it.
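
A sketch of both steps, assuming chrony is used for time sync and the cluster is cephadm-managed (this article already uses ceph orch elsewhere); the mon name comes from the warning above:

# Check clock synchronization on every storage server
chronyc tracking

# Restart the monitor daemon named in the warning
ceph orch daemon restart mon.ceph1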

3. pgs not deep-scrubbed in time

# ceph -s
    47 pgs not deep-scrubbed in time

This is most likely because Ceph automatically recovered data after some OSDs went down. Once the affected OSDs are re-added, the recovered copies have to be cleaned up again, hence this warning while those operations are in progress; the number of affected PGs keeps shrinking. After waiting a while the cluster returns to normal, but CephFS performance is poor in the meantime.
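
If the warning lingers, the listed PGs can also be deep-scrubbed manually to clear it sooner; a minimal sketch, where the PG id is hypothetical and the real ids should be taken from ceph health detail:

# List the PGs that have not been deep-scrubbed in time
ceph health detail | grep "not deep-scrubbed"

# Manually trigger a deep scrub on one of them
ceph pg deep-scrub 5.1a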

4. MDS cache is too large

ceph config set mds mds_cache_memory_limit 10GB

ceph config dump

This warning appears when the MDS cache usage is far above the configured limit. Setting a higher MDS cache limit with the command above clears the warning, at the cost of more memory. The config dump command shows the current value of every parameter.
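
To confirm the new limit took effect, it can be read back directly; a quick check, assuming the limit was set at the mds level as above:

# Read back the effective MDS cache limit (reported in bytes)
ceph config get mds mds_cache_memory_limit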

In addition, even after increasing mds_cache_memory_limit, the warning may come back after a while, with the MDS cache usage again exceeding 1.5x the newly set value. In that case, consider running multiple active MDS daemons, as in the commands below.

# First deploy the MDS service on 3 servers; make sure these 3 servers have enough memory, preferably more than the others.
ceph orch apply mds cephfs ceph106,ceph107,ceph109
ceph fs set cephfs max_mds 3

# With MDS daemons active on all 3 servers there is no standby MDS left, so add one more host to provide a standby.
ceph orch apply mds cephfs ceph106,ceph107,ceph109,ceph110
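
Afterwards, the MDS layout can be verified; a quick check, assuming the file system is named cephfs as in the commands above:

# Confirm there are now 3 active MDS daemons plus 1 standby
ceph fs status cephfs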

5. Client node18 failing to respond to cache pressure

This means responses between the node18 client and the MDS service are slow. If the cluster shows HEALTH_OK again after a while, it can be ignored. If the warning persists, unmount the Ceph file system on node18 and mount it again.
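
A sketch of the remount on the client, assuming a kernel-mounted CephFS; the mount point, monitor address, user name, and secret file below are placeholders, not taken from this cluster:

# On node18: unmount the CephFS mount point (add -l for a lazy unmount if it is busy)
umount /mnt/cephfs

# Mount it again (adjust monitor address, user name, and secret file to your cluster)
mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret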

6. Reduced data availability: 4 pgs inactive, 4 pgs incomplete

When PGs become incomplete, the number of surviving OSDs backing those PGs has dropped below the minimum replica count. The data in those PGs can no longer be read or written, availability is reduced, and the MDS service runs into trouble, producing warnings like the following example:

3 MDSs report slow metadata IOs
2 MDSs report slow requests
2 MDSs behind on trimming
Reduced data availability: 4 pgs inactive, 4 pgs incomplete

pg 5.6de is incomplete, acting [254,356,222,352,111,247,100,133,351,206] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')
pg 5.6e9 is incomplete, acting [276,244,357,358,221,321,311,229,314,351] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')
pg 5.73b is incomplete, acting [186,279,351,247,293,354,359,220,181,283] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')
pg 5.eda is incomplete, acting [164,157,120,227,353,351,295,269,95,354] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')

At this point the PGs need to be repaired.

# Query PG info (the PG id here is 5.6de)
ceph pg 5.6de query

# Forcibly recreate the PG
ceph osd force-create-pg 5.6de --yes-i-really-mean-it
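
Note that force-create-pg recreates the PG as empty, so any objects it held are lost. Once it completes, the cluster state can be rechecked:

# Verify no PGs are still stuck inactive, then check overall health
ceph pg dump_stuck inactive
ceph -s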
