
Cephadm Pacific v16.2.7. Our Ceph cluster is stuck with degraded PGs and some OSDs are down. Reason: the OSDs got filled up.
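To confirm which OSDs actually crossed the thresholds, the usual first checks are (standard commands, nothing cluster-specific assumed):

    ceph health detail     # lists the full / backfillfull / nearfull OSDs
    ceph osd df tree       # per-OSD utilisation and PG counts
    ceph df                # per-pool usage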

Things we tried

Changed the values to the maximum possible combination (not sure if done right?): backfillfull < nearfull, nearfull < full, and full < failsafe_full
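For reference, a sketch of raising the ratios cluster-wide; as far as I understand they are expected to be ascending (nearfull < backfillfull < full < failsafe_full), and the values below are only illustrative:

    ceph osd set-nearfull-ratio 0.90
    ceph osd set-backfillfull-ratio 0.92
    ceph osd set-full-ratio 0.95
    ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'   # verify what is set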

ceph-objectstore-tool - tried to delete some PGs to recover space

Tried to mount the OSD and delete PGs to recover some space, but not sure how to do this with BlueStore.
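In case it helps others, a rough sketch of how this is usually done with BlueStore under cephadm (osd.3, PG 2.7 and the backup path are placeholders; the OSD has to be stopped first, and keeping an export before removing anything is strongly advisable):

    ceph orch daemon stop osd.3        # stop the OSD daemon
    cephadm shell --name osd.3         # shell with the OSD's data mounted at /var/lib/ceph/osd/ceph-3
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --op list-pgs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid 2.7 \
        --op export-remove --file /mnt/backup/pg2.7.export   # export a copy, then remove the PG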

Global Recovery Event - stuck forever


ceph -s 


cluster:
    id:     a089a4b8-2691-11ec-849f-07cde9cd0b53
    health: HEALTH_WARN
            6 failed cephadm daemon(s)
            1 hosts fail cephadm check
            Reduced data availability: 362 pgs inactive, 6 pgs down, 287 pgs peering, 48 pgs stale
            Degraded data redundancy: 5756984/22174447 objects degraded (25.962%), 91 pgs degraded, 84 pgs undersized
            13 daemons have recently crashed
            3 slow ops, oldest one blocked for 31 sec, daemons [mon.raspi4-8g-18,mon.raspi4-8g-20] have slow ops.

  services:
    mon: 5 daemons, quorum raspi4-8g-20,raspi4-8g-25,raspi4-8g-18,raspi4-8g-10,raspi4-4g-23 (age 2s)
    mgr: raspi4-8g-18.slyftn(active, since 3h), standbys: raspi4-8g-12.xuuxmp, raspi4-8g-10.udbcyy
    osd: 19 osds: 15 up (since 2h), 15 in (since 2h); 6 remapped pgs

  data:
    pools:   40 pools, 636 pgs
    objects: 4.28M objects, 4.9 TiB
    usage:   6.1 TiB used, 45 TiB / 51 TiB avail
    pgs:     56.918% pgs not active
             5756984/22174447 objects degraded (25.962%)
             2914/22174447 objects misplaced (0.013%)
             253 peering
             218 active+clean
             57  undersized+degraded+peered
             25  stale+peering
             20  stale+active+clean
             19  active+recovery_wait+undersized+degraded+remapped
             10  active+recovery_wait+degraded
             7   remapped+peering
             7   activating
             6   down
             2   active+undersized+remapped
             2   stale+remapped+peering
             2   undersized+remapped+peered
             2   activating+degraded
             1   active+remapped+backfill_wait
             1   active+recovering+undersized+degraded+remapped
             1   undersized+peered
             1   active+clean+scrubbing+deep
             1   active+undersized+degraded+remapped+backfill_wait
             1   stale+active+recovery_wait+undersized+degraded+remapped

  progress:
    Global Recovery Event (2h)
      [==========..................] (remaining: 4h)


  • That's one of the worst scenarios; never let your cluster become (near)full. That's why you get warned at around 85% (default). The problem at this point is that even if you add more OSDs, the remaining OSDs need some space for the PG remapping. But I don't see another option than adding more storage and trying to move PGs manually to the new OSDs if Ceph can't handle that on its own. – eblock Feb 10 '22 at 13:47
  • Yeah, we added SSDs as a new crush rule; it was only 4 SSDs, but the most important data was on them, and it all happened overnight :( Looking for ways to delete some PGs in the OSDs, bring the OSDs back online, and add some SSDs. Thank you though – Jesvin C Joachim Feb 10 '22 at 15:10
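For anyone in the same spot, a rough sketch of steering one PG onto a newly added OSD with upmap (PG 2.7, osd.4 and osd.20 are placeholders; requires Luminous or newer clients):

    ceph osd set-require-min-compat-client luminous
    ceph osd pg-upmap-items 2.7 4 20   # map the copy of PG 2.7 that is on osd.4 to osd.20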

1 Answer


Some versions of BlueStore were susceptible to the BlueFS log growing extremely large - to the point where the OSD can no longer boot. This state is indicated by an OSD boot that takes very long and then fails in the _replay function.

This can be fixed by:

    ceph-bluestore-tool fsck --path <osd path> --bluefs_replay_recovery=true

It is advised to first check whether the rescue process would be successful:

    ceph-bluestore-tool fsck --path <osd path> --bluefs_replay_recovery=true --bluefs_replay_recovery_disable_compact=true

If the above fsck is successful, the fix procedure can be applied.
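Under cephadm, one way to reach the OSD's data path for these commands is a daemon shell (osd.7 is a placeholder; the OSD should be stopped while ceph-bluestore-tool runs against it):

    ceph orch daemon stop osd.7
    cephadm shell --name osd.7
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7 \
        --bluefs_replay_recovery=true --bluefs_replay_recovery_disable_compact=true   # check first
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7 --bluefs_replay_recovery=true
    exit
    ceph orch daemon start osd.7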

Special thanks: this was solved with the help of a dewDrive Cloud backup faculty member.