
I set up a CephFS cluster on virtual machines and want to use it to store a batch of image data (1.4 GB in total, each image about 8 KB). The cluster keeps two replicas, with 12 GB of available space in total, but when I write the data the system reports that there is not enough space. How can I solve this? The details of the cluster are as follows:

Cluster Information:

cluster:

id:     891fb1a7-df35-48a1-9b5c-c21d768d129b

health: HEALTH_ERR

        1 MDSs report slow metadata IOs

        1 MDSs report slow requests

        1 full osd(s)

        1 nearfull osd(s)

        2 pool(s) full

        Degraded data redundancy: 46744/127654 objects degraded (36.618%), 204 pgs degraded

        Degraded data redundancy (low space): 204 pgs recovery_toofull

        too many PGs per OSD (256 > max 250)

        clock skew detected on mon.node2, mon.node3

services:

mon: 3 daemons, quorum node1,node2,node3

mgr: node2(active), standbys: node1, node3

mds: cephfs-1/1/1 up  {0=node1=up:active}, 2 up:standby

osd: 3 osds: 2 up, 2 in

data:

pools:   2 pools, 256 pgs

objects: 63.83k objects, 543MiB

usage:   10.6GiB used, 1.40GiB / 12GiB avail

pgs:     46744/127654 objects degraded (36.618%)

         204 active+recovery_toofull+degraded

         52  active+clean

CephFS Space Usage:

[root@node1 0]# df -hT

Filesystem Type Size Used Avail Use% Mounted on

/dev/mapper/nlas-root xfs 36G 22G 14G 62% /

devtmpfs devtmpfs 2.3G 0 2.3G 0% /dev

tmpfs tmpfs 2.3G 0 2.3G 0% /dev/shm

tmpfs tmpfs 2.3G 8.7M 2.3G 1% /run

tmpfs tmpfs 2.3G 0 2.3G 0% /sys/fs/cgroup

/dev/sda1 xfs 1014M 178M 837M 18% /boot

tmpfs tmpfs 2.3G 28K 2.3G 1% /var/lib/ceph/osd/ceph-0

tmpfs tmpfs 471M 0 471M 0% /run/user/0

192.168.152.3:6789,192.168.152.4:6789,192.168.152.5:6789:/ ceph 12G 11G 1.5G 89% /mnt/test

Ceph OSD:

[root@node1 mnt]# ceph osd pool ls

cephfs_data

cephfs_metadata

[root@node1 mnt]# ceph osd pool get cephfs_data size

size: 2

[root@node1 mnt]# ceph osd pool get cephfs_metadata size

size: 2

ceph.dir.layout:

[root@node1 mnt]# getfattr -n ceph.dir.layout /mnt/test

getfattr: Removing leading '/' from absolute path names

# file: mnt/test

ceph.dir.layout="stripe_unit=65536 stripe_count=1 object_size=4194304 pool=cephfs_data"

  • You're probably suffering from `bluestore_min_alloc_size_hdd = 64k` (the default for HDDs). If you store lots of 8k images but each object allocates 64k, you have quite a lot of overhead. This should be taken into account when planning a Ceph cluster. You can rebuild your OSDs with a smaller allocation size if you know your workload that well. – eblock Nov 25 '21 at 13:54
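As a rough sanity check of that overhead (an estimate only, ignoring the metadata pool and BlueStore's own bookkeeping): the status above reports 63.83k objects and size=2 replication, so with a 64 KiB minimum allocation each object occupies at least one 64 KiB extent per replica:

    63,830 objects × 64 KiB × 2 replicas ≈ 7.8 GiB of raw space

for only ~543 MiB of actual data, which by itself would consume most of the 12 GiB of raw space.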

2 Answers


When storing small files, you need to watch the minimum allocation size. Up to the Nautilus release this defaulted to 16k for SSDs and 64k for HDDs, but with Ceph Pacific the default minimum allocation has been tuned to 4k for both.
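If you want to confirm what value your OSDs are using, you can query the option like this (a sketch; exact commands vary a bit by release, and the value only applies to OSDs created after it was set):

    # via the admin socket on the OSD's host
    ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
    # or via the centralized config database (Mimic and later)
    ceph config get osd bluestore_min_alloc_size_hdd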

I suggest you use Pacific, or manually tune Octopus to the same numbers if that's the version you installed.
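If you would rather use the centralized config database than edit ceph.conf, something along these lines should work; note it only affects OSDs created after the change:

    ceph config set osd bluestore_min_alloc_size_hdd 4096
    ceph config set osd bluestore_min_alloc_size_ssd 4096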

You also want to use replication (as opposed to erasure coding) if your files are smaller than a multiple of the minimum allocation size, because each EC chunk is subject to the same minimum allocation and the slack space is otherwise wasted. You already made the right choice here by using replication; I only mention it because you may be tempted by EC's touted space savings, which unfortunately do not apply to small files.
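To illustrate with a hypothetical EC profile (k=2, m=1) and one 8 KiB image under a 64 KiB minimum allocation: the object is split into two 4 KiB data chunks plus one coding chunk, and each chunk is rounded up to 64 KiB on its OSD:

    EC k=2,m=1:      3 chunks   × 64 KiB = 192 KiB raw
    2× replication:  2 replicas × 64 KiB = 128 KiB raw

so for files this small, EC actually costs more raw space than replication.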

– 0xF2

You need to set bluestore_min_alloc_size to 4096; by default its value is 64 KB (for HDDs). For example, in ceph.conf:

[osd]
    bluestore_min_alloc_size = 4096
    bluestore_min_alloc_size_hdd = 4096
    bluestore_min_alloc_size_ssd = 4096
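Keep in mind that bluestore_min_alloc_size is baked into an OSD when it is created, so existing OSDs have to be redeployed for the new value to apply. A rough sketch for one OSD, assuming osd.2 backed by /dev/sdb (placeholders; adjust to your layout, and rebuild one OSD at a time only once the cluster has room to recover):

    ceph osd out osd.2
    systemctl stop ceph-osd@2
    ceph osd purge 2 --yes-i-really-mean-it
    ceph-volume lvm zap --destroy /dev/sdb
    ceph-volume lvm create --data /dev/sdb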
– Hackaholic