Ceph: An overview

  • Scale your operations and move to market faster.
  • Bridge the gaps between application development and data science.
  • Gain deeper insights into your data.
  • File, block, and object storage in a single unified system.
  • Higher transfer speeds and lower latency.
  • Easily accessible storage that can quickly scale up or down.

1. Components

1.1 Cluster components

A Ceph cluster works with the following components:

  • OSD (Object Storage Daemon) — Each Ceph storage node runs one or more Ceph OSD daemons (one per disk). The OSD is the daemon that performs all data storage, replication, and recovery operations. The file systems commonly used on OSD disks are XFS, btrfs, and ext4.
  • Monitor — The Ceph Monitor is the daemon responsible for maintaining a master copy of the cluster map. A Ceph cluster needs at least three monitors to form a quorum and ensure high availability.
  • RADOS Gateway — The RADOS Gateway (RGW) delivers an HTTP API service that lets clients talk to Ceph directly through the S3 or Swift object APIs.
  • Metadata Server — The MDS manages CephFS metadata (the directory tree and file attributes) and stores it in RADOS objects, while file data is read and written by clients directly against the OSDs. It can be scaled horizontally by adding more Ceph Metadata Servers to support more clients.
  • Ceph Manager — The Ceph Manager daemon (ceph-mgr) runs alongside the monitor daemons to provide additional monitoring and interfaces to external monitoring and management systems (only available from the Luminous release onwards).
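
All of these daemons can be seen at a glance with ceph -s, which summarizes the monitor quorum and the manager, MDS, and OSD services of a running cluster. The output below is only illustrative; the counts and names depend on your cluster:

    ceph -s
    #   cluster:
    #     health: HEALTH_OK
    #   services:
    #     mon: 3 daemons, quorum a,b,c
    #     mgr: a(active), standbys: b
    #     mds: cephfs:1 {0=a=up:active}
    #     osd: 12 osds: 12 up, 12 in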

1.2 Storage Clients

Ceph provides a block interface (RBD), an object interface (RGW), and a file system interface (CephFS) on top of the same cluster. The most commonly used interface is RBD.

  • S3 / Swift (RGW) — Clients consume the RADOS Gateway service over HTTP to store and retrieve objects.
  • RBD — A reliable, fully distributed block device with cloud platform integration.
  • CephFS — The Ceph File System (CephFS) is a POSIX-compatible file system that uses a Ceph cluster to store its data. Using it requires a Ceph Metadata Server (MDS) in the cluster.
  • LIBRADOS — A library that allows applications to access RADOS directly (C, C++, Java, Python, Ruby, PHP).
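
As a brief sketch of how each interface is consumed from the command line (the pool, image, monitor, and user names below are only illustrative):

    # Object interface: store and fetch an object through librados with the rados tool
    rados -p mypool put my-object /tmp/data.bin
    rados -p mypool get my-object /tmp/data.out

    # Block interface: create a 10 GiB RBD image and map it as a local block device
    rbd create mypool/my-image --size 10240
    rbd map mypool/my-image

    # File interface: mount CephFS with the kernel client (requires an MDS)
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret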

2. Inside the solution

2.1 Authentication

Ceph authenticates users and daemons with cephx, an authentication system that operates in a manner similar to Kerberos; SSL/TLS is not used. cephx relies on shared secret keys for mutual authentication, which means that both the clients and the monitors in the cluster have a copy of the client’s secret key.
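
A minimal example of how a cephx identity and its secret key are created (the client name, pool, and keyring path are assumptions):

    # Create a user with read access to the monitors and read/write access to one pool;
    # the generated secret key goes into a keyring file that is copied to the client
    ceph auth get-or-create client.app1 mon 'allow r' osd 'allow rw pool=app-data' \
        -o /etc/ceph/ceph.client.app1.keyring

    # The monitors keep their own copy of the key; inspect it with:
    ceph auth get client.app1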

2.2 Cluster Map

Ceph maintains the whole cluster topology in five maps, collectively known as the “Cluster Map”:

  • Monitor Map: Contains the cluster fsid and the position, name, address, and port of each monitor. It also indicates the current epoch, when the map was created, and the last time it changed. To view a monitor map, run ceph mon dump.
  • OSD Map: Contains the cluster fsid, when the map was created and last modified, a list of pools, replica sizes, PG numbers, and a list of OSDs and their status (for example, up, in, and down). To view an OSD map, run ceph osd dump.
  • PG Map: Contains the PG version, its time stamp, the last OSD map epoch, the full ratios, and details of each placement group, such as the PG ID, the Up Set, the Acting Set, the PG state (for example, active + clean), and data usage statistics for each pool. To view a PG map, run ceph pg dump.
  • CRUSH Map: Contains a list of storage devices, the failure domain hierarchy (for example, device, host, rack, row, room) and rules for traversing the hierarchy when storing data. You can decompile the map and view it in a text editor, as shown below.
  • MDS Map (CephFS): Contains the current MDS map epoch, when the map was created, and the last time it changed. It also contains the pool used to store metadata, a list of metadata servers, and which metadata servers are up and in. To view an MDS map, run ceph mds dump.
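
Each map can be inspected with the commands mentioned above; the CRUSH map is the only one that has to be decompiled before it is readable (the file names below are arbitrary):

    ceph mon dump                              # monitor map
    ceph osd dump                              # OSD map
    ceph pg dump                               # PG map
    ceph mds dump                              # MDS map (ceph fs dump on recent releases)
    ceph osd getcrushmap -o crushmap.bin       # extract the compiled CRUSH map
    crushtool -d crushmap.bin -o crushmap.txt  # decompile it into editable text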

In practice, a client uses these maps as follows:

  1. The client contacts the Monitors to retrieve an up-to-date copy of the cluster map.
  2. The client receives the map of PGs and OSDs.
  3. The client sends the data to be written to the cluster, with the CRUSH process mapping the object to its PG and the PG to OSDs.
  4. The whole write and replication process is then performed (detailed below).
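
The placement computed by CRUSH can be reproduced from the command line; this assumes a pool called mypool and an object called my-object:

    # Ask the cluster which PG and which OSDs an object name maps to
    ceph osd map mypool my-object
    # Illustrative output (epochs, pool IDs, and OSD numbers depend on the cluster):
    # osdmap e123 pool 'mypool' (2) object 'my-object' -> pg 2.5e3f2a1c (2.1c)
    #   -> up ([3,1,7], p3) acting ([3,1,7], p3)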

2.3 Logical Data

Ceph uses three important components to logically separate data:

  • Pools — Objects in Ceph are stored in pools. Each pool is divided into pg_num placement groups (PGs), and each PG holds a fragment of the pool’s overall object set.
  • Placement Groups — Ceph maps objects to placement groups (PGs), shards of the object set that are in turn mapped to multiple OSDs.
  • CRUSH Map — CRUSH is what allows Ceph to scale without performance bottlenecks, scalability limits, or a single point of failure. CRUSH maps provide the physical topology of the cluster to the CRUSH algorithm, so it can determine where an object’s data and its replicas should be stored and how to spread them across failure domains for greater data safety.
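
For example, the logical pieces above can be created as follows (the pool name, PG count, and replica count are only illustrative):

    # Create a replicated pool with 128 placement groups
    ceph osd pool create mypool 128 128 replicated
    # Keep three copies of every object in this pool
    ceph osd pool set mypool size 3
    # Associate the pool with an application (Luminous and later)
    ceph osd pool application enable mypool rbd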

2.4 Writing and Reading Process

The write process is carried out as follows in the cluster (reads follow the same mapping and are served by the PG’s primary OSD):

  1. The client writes the object to the identified PG on the primary OSD.
  2. The primary OSD, using its own copy of the CRUSH map, identifies the secondary OSDs.
  3. The object is replicated to the second OSD.
  4. The object is replicated to the third OSD.
  5. The client receives confirmation that the write completed successfully.
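
For example, assuming the pool mypool from the earlier examples exists:

    # Write an object: the client sends it to the primary OSD of its PG,
    # which replicates it to the secondary OSDs before acknowledging
    echo "hello ceph" > /tmp/hello.txt
    rados -p mypool put hello-object /tmp/hello.txt

    # Confirm the object exists and read it back
    rados -p mypool stat hello-object
    rados -p mypool get hello-object /tmp/hello.out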

2.5 Data Replication

Inside the cluster, every object written to a pool is replicated to multiple OSDs according to the pool’s replication factor (three copies by default).
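
How many copies are kept, and how many must be available for the pool to keep serving I/O, are per-pool settings (the pool name below is only illustrative):

    ceph osd pool get mypool size        # number of replicas kept, e.g. 3
    ceph osd pool get mypool min_size    # minimum replicas that must be up for I/O
    ceph osd pool set mypool min_size 2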

2.6 Physical separation

CRUSH maps contain a list of OSDs, a list of buckets for aggregating devices into physical locations, and a list of rules that tell CRUSH how it should replicate data in the Ceph cluster’s pools. This makes it possible to physically separate pools, for example:

  • Rack with SATA storage — Pool A and Pool B
  • Rack with SSD storage — Pool C and Pool D

A replicated CRUSH rule for such a layout is created with ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>, where:
  • <name>: The name of the rule.
  • <root>: The root of the CRUSH hierarchy.
  • <failure-domain>: The failure domain. For example: host or rack.
  • <class>: The storage device class. For example: hdd or ssd. Ceph Luminous and later only.
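
For example, a rule that keeps replicas on separate hosts backed by SSD devices, then assigned to one of the SSD pools (the rule and pool names are only illustrative):

    # <name>=ssd-rule, <root>=default, <failure-domain>=host, <class>=ssd
    ceph osd crush rule create-replicated ssd-rule default host ssd
    # Point a pool at the new rule
    ceph osd pool set poolC crush_rule ssd-rule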
