Understanding the Basics: A Beginner’s Guide to Distributed Storage

September 22, 2019

Many applications require persistent storage.

Pods and containers are typically presumed to be ephemeral, meaning they're short-lived and can be replaced easily. This is fine for stateless apps, but many applications are stateful and need their data to outlive any single container.

These applications need a persistent storage solution that can scale and adapt to changing workloads without forklift upgrades.

Volumes

Volumes offer persistent storage that is independent of containers. They're used for stateful applications that require more than just temporary changes to the file system, such as databases and file servers. These apps can be restored to a previous state, even after multiple container crashes, using the information stored in their volumes.

A volume is declared in the pod's specification and mounted into each container that needs it. The volume lives outside any single container's file system and can be mounted by any number of containers in the pod. The same data is accessible to all the containers using that volume, so one container can fail and restart without taking the data away from the others.
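As a minimal sketch of a shared volume, the Python snippet below builds a pod manifest in which two containers mount the same volume. The pod, container, and volume names, the image, and the mount paths are hypothetical. An emptyDir volume is used only to keep the example short; it is scratch space that lasts as long as the pod, not persistent storage.

```python
import json

# Hypothetical names throughout. emptyDir is scratch space tied to the pod's
# lifetime; it is used here only to show two containers sharing one volume.
shared_volume = {"name": "shared-data", "emptyDir": {}}


def container(name, mount_path):
    """Return a container spec that mounts the shared volume at mount_path."""
    return {
        "name": name,
        "image": "busybox",
        "command": ["sleep", "3600"],
        "volumeMounts": [
            {"name": shared_volume["name"], "mountPath": mount_path}
        ],
    }


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "shared-volume-demo"},
    "spec": {
        "volumes": [shared_volume],
        "containers": [
            container("writer", "/data"),
            container("reader", "/input"),
        ],
    },
}

print(json.dumps(pod, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied with kubectl apply -f.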

A volume can be mounted with one of four access modes: ReadWriteOnce, ReadOnlyMany, ReadWriteMany, and ReadWriteOncePod. The mode dictates whether the volume is mounted read-only or read-write and whether a single node, many nodes, or only a single pod can use it at once. Which modes are actually available depends on the storage backend.

Beyond its capacity and access modes, a PersistentVolume (PV) can also expose custom parameters suited to particular workloads. These are typically things like a specific disk type (HDD vs. SSD), a level of performance, or a certain storage tier.

PersistentVolumeClaim (PVC)

A PersistentVolumeClaim is a request for storage. It describes the capacity and characteristics a pod requires, and the cluster uses that description to find or provision a suitable volume. For example, the PVC may request a specific volume size or access mode.

The control plane pairs a PVC with a PV using a control loop that watches for new PVCs, finds a matching PV, and binds the two together. The binding is an exclusive, one-to-one mapping between the PV and the PVC. The pod can then use the bound PV as its storage.
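The matching step can be pictured with a toy model. The sketch below is not the Kubernetes controller. It is a simplified illustration that assumes capacities are whole GiB, ignores storage classes and selectors, and picks the smallest unbound volume that satisfies the claim. All class, function, and object names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    name: str
    capacity_gib: int
    access_modes: frozenset
    claim: Optional[str] = None  # name of the bound claim, if any


@dataclass
class Claim:
    name: str
    request_gib: int
    access_modes: frozenset
    volume: Optional[str] = None  # name of the bound volume, if any


def bind(claim, volumes):
    """Bind claim to the smallest unbound volume that satisfies it."""
    candidates = [
        v for v in volumes
        if v.claim is None                          # binding is exclusive
        and v.capacity_gib >= claim.request_gib     # large enough
        and claim.access_modes <= v.access_modes    # supports requested modes
    ]
    if not candidates:
        return None  # the claim stays unbound until a suitable volume appears
    best = min(candidates, key=lambda v: v.capacity_gib)
    best.claim, claim.volume = claim.name, best.name
    return best


volumes = [
    Volume("pv-large", 100, frozenset({"ReadWriteOnce"})),
    Volume("pv-small", 10, frozenset({"ReadWriteOnce", "ReadOnlyMany"})),
]
first = Claim("db-data", 5, frozenset({"ReadWriteOnce"}))
second = Claim("logs", 5, frozenset({"ReadWriteOnce"}))

print(bind(first, volumes).name)   # pv-small
print(bind(second, volumes).name)  # pv-large, because pv-small is taken
```

The second claim ends up on the larger volume because the one-to-one binding has already removed pv-small from the pool.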

Various types of backends can back a PV. The cluster administrator defines the type of backend in a StorageClass. When a PVC requests storage, it names a StorageClass (or falls back to the cluster default, if one is set), and the cluster provisions the appropriate volume from that StorageClass.

The pod can mount the volume into its containers in one of several access modes, for example, ReadOnlyMany or ReadWriteOncePod. The pod's PVC requests the access mode through its accessModes field; the separate storageClassName field selects the StorageClass.
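To make the fields concrete, here is a hedged sketch of a claim and a pod that uses it, built as Python dictionaries and printed as JSON. The field names follow the Kubernetes core v1 API. The object names, the "fast-ssd" class, the image, and the mount path are hypothetical.

```python
import json

# Hypothetical claim: 10 GiB, read-write from a single node, class "fast-ssd".
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],   # how the volume may be mounted
        "storageClassName": "fast-ssd",     # which StorageClass to use
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# The pod refers to the claim by name, never to the PV directly.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "db"},
    "spec": {
        "containers": [
            {
                "name": "db",
                "image": "postgres:16",
                "volumeMounts": [
                    {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                ],
            }
        ],
        "volumes": [
            {
                "name": "data",
                "persistentVolumeClaim": {
                    "claimName": pvc["metadata"]["name"]
                },
            }
        ],
    },
}

for manifest in (pvc, pod):
    print(json.dumps(manifest, indent=2))
```

Keeping the claim separate from the pod means the pod can be deleted and recreated while the claim, and the data behind it, stays in place.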

A volume's reclaim policy tells the cluster what to do with it once it's released from its PVC. The policy can be Retain, Delete, or the deprecated Recycle. This lets you optimize storage for your applications by not keeping volumes around once they are no longer in use. With Retain, the volume and its data are kept until an administrator reclaims them manually. Separately, Kubernetes protects storage that is still in use: a PVC that a pod is actively using is not removed until the pod stops using it.
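As an illustration, the sketch below builds a statically provisioned PV whose reclaim policy is Retain. The field names follow the core v1 API. The volume name, the "archive" class, and the NFS server and path are hypothetical placeholders.

```python
import json

# Reclaim policies a PersistentVolume can carry; Recycle is deprecated.
RECLAIM_POLICIES = ("Retain", "Delete", "Recycle")


def persistent_volume(name, size, reclaim_policy):
    """Build a PV manifest backed by a hypothetical NFS export."""
    if reclaim_policy not in RECLAIM_POLICIES:
        raise ValueError(f"unknown reclaim policy: {reclaim_policy}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": reclaim_policy,
            "storageClassName": "archive",  # hypothetical class name
            "nfs": {
                "server": "nfs.example.internal",  # placeholder host
                "path": "/exports/archive",        # placeholder export
            },
        },
    }


pv = persistent_volume("archive-pv", "50Gi", "Retain")
print(json.dumps(pv, indent=2))
```

With Retain, deleting the claim leaves this volume and its data in place; with Delete, the cluster would remove the volume as well.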

Storage Class

A container-based system uses persistent storage mechanisms that allow users to access data even after their containers shut down.

With static provisioning, an administrator must create a persistent volume ahead of time for every claim that pods will use. A pod then mounts the volume to access its data.

However, creating PersistentVolumes by hand is practical only for some workloads because it requires a lot of manual steps.

A StorageClass is an object that defines a type of storage, with custom parameters that meet the demands of a particular application. When a PVC names the StorageClass, the cluster uses those parameters to provision a matching PersistentVolume dynamically and bind it to the claim, which a pod can then use.

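The parameters of a StorageClass are specific to its provisioner and can describe things such as disk type or performance requirements like read/write throughput and latency. The requested size, by contrast, comes from the PVC. A StorageClass also carries storage policies, such as the reclaim policy for the volumes it provisions and, with some provisioners, settings such as backup schedules.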

The StorageClass API also lets administrators offer several options for each kind of storage, so they can tailor their infrastructure to the specific demands of their workloads. For example, one StorageClass can be backed by fast SSDs and another by cheap magnetic drives. Each StorageClass also sets the reclaim policy for the volumes it provisions, either Delete or Retain.
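A short sketch of two such classes follows. The top-level fields (provisioner, parameters, reclaimPolicy, volumeBindingMode, allowVolumeExpansion) belong to the storage.k8s.io/v1 StorageClass API. The provisioner name "csi.example.com" and the "type" parameter are hypothetical, since real parameter keys depend on the driver in use.

```python
import json


def storage_class(name, disk_type, reclaim_policy="Delete"):
    """Build a StorageClass manifest for a hypothetical CSI driver."""
    if reclaim_policy not in ("Delete", "Retain"):
        raise ValueError("reclaimPolicy must be Delete or Retain")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "csi.example.com",     # hypothetical driver name
        "parameters": {"type": disk_type},    # driver-specific, assumed key
        "reclaimPolicy": reclaim_policy,
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


classes = [
    storage_class("fast-ssd", "ssd", reclaim_policy="Retain"),
    storage_class("cheap-hdd", "hdd"),
]

for sc in classes:
    print(json.dumps(sc, indent=2))
```

A claim would pick one of these by setting storageClassName to "fast-ssd" or "cheap-hdd".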

External Storage

Using standard interfaces, Kubernetes supports external storage systems, including public cloud providers, virtualization systems, and on-premises hardware. This lets you build a highly available, scalable, and cost-effective storage solution for your applications and data.

The core abstraction in a storage system is the volume, which represents persistent or non-persistent storage for containers. Persistent volumes can be created manually through static provisioning or dynamically in response to a PersistentVolumeClaim (PVC). In the dynamic case, the StorageClass named by the PVC specifies the provisioner, typically a Container Storage Interface (CSI) driver for the external provider, which then creates the persistent volume on the backing storage system.

Many traditional storage solutions bind to hardware devices, which can be a single point of failure in a containerized environment and cause performance bottlenecks when mounting volumes on workers. In addition, these systems were often not designed with a container-first mindset and can run into resource limits.
