
Week 29.1 Kubernetes (Volumes, PVCs)

In this lecture, Harkirat explores data persistence and scaling in containerized environments. He begins with volumes in Docker, then transitions to Kubernetes, covering both ephemeral and persistent volumes. The lecture delves into static persistent volumes and automated PV creation, and closes with the Horizontal Pod Autoscaler, demonstrating how to manage and scale stateful applications effectively.

Volumes in Docker


Pretext: Sample Application

A Node.js application that periodically writes random data to a file is used as an example. The Docker image for this app is available at 100xdevs/write-random.

Key Points:

  1. Ephemeral Storage in Docker:

    • By default, data in Docker containers is ephemeral.
    • Data is lost when the container is stopped or removed.
  2. Running the Sample Container:

    Terminal window
    docker run 100xdevs/write-random
  3. Inspecting Container Data:

    Terminal window
    docker exec -it container_id /bin/bash
    cat randomData.txt
  4. Volumes in Docker:

    • Used to persist data across container stops and starts.
    • Two main types: Bind Mounts and Volume Mounts.
  5. Bind Mounts:

    • Maps a host machine directory to a container directory.

    • Example:

      Terminal window
      docker run -v /Users/harkiratsingh/Projects/100x/mount:/usr/src/app/generated 100xdevs/write-random
  6. Volume Mounts:

    • Uses Docker-managed volumes for data persistence.

    • Creating a volume:

      Terminal window
      docker volume create hello
    • Using the volume:

      Terminal window
      docker run -v hello:/usr/src/app/generated 100xdevs/write-random
  7. Data Persistence:

    • With both bind mounts and volume mounts, data persists even if the container is stopped or removed, as the quick check below shows.
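A quick way to confirm this, assuming the hello volume created above (the container IDs and the exact file names inside the mount depend on your run), is to remove the container, start a fresh one against the same volume, and look inside the mount path:

  Terminal window
  docker rm -f <container_id>
  docker run -d -v hello:/usr/src/app/generated 100xdevs/write-random
  docker exec -it <new_container_id> /bin/bash
  ls /usr/src/app/generated

Files written by the earlier run should still be listed, because they live in the volume rather than in the removed container's writable layer.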

Key Takeaways:

  1. Docker containers by default use ephemeral storage.
  2. Volumes provide a way to persist data beyond the container’s lifecycle.
  3. Bind mounts directly map host directories to container directories.
  4. Volume mounts use Docker-managed volumes for more portable and flexible data storage.
  5. Both methods ensure data persistence across container restarts or removals.

Understanding volumes is crucial for managing stateful applications in Docker environments, allowing for data persistence and easier management of application state.

Volumes in Kubernetes

Volumes in Kubernetes are directories, potentially containing data, that are accessible to containers within a pod. They serve several important purposes in container orchestration.


Why Volumes are Needed

  1. Inter-Container Data Sharing: Allows containers in the same pod to share data and filesystems.
  2. Data Persistence: Enables data to persist even when containers restart, crucial for databases and stateful applications.
  3. Temporary Storage: Provides extra space for execution needs like caching, without requiring persistence.


Types of Volumes

  1. Ephemeral Volumes
    • Temporary storage shared among containers in a pod
    • Lifecycle tied to the pod (destroyed when pod is deleted)
    • Examples:
      • ConfigMap
      • Secret
      • emptyDir
  2. Persistent Volumes (PV)
    • Storage in the cluster provisioned by an administrator or dynamically via Storage Classes
    • Lifecycle independent of any individual pod
    • Can use various storage backends (NFS, iSCSI, cloud-provider storage)
  3. Persistent Volume Claims (PVC)
    • User requests for storage
    • Similar to how pods consume node resources, PVCs consume PV resources
    • Can request specific size and access modes


Key Concepts

  • Volumes solve the problem of data persistence in containerized environments
  • They enable data sharing between containers and across pod restarts
  • Kubernetes supports a wide variety of volume types to suit different needs
  • PVs and PVCs provide a way to abstract storage details from pod specifications

Understanding and effectively using volumes is crucial for managing stateful applications and ensuring data persistence in Kubernetes environments.

Ephemeral Volumes

Ephemeral volumes allow containers in the same pod to share data, but the data is lost when the pod is terminated.


Setup Example

  1. Create a Manifest (kube.yml):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: shared-volume-deployment
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: shared-volume-app
      template:
        metadata:
          labels:
            app: shared-volume-app
        spec:
          containers:
            - name: writer
              image: busybox
              command: ["/bin/sh", "-c", "echo 'Hello from Writer Pod' > /data/hello.txt; sleep 3600"]
              volumeMounts:
                - name: shared-data
                  mountPath: /data
            - name: reader
              image: busybox
              command: ["/bin/sh", "-c", "cat /data/hello.txt; sleep 3600"]
              volumeMounts:
                - name: shared-data
                  mountPath: /data
          volumes:
            - name: shared-data
              emptyDir: {}
  2. Apply the Manifest:

    Terminal window
    kubectl apply -f kube.yml
  3. Verify Data Sharing:

    Terminal window
    kubectl exec -it shared-volume-deployment-74d67d6567-tcdsl --container reader -- sh
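Inside the reader container (the pod name suffix will differ on your cluster), the file written by the writer container is visible on the shared mount:

  Terminal window
  cat /data/hello.txt
  # expected, based on the manifest above: Hello from Writer Pod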

Key Points

  1. emptyDir Volume:
    • Type of ephemeral volume used in this example.
    • Created when a Pod is assigned to a node, exists as long as that Pod is running on that node.
    • Initially empty.
  2. Data Sharing:
    • Both containers (writer and reader) mount the same volume at /data.
    • writer container writes to the volume, reader container can read from it.
  3. Lifecycle:
    • Data in the volume persists across container restarts within the pod.
    • Data is lost when the pod is deleted or rescheduled to a different node.
  4. Use Cases:
    • Temporary scratch space.
    • Sharing data between containers in a pod.
    • Caching.
  5. Performance:
    • Typically faster than network-based storage.
    • Resides on the node’s filesystem.
  6. Limitations:
    • Not suitable for persistent data storage.
    • Data is lost if the pod is removed from the node.

Persistent Volumes

Persistent Volumes provide a powerful abstraction for managing storage in Kubernetes. They allow for efficient use of storage resources, enable data persistence across pod lifecycles, and support both static and dynamic provisioning to meet various application needs.


Key Concepts

  1. Persistent Volumes (PVs):
    • Pieces of storage in the cluster
    • Provisioned by an administrator or dynamically using Storage Classes
    • Exist independently of any pod
  2. Persistent Volume Claims (PVCs):
    • Requests for storage by users
    • Pods use PVCs to claim PV resources
  3. Storage Provisioning:
    • Static: Admin pre-creates PVs
    • Dynamic: PVs are created on-demand using Storage Classes

Visualization

The architecture can be pictured as follows:

  • Kubernetes cluster with nodes for running pods
  • Persistent Volumes as a separate resource pool
  • Pods claiming storage from PVs

Key Points

  1. Separation of Concerns:
    • PVs separate storage provisioning from pod deployment
    • Allows for more flexible and scalable storage management
  2. Lifecycle Independence:
    • PVs have a lifecycle independent of pods
    • Data persists even when pods are deleted or rescheduled
  3. Storage Classes:
    • Define different types of storage (e.g., SSD, HDD)
    • Enable dynamic provisioning of PVs
  4. Access Modes:
    • ReadWriteOnce (RWO): Read-write by a single node
    • ReadOnlyMany (ROX): Read-only by many nodes
    • ReadWriteMany (RWX): Read-write by many nodes
  5. Reclaim Policies:
    • Retain: Keep data for manual cleanup
    • Delete: Delete the volume and its data
    • Recycle: Basic scrub (deprecated)
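Both the access mode and the reclaim policy are declared on the PV itself. A minimal sketch, with a placeholder NFS server address (not a spec from the lecture):

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: example-pv
  spec:
    capacity:
      storage: 5Gi
    accessModes:
      - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    nfs:
      path: /exports
      server: <nfs-server-ip>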

Use Cases

  1. Databases requiring persistent storage
  2. Shared file systems across multiple pods
  3. Long-term data storage in stateful applications

Best Practices

  1. Use Storage Classes for dynamic provisioning where possible
  2. Implement appropriate backup and disaster recovery strategies
  3. Monitor PV usage and capacity
  4. Use appropriate access modes and reclaim policies

Static Persistent Volumes

Setting up NFS Server

  1. NFS Server Setup (on AWS) with Docker Compose. Note: ensure port 2049 is open on the machine so the cluster can reach the NFS server.

    version: '3.7'
    services:
      nfs-server:
        image: itsthenetwork/nfs-server-alpine:latest
        container_name: nfs-server
        privileged: true
        environment:
          SHARED_DIRECTORY: /exports
        volumes:
          - ./data:/exports:rw
        ports:
          - "2049:2049"
        restart: unless-stopped
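To bring the NFS server up on the instance (assuming Docker and the Compose plugin are installed and the file above is saved as docker-compose.yml):

  Terminal window
  docker compose up -d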

Creating PV and PVC

  1. Create PV and PVC:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteMany
      storageClassName: nfs
      nfs:
        path: /exports
        server: 52.66.197.168
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nfs-pvc
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
      storageClassName: nfs
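Apply the manifest and confirm that the claim binds to the volume; the file name pv.yml is an assumption here, and STATUS for both resources should read Bound:

  Terminal window
  kubectl apply -f pv.yml
  kubectl get pv,pvc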

Creating a Pod with PV

  1. Create Pod Using PVC:

    apiVersion: v1
    kind: Pod
    metadata:
      name: mongo-pod
    spec:
      containers:
        - name: mongo
          image: mongo:4.4
          command: ["mongod", "--bind_ip_all"]
          ports:
            - containerPort: 27017
          volumeMounts:
            - mountPath: "/data/db"
              name: nfs-volume
      volumes:
        - name: nfs-volume
          persistentVolumeClaim:
            claimName: nfs-pvc
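Apply the pod manifest, saved here as mongo.yml (the same file name reused in the persistence test below), and check that the pod starts with the claim attached:

  Terminal window
  kubectl apply -f mongo.yml
  kubectl get pod mongo-pod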

Testing Persistence

  1. Insert Data:

    Terminal window
    kubectl exec -it mongo-pod -- mongo
    use mydb
    db.mycollection.insert({ name: "Test", value: "This is a test" })
    exit
  2. Delete and Recreate Pod:

    Terminal window
    kubectl delete pod mongo-pod
    kubectl apply -f mongo.yml
  3. Verify Data Persistence:

    Terminal window
    kubectl exec -it mongo-pod -- mongo
    use mydb
    db.mycollection.find()


Key Points

  1. NFS as PV: NFS is used as an example of a network storage solution for PVs.
  2. Static Provisioning: PV is manually created before the PVC.
  3. PV and PVC Binding: PVC binds to the PV based on storage class and access modes.
  4. Pod Volume Mount: The pod uses the PVC to mount the volume.
  5. Data Persistence: Data persists even when the pod is deleted and recreated.

Best Practices

  1. Ensure proper security measures for NFS server access.
  2. Use appropriate storage classes for different types of applications.
  3. Monitor PV usage and implement proper capacity planning.
  4. Implement backup strategies for data stored in PVs.

Static Persistent Volumes provide a way to use pre-provisioned storage in Kubernetes. This approach is useful when you have existing storage infrastructure or specific storage requirements. It allows for data persistence across pod lifecycles, which is crucial for stateful applications like databases.

Automatic PV Creation

Here’s a summary of how to set up automatic Persistent Volume (PV) creation in Kubernetes using Vultr’s block storage.

  1. Create a Persistent Volume Claim (PVC):
    • Use the vultr-block-storage-hdd storage class
    • Request 40Gi of storage
    • Use ReadWriteOnce access mode
  2. Create a Pod that uses the PVC:
    • Use a MongoDB container
    • Mount the PVC to /data/db in the container
  3. Apply the configurations and verify:
    • Check created PV, PVC, and Pod
  4. Test data persistence:
    • Insert data into MongoDB
    • Delete and recreate the Pod
    • Verify that the data persists

Key steps:

  1. Create PVC:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: csi-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 40Gi
      storageClassName: vultr-block-storage-hdd
  2. Create Pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: mongo-pod
    spec:
      containers:
        - name: mongo
          image: mongo:4.4
          command: ["mongod", "--bind_ip_all"]
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-storage
              mountPath: /data/db
      volumes:
        - name: mongo-storage
          persistentVolumeClaim:
            claimName: csi-pvc
  3. Apply and verify:

    Terminal window
    kubectl get pv
    kubectl get pvc
    kubectl get pods
  4. Test data persistence:

    Terminal window
    kubectl exec -it mongo-pod -- mongo
    use mydb
    db.mycollection.insert({ name: "Test", value: "This is a test" })
    exit
    kubectl delete pod mongo-pod
    kubectl apply -f mongo.yml
    kubectl exec -it mongo-pod -- mongo
    use mydb
    db.mycollection.find()
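Because provisioning is dynamic, the PV only appears once the claim is bound; describing the claim shows the provisioning events and the generated volume name (this assumes the Vultr CSI driver is installed in the cluster):

  Terminal window
  kubectl describe pvc csi-pvc
  kubectl get pv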


This process demonstrates automatic PV creation and data persistence in Kubernetes using Vultr’s storage solution.

Horizontal Pod Autoscaler

Horizontal Pod Autoscaler (HPA) in Kubernetes is an important feature for automatically scaling applications based on observed metrics. Here’s an elaboration on the key points:


  1. Definition and Purpose:
    • HPA automatically adjusts the number of pod replicas in deployments, replica sets, or stateful sets.
    • It scales based on observed metrics like CPU utilization or custom metrics.
    • The goal is to handle varying loads efficiently by scaling out (adding pods) when demand increases and scaling in (reducing pods) when demand decreases.
  2. Horizontal Scaling:
    • This refers to adding more pods to the cluster, rather than increasing resources of existing pods.
    • It’s called “horizontal” because it expands the application laterally (more instances) rather than vertically (more resources per instance).
  3. Architecture:
    • HPA operates as a control loop in Kubernetes.
    • It runs intermittently, typically every 15 seconds, not continuously.
  4. Components:
    • cAdvisor: A tool for collecting, aggregating, and exporting resource usage data from containers.
    • Metrics Server: A lightweight, in-memory metrics storage system that collects resource usage data (CPU, memory) from kubelets and exposes it via the Kubernetes API.
  5. Installation and Usage:
    • The Metrics Server can be installed using kubectl:

      Terminal window
      kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    • Metrics can be viewed using commands like:

      Terminal window
      kubectl top pod -n kube-system
      kubectl top nodes
  6. HPA Controller Operation:
    • The HPA controller sends requests to the API server to fetch metrics.

    • A sample request might look like:

      Terminal window
      GET https://[cluster-url]/apis/metrics.k8s.io/v1beta1/namespaces/default/pods
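As a rough sketch, an HPA targeting a hypothetical Deployment named web and scaling on 50% average CPU utilization could be declared with the autoscaling/v2 API (the deployment name and replica bounds here are illustrative):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: web-hpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: web
    minReplicas: 1
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50

The imperative equivalent is kubectl autoscale deployment web --cpu-percent=50 --min=1 --max=10; either form relies on the Metrics Server installed above to supply CPU usage data.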

This setup allows Kubernetes to automatically adjust the number of running pods based on actual resource usage or custom metrics, ensuring efficient resource utilization and application performance under varying loads.