# Kubernetes _____ ### Overview - Open-source platform for managing your containers - Microservices architecture - more agile versus hadoop - Containerized applications allows for consistent environments - Moving between Development and Production environments - Scaling large scale workloads and can leverage frameworks like Spark - Spark creates a spark driver running within a kubernetes on a pod - Workers are deployed in the Kubernetes cluster ### Pods - Group of one or more containers - shared storage/network resources - Instructions for how to run the containers - Pods scheduled on nodes (e.g. virtual machine) - Nodes run node agents called Kublet - All communication happens through the API server ### Volumes - Volume represents a way to store, retrieve, and persist data across pods and through the application lifecycle - Applications often need to be able to store and retrieve data. - **Ephemeral:** - disposable resources, different approaches are available for applications to use and persist data as necessary. Kubernetes volumes can also be used as a way to inject data into a pod for use by the containers. - Secrets/configs/metadata **Persistent Volume:** - Volumes that are defined and created as part of the pod lifecycle only exist until the pod is deleted. ![image](../assets/kubernetes.png) ### MiniKube - Runs single node kubernetes cluster on your local laptop - Use base docker image for Spark - pull image from DockerHub - Make spark distribution tarball - Build the image for Spark - Instead can use Helm - a package manager to kubernetes for configuration - Helm charts for Spark setup (not familiar with) - Assists with building config.yaml file for JupyterHub **Commands:** - **Create pod:** `Kubectl create -f jupyperlab.yaml` - **Create service (DNS entry):** `Kubectl create -f jupyperlab-service.yaml` - **Pod status:** `Kubectl get pods`, `Kubectl get services` - Forward ports to access Jupyter Notebooks - Access Kubernetes UI - Can see status of pods ### Kubeflow - Kubeflow is the Machine Learning toolkit for Kubernetes - **Components**: Notebooks, pipelines, training, and serving - Kubeflow helps makes ML use-cases portable and scalable - Easily deploy workflows to Kubernetes - Train models, Tensorboard, serve model, build pipelines **Pipelines** - An engine for scheduling multi-step ML workflows - SDK to defining and building pipeline components - UI to visualize and track experiments/jobs/runs. ![image](../assets/kubeflow.png)