Cloud Native Platform Engineer (CNPE) Exam Study Guide
A few weeks ago, I took and passed the Cloud Native Platform Engineer (CNPE) exam and earned this shiny new badge. It was the last exam I needed to complete the Golden Kubestronaut program, which essentially means you're crazy enough to sit through all 16 CNCF certification exams 😅 Kidding aside, the CNPE is a comprehensive exam and a great way to test your platform engineering knowledge and skills, along with your ability to operate some of the most popular platform engineering tools in the CNCF ecosystem. No easy feat.
This exam is fairly recent in the CNCF certification lineup, so there aren't as many resources available compared to more established certifications like the CKA or CKAD. So I wanted to put together this study guide to share how you too can prepare for and pass the CNPE exam. I'll cover what I was able to find in terms of resources and what you should focus on when preparing. That, and I'm pretty sure I'll need to refer back to this myself if I decide to renew my certification in 2 years 🤗
I'll spare you the what and why of taking the CNPE exam, as that can always be found on the official CNCF page for the exam. Instead, let's jump right into the how and what you should know to prepare.
First off, take a read through Michał Tomczak's blog post titled "How I Passed the Cloud Native Platform Engineer (CNPE) Exam and what you can learn from my attempt!". I found it a valuable resource for understanding the exam structure and preparation strategies. I liked Michał's approach to preparation, especially his focus on hands-on practice and spinning up labs to get reps in on the various tools and technologies.
So what tools and technologies should you be familiar with? I’m sure you’ve seen the CNCF landscape diagram. It’s become a meme at this point because of how overwhelming it can be. Good news: you don’t need to know everything in it. You just need to focus on the tools the Linux Foundation listed on the CNPE’s Important Instructions page:
- Argo CD
- Argo Workflows
- Crossplane
- Flagger
- Flux
- Gatekeeper
- Grafana
- Istio
- Jaeger
- Kyverno
- Linkerd
- Open Policy Agent (OPA)
- OpenCost
- OpenTelemetry (OTel)
- Prometheus
- Tekton
Beyond those tools, platform engineering centers on building self-service platforms for developers. You should be comfortable with git on the command line and have a solid grasp of Kubernetes fundamentals - platform engineering is built on Kubernetes, after all.
If you’ve passed all the other CNCF exams that lead up to CNPE, you should already be in good shape since most of the tools above are covered in those exams.
The best thing you can do is book the exam and set a date. It gives you a deadline to work toward and keeps you motivated. If you’re ready to take the plunge, go ahead and register for the exam.
Now let’s get you prepped. Below is a breakdown of the exam domains and competencies based on the official CNPE Exam Domains and Competencies listed in the CNPE exam curriculum document.
The CNPE exam site says you don’t need “deep” level knowledge of the tools, but I found that you actually do need more than surface-level understanding of the tools 😅 Spend extra time getting hands-on experience and being comfortable with the CLI tools and APIs of each project.
Platform Architecture and Infrastructure - 15%
This domain is all about understanding how to design, provision, and manage the underlying Kubernetes platform that supports workloads. You’ll be expected to know how networking, storage, and compute interact, how to right-size resources to control costs, and ultimately how to enforce multi-tenant isolation.
Applying Platform Architecture Best Practices for Networking, Storage, and Compute
This is pure Kubernetes fundamentals, where your CKA knowledge will be tested. The exam expects you to reason about how platform decisions affect workload reliability and performance.
What to know:
- Workloads: Pods, Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs. Know when to pick each one. StatefulSets for ordered, unique pods (databases). DaemonSets for one-per-node (logging agents, metrics collectors). Deployments for stateless horizontal scaling.
- Networking: Services (ClusterIP, NodePort, LoadBalancer, ExternalName), Gateway API, NetworkPolicies for east-west traffic control, CoreDNS configuration.
- Storage: StorageClasses for dynamic provisioning, PersistentVolumeClaims for persistent data, HostPath vs volume mounts. Know the difference between cluster-scoped PVs and namespaced PVCs.
- Scheduling: Resource requests vs limits (CPU/memory), node affinity and anti-affinity, taints and tolerations, topology spread constraints, priority classes and preemption.
- Scaling: Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), Cluster Autoscaler. Know the triggers for each. For VPA, understand the update modes (Off, Initial, Recreate, Auto, and the newer InPlaceOrRecreate) and how to set minAllowed/maxAllowed bounds in the resource policy.
- Resource governance: ResourceQuotas (namespace-level caps), LimitRanges (default/min/max per container), Pod Security Standards (the replacement for the removed PodSecurityPolicy).
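To tie several of these together, here's a minimal Deployment sketch showing requests vs limits, a toleration, and a topology spread constraint. The names, image, and values are illustrative, not exam material:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp   # hypothetical app name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: nginx:1.27   # placeholder image
          resources:
            requests:          # guaranteed minimum, used for scheduling
              cpu: 100m
              memory: 128Mi
            limits:            # hard cap; exceeding the memory limit gets the container OOMKilled
              cpu: 500m
              memory: 256Mi
      tolerations:
        - key: workload-type   # hypothetical taint key
          operator: Equal
          value: web
          effect: NoSchedule
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread replicas across nodes
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: webapp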
Hands-on practice:
Rather than reproducing standard Kubernetes YAML from memory here, use the official Kubernetes documentation to create and inspect these resources yourself. Success in this exam hinges on knowing how to navigate each project’s documentation quickly. The exam provides links to docs - the difference between passing and failing is often how efficiently you find what you need.
A good exam drill is to fix an app that’s failing:
kubectl describe pod <pod-name> -n webapp
kubectl get events -n webapp --sort-by='.lastTimestamp'
Look for Pending pods (insufficient resources), CrashLoopBackOff (app issues), or ErrImagePull/ImagePullBackOff (image name or credential issues).
Using Cost Management Solutions for Right-Sizing and Scaling
The core skill here isn’t “using one cost product.” It’s understanding how to identify waste and right-size workloads using metrics, resource requests/limits, and autoscaling behavior. OpenCost is the tool on the exam list for cost visibility, but you should be comfortable with Kubernetes-native reasoning about how overprovisioning, idle nodes, and oversized requests affect cost.
What to know:
- OpenCost consumes Prometheus metrics about node and pod resource usage, allocating costs based on actual consumption, not just requests.
- Over-requesting CPU or memory forces unnecessary node scale-out even when actual usage is low.
- Right-sizing means setting requests close to observed 95th-percentile usage so pods get guaranteed resources without wastage.
- Bin packing (more pods per node via lower requests) reduces node count and cost, but must be balanced against SLOs.
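A couple of PromQL queries illustrate the idea, assuming the standard kube-state-metrics and cAdvisor metric names you get with kube-prometheus-stack:
# Requested CPU per pod (kube-state-metrics)
sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})

# Actual CPU usage over the last 5 minutes (cAdvisor)
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# 95th-percentile memory usage over a day - a starting point for right-sizing requests
quantile_over_time(0.95, container_memory_working_set_bytes{container!=""}[1d])
Comparing the first two queries per pod shows you exactly where requests are far above real usage.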
Practice:
Install OpenCost (it’s typically deployed alongside Prometheus):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace=monitoring \
--create-namespace
helm repo add opencost-charts https://opencost.github.io/opencost-helm-chart
helm install opencost opencost-charts/opencost \
--namespace opencost \
--create-namespace \
--set opencost.prometheus.internal.namespaceName=monitoring \
--set opencost.prometheus.internal.serviceName=prometheus-kube-prometheus-prometheus \
--set opencost.prometheus.internal.port=9090 \
--set opencost.metrics.kubeStateMetrics.emitKsmV1Metrics=false \
--set opencost.metrics.kubeStateMetrics.emitKsmV1MetricsOnly=true
See Managing with Helm for the latest OpenCost install instructions.
Then inspect cost allocations:
kubectl port-forward svc/opencost 9090 -n opencost
Look at the UI to understand which namespaces and workloads are consuming the most resources. Challenge yourself: “Can I reduce this request without harming SLOs?”
Also get familiar with the kubectl cost CLI plugin. It lets you query cost allocations directly from the terminal (e.g., kubectl cost pod --opencost), which is handy when you need quick answers without opening the UI.
Optimizing Multi-Tenancy Resource Usage
Multi-tenancy means multiple teams or workloads sharing a cluster safely. The exam may present this as a platform guardrail problem: “How do you let multiple teams share a cluster without one team starving the others?”
What to know:
- Namespaces are the primary isolation boundary – each team gets their own.
- ResourceQuotas cap total resources a namespace can consume (pods, services, CPU, memory).
- LimitRanges set default/min/max resource values per container.
- NetworkPolicies control which pods can communicate across namespaces.
- Pod Security Standards enforce security posture per namespace (privileged, baseline, restricted).
- RBAC ensures each team can only manage resources in their own namespace.
Practice:
Set up ResourceQuotas, LimitRanges, and NetworkPolicies against a local KIND cluster. The exam expects you to know how to find the right spec fields in each project’s docs — not memorize YAML.
Key behaviors to understand:
- ResourceQuota enforces namespace-level totals. If a Pod's resources would exceed the quota, it is blocked – you'll see FailedCreate or Forbidden events. Use kubectl describe resourcequota <name> -n <namespace> to see usage vs hard limits. Remember: the quota tracks cumulative requests and limits across all pods.
- LimitRange fills in default resource requests/limits when a Pod spec omits them. Watch out: if a pod's requests would exceed the ResourceQuota, the API server rejects it.
- Limits vs Requests: Limits cap what a container can use (OOMKilled/Throttling when hit). Requests are the guaranteed minimum for scheduling. Always set both.
Debugging tip: If pods are Pending after applying LimitRange defaults, the defaults may exceed available cluster resources. Adjust them downward and restart the rollout. Verify with kubectl get resourcequota -n <namespace> and check NetworkPolicies with kubectl get networkpolicy -n <namespace>.
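For reference, a minimal quota-plus-defaults pair might look like this (the namespace and values are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a   # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "4"       # cumulative across all pods in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:             # applied as limits when a container omits them
        cpu: 500m
        memory: 512Mi
      defaultRequest:      # applied as requests when a container omits them
        cpu: 100m
        memory: 128Mi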
GitOps and Continuous Delivery - 25%
This is the largest weighted domain. You need to be comfortable wiring up GitOps workflows, CI/CD pipelines, and progressive delivery strategies on a Kubernetes cluster.
Before diving into GitOps controllers, make sure you’re solid on the two main ways Kubernetes manifests get organized and deployed: Kustomize and Helm. Both Argo CD and Flux consume these natively, so the exam assumes you know how they work.
Kustomize uses a base/overlay pattern to manage environment-specific configs without templating. You keep shared resources in a base directory and adjust them through overlays like staging and production. Each overlay has a kustomization.yaml that references the base and applies patches. Key skills: writing strategic merge patches, adding resources to a kustomization, and using kubectl apply -k or kubectl diff -k to preview changes.
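For example, a minimal base/overlay layout might look like this (paths and the patch contents are illustrative):
# base/kustomization.yaml
resources:
  - deployment.yaml
  - service.yaml

# overlays/production/kustomization.yaml
resources:
  - ../../base
patches:
  - path: replica-patch.yaml   # strategic merge patch applied on top of the base

# overlays/production/replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp   # must match the name in the base
spec:
  replicas: 5
Render an overlay with kubectl kustomize overlays/production to check the output before applying it with kubectl apply -k overlays/production.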
Helm packages Kubernetes manifests into reusable charts with a values.yaml for customization. Know the basics: Chart.yaml for metadata and versioning, helm install vs helm upgrade, and how to deploy a chart into a specific namespace. You won’t need to write complex Go templates, but you should be able to edit a chart’s deployment template and bump a chart version.
Practice:
- Work through the Kustomize documentation to create a base with staging and production overlays
- Read the Helm quickstart guide and practice installing, upgrading, and inspecting releases
Implementing GitOps Workflows for Application and Infrastructure Deployment
What to know:
The exam lists both Argo CD and Flux as in-scope tools. Focus on the core concepts they share:
- Git is the source of truth for desired state
- A deployment controller reconciles the cluster to match Git
- Drift detection identifies when live state diverges from Git
- Applications are version-controlled and auditable
Argo CD concepts:
- Application resources point to Git repos with manifest paths
- Sync policies (manual vs auto), pruning, and self-heal
- Health status (Healthy, Progressing, Missing, Degraded)
- Repo structure: one repo per app or a monorepo with overlays
Flux concepts:
- Sources (GitRepository, Bucket), Kustomization or HelmRelease
- Control loop watches Git for changes
- Reconciles cluster state automatically
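To make that concrete, a minimal GitRepository plus Kustomization pairing might look like this (the repo URL and path are placeholders):
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: webapp
  namespace: flux-system
spec:
  interval: 1m                                      # how often to poll Git
  url: https://github.com/example/webapp-config     # placeholder repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: webapp
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: webapp
  path: ./overlays/production
  prune: true   # delete cluster resources that were removed from Git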
Exam tip: You may be given a choice between Argo CD and Flux for GitOps questions, but you should be comfortable with both. The underlying GitOps principles are the same - once you know one, the other is just a matter of looking up different resource types.
Practice:
Install Argo CD following the official quick-start guide. The kubectl apply command for the install manifest is the same on exam day - you’ll find it in the docs.
The key Argo CD resource is the Application - it points to a Git repo, a manifest path, and a cluster destination. Try creating one from scratch using the docs at Application Specification to look up the spec fields.
Deliberately change the live cluster with kubectl edit and watch Argo CD detect and correct the drift.
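For reference, here's a sketch of an Application with automated sync, pruning, and self-heal (the repo URL and paths are placeholders) - selfHeal is what reverts the kubectl edit drift in the drill above:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: webapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/webapp-config   # placeholder repo
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc   # the in-cluster destination
    namespace: webapp
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual changes to the live cluster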
Building and Configuring CI/CD Pipelines Integrated with Kubernetes
What to know:
Tekton is the Kubernetes-native CI/CD tool. The exam expects familiarity with:
Core CI/CD primitives:
- Task: Reusable unit of work containing one or more steps (containers that run)
- Pipeline: Ordered sequence of Tasks with dependencies
- TaskRun: Instantiation of a single Task
- PipelineRun: Instantiation of a Pipeline with concrete parameter values
- Workspaces: Shared volumes between Tasks (e.g., code checkout to build)
- Params and Results: How to pass data into a Task and how to pass outputs between Tasks
- ServiceAccounts: Authentication for container registry access
- Steps: Individual container invocations within a Task
Tekton Triggers (event-driven pipelines):
For webhook-triggered pipelines, three components work together:
- EventListener: Listens for incoming events (GitHub webhooks, etc.) and routes them to triggers
- TriggerBinding: Extracts data from the event payload
- TriggerTemplate: Defines the PipelineRun to create with parameter substitution
Practice:
Follow the official installation guide - be sure to install the tkn CLI and be familiar with its commands for creating and inspecting resources. Use the Tekton docs to create a Task, Pipeline, TaskRun, and PipelineRun from scratch.
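A minimal Task and TaskRun sketch to get started (the echo step and names are illustrative):
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: hello
spec:
  params:
    - name: message
      type: string
  steps:
    - name: say-hello
      image: alpine:3.20   # placeholder image
      script: |
        echo "$(params.message)"
---
apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
  name: hello-run
spec:
  taskRef:
    name: hello
  params:
    - name: message
      value: "Hello, CNPE!"
Then follow the run with tkn taskrun logs hello-run -f.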
Working with Argo Workflows for Platform Automation
Argo Workflows is the secondary workflow tool on the exam. Its key use case is orchestrating Kubernetes resources - creating namespaces, deploying resources, running Jobs, and cleaning up - not CI/CD pipelines.
Argo Workflows concepts:
- Workflow: Top-level resource defining entrypoint and templates
- WorkflowTemplate: Reusable templates (namespaced or cluster-scoped)
- ClusterWorkflowTemplate: Cluster-scoped reusable templates
- Templates: Individual steps or DAGs within a workflow
- Steps/DAGs: Sequential steps or directed acyclic graphs for complex workflows
- Resource action: Create, delete, get, apply Kubernetes resources directly from workflows
- ServiceAccount: Authenticates workflows to the Kubernetes API for resource creation
Mental model: “Tekton = CI/CD pipelines, Argo Workflows = orchestration of Kubernetes resources.”
Practice:
Install Argo Workflows from the official docs and practice:
- Creating resources from a workflow using the resource action - look at the Kubernetes Resources walkthrough
- Creating WorkflowTemplates and submitting them using the argo CLI
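Here's a sketch of a Workflow that creates a Kubernetes resource via a resource template - the ConfigMap and the ServiceAccount name are illustrative, and the SA needs RBAC to create ConfigMaps:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: create-config-
spec:
  entrypoint: create-configmap
  serviceAccountName: workflow-sa   # hypothetical SA with create rights on ConfigMaps
  templates:
    - name: create-configmap
      resource:
        action: create              # could also be apply, delete, get, patch
        manifest: |
          apiVersion: v1
          kind: ConfigMap
          metadata:
            generateName: demo-config-
          data:
            created-by: argo-workflows
Submit it with argo submit workflow.yaml --watch and confirm the ConfigMap appears.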
Deploying Applications Using Progressive Delivery Strategies
What to know:
Argo Rollouts and Flagger are tools for blue/green and canary deployments on Kubernetes.
Argo Rollouts key concepts:
- Rollout resource replaces Deployment for progressive delivery
- Blue/Green: Route traffic from old to new version all at once after health checks pass
- Canary: Gradually shift traffic to new version with metric-based promotion
- AnalysisTemplates: Define KPI checks (Prometheus queries, HTTP checks)
- AnalysisRuns: Instantiations of analysis templates
- Pause/Resume: Manual gates in the promotion pipeline
- Automatic rollback when metrics indicate failure
Flagger concepts:
- Progressive delivery via service mesh or ingress controller traffic shifting
- Supports canary releases, A/B testing, and blue/green mirroring
- Integrates with Prometheus, InfluxDB, Datadog for metric analysis
- Webhook analysis: Flagger supports pre-rollout and post-rollout webhooks in the analysis spec for running HTTP checks against canary services before promotion
Practice:
Install from the official release page and practice creating a Rollout resource. Key spec fields to look up:
- Setting up canary strategy to control traffic shifting
- Setting up pause steps for manual promotion gates
- Defining AnalysisTemplates for metric-based promotion
Get comfortable with the argo rollouts CLI and web UI to monitor rollout status.
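For orientation, the canary strategy section of a Rollout looks roughly like this (the step weights and image are illustrative):
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: webapp
spec:
  replicas: 5
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: nginx:1.27   # placeholder image; change it to trigger a rollout
  strategy:
    canary:
      steps:
        - setWeight: 20            # shift 20% of traffic to the new version
        - pause: {}                # manual gate - promote via the CLI
        - setWeight: 60
        - pause: {duration: 1m}    # timed gate before full promotion
Watch and promote it with kubectl argo rollouts get rollout webapp --watch and kubectl argo rollouts promote webapp.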
Platform APIs and Self-Service Capabilities - 25%
This domain tests your ability to extend Kubernetes with custom APIs, automate platform operations, and expose self-service capabilities to developers.
Designing and Creating Custom Resource Definitions for Platform Services
What to know:
CRDs extend the Kubernetes API with platform-specific objects. You need to understand the full lifecycle:
- Schema definition with OpenAPI v3 validation
- Namespaced vs cluster-scoped resources
- Versioning and conversion strategies for evolving APIs
- Reading custom resource definitions and deploying custom resources
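As a reference, a minimal CRD with OpenAPI v3 validation might look like this (the group, kind, and fields are hypothetical):
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.platform.example.com   # must be <plural>.<group>
spec:
  group: platform.example.com
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
  versions:
    - name: v1alpha1
      served: true
      storage: true          # exactly one version is the storage version
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: [postgres, mysql]   # validation rejects anything else
                storageGB:
                  type: integer
                  minimum: 1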
Practice:
Go through at least the first few chapters of the Kubebuilder book to understand how to scaffold a new API and controller. Focus on the API definition and how the controller reconciles desired vs actual state. Try updating the API version and see how that affects the Custom Resource Definition manifest.
Implementing Workflows for Self-Service Provisioning Using Platform APIs
What to know:
- Custom controllers: Reconcile loop, watches, finalizers, status updates
- CRD-based provisioning: Define a CRD for a platform service (e.g., a Database resource), then write a controller that creates the underlying Kubernetes resources (StatefulSet, Service) when a Database resource is created.
- Self-service APIs: Expose custom resources that developers can create to request platform services without needing cluster admin access.
Practice:
- Follow the Extend the Kubernetes API with CustomResourceDefinitions and Versions in CustomResourceDefinitions tutorials to create a simple CRD and custom resource.
Using Kubernetes Operators for Platform Automation and Integration
What to know:
Operators encode operational knowledge into automation. The exam expects familiarity with:
- Reconcile loop: The controller continuously compares desired state (spec) to actual state (cluster) and makes changes to align them
- controller-runtime: The Go library Kubebuilder is built on
- Watches: Triggering reconciliation on resource changes
- Finalizers: Cleanup logic before a resource is deleted
- Status updates: Reporting observed state back to the API
Practice:
As mentioned above, go through the Kubebuilder tutorial to build a simple operator. Focus on the reconcile logic and how it manages the lifecycle of custom resources.
Using Automation Frameworks for Self-Service Provisioning
What to know:
- Backstage: Developer portal for self-service. Software Templates scaffold new projects. The Software Catalog manages microservices and infrastructure.
- Crossplane: Infrastructure-as-code for Kubernetes. Crossplane lets you define complex infrastructure as custom Kubernetes resources.
- Terraform/OpenTofu: Traditional IaC tools that can be integrated into Kubernetes workflows.
Core Crossplane concepts:
- Composite Resource Definition (XRD): Defines the schema for composite resources – like a CRD that declares what your platform resource looks like (spec fields, status fields, scope)
- Composite Resource (XR): The actual custom resource created by a user – e.g., an App resource that a developer creates to request infrastructure
- Composition: Defines which Kubernetes/cloud resources to create when an XR is created, and how to wire them together
- Composition Revisions: Versioned snapshots of compositions for managing changes safely
- Composition Functions: Plugins that execute the composition logic. Think of them as configuration-language plugins. Options include:
- Templated YAML (KRO - Kubernetes Resource Orchestrator)
- YAML+CEL
- Python
- KCL
- Environment Configs: In-memory data stores for Compositions to share state
- Mental model: “XRD = schema, XR = instance, Composition = recipe.” When you create an XR, Crossplane consults the Composition to know which resources to create, then reconciles continuously.
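To anchor the mental model, here's a minimal XRD sketch using the apiextensions.crossplane.io/v1 API (the group and fields are hypothetical):
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xapps.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XApp
    plural: xapps
  versions:
    - name: v1alpha1
      served: true
      referenceable: true   # Compositions can target this version
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                region:
                  type: string   # hypothetical field a developer sets on the XR
A Composition then points at this XRD via its compositeTypeRef and lists the resources to create whenever an XApp is created.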
Practice:
- Follow the Get Started guide to create an XRD, Composition, and XR from scratch. This is the best way to understand the flow.
- Follow the Backstage Software Template guide to create a Software Template that scaffolds a microservice.
Terraform/OpenTofu are not on the list of tools for the exam, but you should understand how they can be integrated into Kubernetes workflows.
Observability and Operations - 20%
A platform is only as good as your ability to observe and debug it. This domain tests your skills in monitoring, metrics, and incident diagnosis.
Implementing Monitoring, Alerting, Logging, and Tracing Solutions
What to know:
Prometheus is the core monitoring tool. You should be comfortable with:
- Scrape targets: How Prometheus discovers and scrapes metrics from Kubernetes endpoints
- PromQL: Querying metrics for alerting and dashboarding
- Alert rules: Defining conditions for firing alerts based on PromQL queries
- Alertmanager: Routing alerts to channels, silencing, grouping
If you’re using the kube-prometheus-stack (and you likely will be), you’ll also encounter Prometheus Operator CRDs like ServiceMonitor, PodMonitor, and PrometheusRule. These let you configure scrape targets and alerting rules as Kubernetes resources instead of editing ConfigMaps directly.
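For instance, a ServiceMonitor that scrapes a Service's metrics port might look like this (labels, namespaces, and the port name are illustrative):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp
  namespace: monitoring
  labels:
    release: prometheus    # kube-prometheus-stack selects monitors by label
spec:
  selector:
    matchLabels:
      app: webapp          # matches the target Service's labels
  namespaceSelector:
    matchNames:
      - webapp
  endpoints:
    - port: metrics        # named port on the Service
      interval: 30s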
Grafana is the visualization layer:
- Data sources (Prometheus, Loki, Jaeger)
- Dashboard panels and queries
- Linking panels across dashboards
Jaeger for distributed tracing:
- Traces: End-to-end request flows across services
- Spans: Individual operations within a trace (each has a trace ID and span ID)
- Trace context: the W3C traceparent header carries trace ID and span ID for context propagation
- Service-to-service call chains: Visualizing how requests flow between services in the Jaeger UI (port 16686)
- Identifying latency hotspots: Find which span takes the longest in a trace
- Span hierarchy: Parent-child relationships between spans show call depth
OpenTelemetry for instrumentation and telemetry collection:
- Pipeline architecture: How telemetry flows from instrumentation to backend via the Collector
- Receivers: Accept telemetry via OTLP (gRPC on port 4317, HTTP on port 4318)
- Processors: Transform data (e.g., batch, filter, attribute manipulation)
- Exporters: Route to backends (Jaeger for traces, Prometheus for metrics, etc.)
Typical Collector configuration:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 5s
exporters:
otlp/jaeger:
endpoint: "jaeger:4317"
tls:
insecure: true
otlphttp/prometheus:
endpoint: "http://prometheus:9090/api/v1/otlp"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/jaeger]
metrics:
receivers: [otlp]
processors: [batch]
exporters: [otlphttp/prometheus]
logs:
receivers: [otlp]
processors: [batch]
exporters: [debug]
Practice:
- Install the kube-prometheus-stack for Prometheus and Grafana
- Install Jaeger via the OpenTelemetry Operator or Helm Chart
Measuring and Improving Platform Efficiency Using Deployment Metrics and Performance Indicators
What to know:
The DORA metrics are the industry standard for measuring software delivery performance. You don’t need a tool to calculate them - the exam may test your understanding of how platform decisions connect to these outcomes:
- Deployment Frequency: How often code is deployed to production. GitOps + automation = higher frequency.
- Lead Time for Changes: Time from code commit to production. Shorter is better. CI/CD pipeline speed matters here.
- Change Failure Rate: Percentage of deployments causing incidents. Progressive delivery (canary/blue-green) reduces this.
- Time to Restore Service: How fast you recover from failures. Automated rollback, good monitoring, and clear runbooks reduce this.
Practice:
Track these metrics for your platform. Build one small example of each improvement:
- Automate a manual step to improve deployment frequency
- Add parallel pipeline stages to reduce lead time
- Add a canary rollout to reduce change failure rate
- Add automated rollback to reduce time to restore
Key insight: A good platform makes delivery faster while keeping incidents lower. These metrics are feedback loops, not just report cards.
Diagnosing and Remediating Platform Issue and Incident Scenarios
What to know:
The exam will present symptoms and expect you to narrow down the fault domain. Expect troubleshooting across app, cluster, and platform layers (similar to CKAD/CKA/CKS exams but with more platform tools in the mix):
- Unhealthy pods, misrouted traffic, broken GitOps sync, failed rollouts
- Missing alerts, RBAC denial, policy blocks, image pull failures
Structured approach:
- Inspect status: kubectl get, kubectl describe
- Check events: kubectl get events --sort-by='.lastTimestamp'
- Review logs: kubectl logs <pod> --tail=100
- Query metrics: PromQL for resource usage, error rates
- Verify controller state: Argo CD sync status, Rollout status, policy reports
Practice:
Intentionally break things and fix them:
# Make a pod crash
kubectl set image deployment/webapp webapp=nginx:invalid -n webapp
kubectl describe pod <pod> -n webapp # Look for ErrImagePull / ImagePullBackOff
kubectl logs <pod> -n webapp --previous # Previous container logs
# Make a rollout fail
kubectl rollout pause deployment/webapp
kubectl rollout undo deployment/webapp
# Check RBAC
kubectl auth can-i create deployments --namespace webapp --as=system:serviceaccount:default:my-sa
Security and Policy Enforcement - 15%
This domain covers service-to-service security, RBAC, policy enforcement, and security scanning. You should know how to wire these into a working platform.
Configuring Secure Service-to-Service Communication
What to know:
Istio and Linkerd are the two service meshes you may encounter in the exam. Both provide Gateway API support for north-south traffic and mTLS for east-west traffic. Key concepts:
- mTLS: Mutual TLS for service identity and encryption. Both sides verify each other.
- Authentication policies: Define which services require mTLS and which can accept plaintext
- Authorization policies: Define which services can talk to which (e.g., only allow frontend to call backend)
- Gateway API: For exposing services outside the cluster with TLS termination
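In Istio, for example, strict mTLS plus a simple allow rule look roughly like this (the namespace, labels, and service account are illustrative):
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: webapp
spec:
  mtls:
    mode: STRICT   # reject plaintext traffic to workloads in this namespace
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: webapp
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/webapp/sa/frontend"]   # only the frontend SA may call the backend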
Practice:
- Follow the Istio Gateway API tutorial then the Argo Rollouts integration with Istio docs.
- Follow the Linkerd Getting Started guide then explore the Gateway API support and Traffic Splitting docs.
Applying RBAC and Security Controls Across Platform Resources
What to know:
Kubernetes RBAC is foundational. You should be comfortable with:
- Roles/ClusterRoles: Defines what actions are allowed (verbs: get, list, create, delete, etc.) on which resources
- RoleBindings/ClusterRoleBindings: Grants roles to users, groups, or ServiceAccounts
- ServiceAccounts: Identity for pods and automation. Each namespace gets a default SA.
- Least privilege: Only grant what’s needed. Read-only roles for developers, admin for platform teams.
Practice:
kubectl create role log-reader --resource=pods,pods/log --verb=get,list --namespace webapp
kubectl create rolebinding log-reader-binding --role=log-reader --user=developer --namespace webapp
# Create a ServiceAccount for automation
kubectl create serviceaccount my-operator-sa -n platform
kubectl create clusterrolebinding operator-binding --clusterrole=cluster-admin --serviceaccount=platform:my-operator-sa
Verify access:
kubectl auth can-i create deployments --namespace webapp --as=system:serviceaccount:platform:my-operator-sa
Generating Audit Trails and Enforcing Policy Compliance (SBOM, Compliance Reports, etc.)
What to know:
Kubernetes Pod Security Standards are the built-in mechanism for enforcing security policies on pods. You should know the three levels (Privileged, Baseline, Restricted) and how to apply them via namespace labels or admission controllers.
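Applying a level is just a namespace label - a minimal sketch (the namespace name is hypothetical):
apiVersion: v1
kind: Namespace
metadata:
  name: team-a   # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # also warn on apply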
Trivy handles security scanning and SBOM generation:
- Image scanning: Find vulnerabilities in container images
- SBOM generation: Generate Software Bill of Materials in CycloneDX or SPDX format
- Kubernetes scanning: Scan live Kubernetes resources for vulnerabilities, misconfigurations, and secrets
- Filesystem scanning: Scan local files for vulnerabilities and misconfigurations
SBOM concepts:
- An SBOM (Software Bill of Materials) is an inventory of all software components in a build
- CycloneDX and SPDX are the two standard formats
- Auditors use SBOMs to verify which dependencies are running and whether any have known vulnerabilities
Practice:
- Follow the First steps with Trivy doc to install and scan a container image.
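A few representative commands to drill (image names and paths are placeholders):
# Scan an image for serious vulnerabilities
trivy image --severity HIGH,CRITICAL nginx:1.27

# Generate an SBOM in CycloneDX format
trivy image --format cyclonedx --output sbom.cdx.json nginx:1.27

# Scan local manifests for misconfigurations
trivy config ./manifests/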
Using Policy Engines and Admission Controllers for Governance
What to know:
Kyverno's newer releases introduce dedicated policy types that supersede the legacy ClusterPolicy format. Know the difference between the stable and legacy formats:
Stable policy types (use these on the exam):
- ValidatingPolicy: Validates resources against rules using CEL expressions
- MutatingPolicy: Automatically modify resources on creation
- GeneratingPolicy: Create additional resources when a resource is created
- DeletingPolicy: Automatically delete resources on a schedule (e.g., cleanup completed job pods)
- ImageValidatingPolicy: Verify container images against signatures (Cosign, Notary)
Legacy format (may still appear in existing configs):
- The typed rules format (validate, mutate, generate, delete) in the older ClusterPolicy kind is deprecated. The new format uses separate kinds for each policy type with CEL-based expressions.
OPA/Gatekeeper:
Gatekeeper uses a two-step process: ConstraintTemplate defines the policy logic (Rego) and schema, then Constraint applies it to specific resources.
- ConstraintTemplate: Defines both the Rego enforcement logic and the parameter schema. The schema allows fine-tuning like function arguments.
- Constraint: Instantiates a template with specific parameters and target selectors (namespaces, resource kinds).
- Rego: Declarative policy language for violation logic.
- Libraries: Rego packages can be shared across templates via the libs section in ConstraintTemplate.
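To make the two-step flow concrete, here's a trimmed-down version of the required-labels example from the Gatekeeper library:
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels     # the Constraint kind this template creates
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]   # every Namespace must carry an owner label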
Practice:
- Follow the Kyverno installation guide then go through the Policy types documentation to create a ValidatingPolicy and MutatingPolicy.
- Follow the Gatekeeper installation guide then go through some of the examples found in the Gatekeeper Library.
Integrating Security Scanning and Compliance Checks into Deployment Pipelines
What to know:
The key concept is to move scanning left without breaking delivery:
- Scan images in CI before they’re pushed
- Scan manifests and Kubernetes resources before promotion
- Fail the pipeline on critical findings
- Generate SBOMs as part of the build artifact
Practice:
This practice is a bit open-ended, but try creating a Tekton Task or other workflow that incorporates the tools mentioned above. For example, a Task that scans an image with Trivy and generates an SBOM:
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: security-scan
spec:
params:
- name: image
type: string
steps:
- name: scan-image
image: ghcr.io/aquasecurity/trivy:latest
script: |
trivy image --exit-code 1 --severity HIGH,CRITICAL $(params.image)
- name: generate-sbom
image: ghcr.io/aquasecurity/trivy:latest
script: |
trivy image --format spdx-json --output /tmp/sbom.spdx.json $(params.image)
Conclusion
A lot of material to cover, right? Don't worry about memorizing everything - as mentioned above, you'll have access to documentation during the exam. The key is to be comfortable navigating the docs and understanding the core concepts so you can quickly find what you need.
The CNPE exam is performance-based - you’ll execute real tasks in a Linux desktop environment for 120 minutes. The best preparation is hands-on practice with each tool. When I sit through performance-based exams, I typically spend the first few minutes reading through all the tasks to get a sense of what’s being asked, then I prioritize based on my strengths. If you can answer something quickly, do it and secure those points. For tougher tasks, read the question carefully, then start to systematically work through the problem. If you find yourself stuck in a troubleshooting loop, flag the task and move on - you don’t have time to get bogged down. Go work on another task, then come back with fresh eyes.
Hope this guide helps! The CNPE is challenging but fair - if you’ve built platform engineering systems in real life, the hands-on tasks will feel natural. Take your time, read each task carefully, and use the documentation links provided for you on each task.
Good luck, and let me know here or on LinkedIn if you have any questions or want to share your experience with the exam!
Peace ✌️
