Cloud Native Platform Engineer (CNPE) Exam Study Guide
A few weeks ago, I took and passed the Cloud Native Platform Engineer (CNPE) exam and earned this shiny new badge. It was the last exam I needed to complete the Golden Kubestronaut program, which essentially means you're crazy enough to sit through all 16 CNCF certification exams 😅 Kidding aside, the CNPE is a comprehensive exam and a great way to test your platform engineering knowledge and skills, along with your ability to operate some of the most popular platform engineering tools in the CNCF ecosystem. No easy feat.
This exam is fairly recent in the CNCF certification lineup, so there aren't as many resources available compared to more established certifications like the CKA or CKAD. So I wanted to put together this study guide to share how you too can prepare for and pass the CNPE exam. I'll cover what I was able to find in terms of resources and what you should focus on when preparing. That, and I'm pretty sure I'll need to refer back to this myself if I decide to renew my certification in 2 years 🤗
I'll spare you the what and why of taking the CNPE exam, as that can always be found on the official CNCF page for the exam. Instead, let's jump right into the how and what you should know to prepare.
First off, take a read through Michał Tomczak's blog post titled "How I Passed the Cloud Native Platform Engineer (CNPE) Exam and what you can learn from my attempt!". I found it a valuable resource for understanding the exam structure and preparation strategies. I liked Michał's approach to preparation, especially his focus on hands-on practice and spinning up labs to get reps in on the various tools and technologies.
So what tools and technologies should you be familiar with? I’m sure you’ve seen the CNCF landscape diagram. It’s become a meme at this point because of how overwhelming it can be. Good news: you don’t need to know everything in it. You just need to focus on the tools the Linux Foundation listed on the CNPE’s Important Instructions page:
- Argo CD
- Argo Workflows
- Crossplane
- Flagger
- Flux
- Gatekeeper
- Grafana
- Istio
- Jaeger
- Kyverno
- Linkerd
- Open Policy Agent (OPA)
- OpenCost
- OpenTelemetry (OTel)
- Prometheus
- Tekton
Beyond those tools, platform engineering centers on building self-service platforms for developers. You should be comfortable with git on the command line and have a solid grasp of Kubernetes fundamentals - platform engineering is built on Kubernetes, after all.
If you’ve passed all the other CNCF exams that lead up to CNPE, you should already be in good shape since most of the tools above are covered in those exams.
The best thing you can do is book the exam and set a date. It gives you a deadline to work toward and keeps you motivated. If you’re ready to take the plunge, go ahead and register for the exam.
Now let’s get you prepped. Below is a breakdown of the exam domains and competencies based on the official CNPE Exam Domains and Competencies listed in the CNPE exam curriculum document.
The CNPE exam site says you don’t need “deep” level knowledge of the tools, but I found that you actually do need more than surface-level understanding of the tools 😅 Spend extra time getting hands-on experience and being comfortable with the CLI tools and APIs of each project.
Platform Architecture and Infrastructure - 15%
This domain is all about understanding how to design, provision, and manage the underlying Kubernetes platform that supports workloads. You’ll be expected to know how networking, storage, and compute interact, how to right-size resources to control costs, and ultimately how to enforce multi-tenant isolation.
Applying Platform Architecture Best Practices for Networking, Storage, and Compute
This is pure Kubernetes fundamentals, where your CKA knowledge will be tested. The exam expects you to reason about how platform decisions affect workload reliability and performance.
What to know:
- Workloads: Pods, Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs. Know when to pick each one. StatefulSets for ordered, unique pods (databases). DaemonSets for one-per-node (logging agents, metrics collectors). Deployments for stateless horizontal scaling.
- Networking: Services (ClusterIP, NodePort, LoadBalancer, ExternalName), Gateway API, NetworkPolicies for east-west traffic control, CoreDNS configuration.
- Storage: StorageClasses for dynamic provisioning, PersistentVolumeClaims for persistent data, HostPath vs volume mounts. Know the difference between cluster-scoped PVs and namespaced PVCs.
- Scheduling: Resource requests vs limits (CPU/memory), node affinity and anti-affinity, taints and tolerations, topology spread constraints, priority classes and preemption.
- Scaling: Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), Cluster Autoscaler. Know the triggers for each. For VPA, understand the update modes (Off, Initial, Recreate, Auto, and the newer InPlaceOrRecreate) and how to set minAllowed/maxAllowed bounds in the resource policy.
- Resource governance: ResourceQuotas (namespace-level caps), LimitRanges (default/min/max per container), Pod Security Standards (the replacement for the removed PodSecurityPolicy).
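To tie several of these together, here's a minimal Deployment sketch showing requests vs limits, a toleration, and a topology spread constraint. The names, image, and values are illustrative, not exam material:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp   # hypothetical app name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: nginx:1.27   # placeholder image
          resources:
            requests:          # guaranteed minimum, used for scheduling
              cpu: 100m
              memory: 128Mi
            limits:            # hard cap; exceeding the memory limit gets the container OOMKilled
              cpu: 500m
              memory: 256Mi
      tolerations:
        - key: workload-type   # hypothetical taint key
          operator: Equal
          value: web
          effect: NoSchedule
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread replicas across nodes
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: webapp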
Hands-on practice:
Rather than reproducing standard Kubernetes YAML from memory here, use the official Kubernetes documentation to create and inspect these resources yourself. Success in this exam hinges on knowing how to navigate each project’s documentation quickly. The exam provides links to docs - the difference between passing and failing is often how efficiently you find what you need.
A good exam drill is to fix an app that’s failing:
kubectl describe pod <pod-name> -n webapp
kubectl get events -n webapp --sort-by='.lastTimestamp'
Look for Pending pods (insufficient resources), CrashLoopBackOff (app issues), or ErrImagePull/ImagePullBackOff (image name or credential issues).
Using Cost Management Solutions for Right-Sizing and Scaling
The core skill here isn’t “using one cost product.” It’s understanding how to identify waste and right-size workloads using metrics, resource requests/limits, and autoscaling behavior. OpenCost is the tool on the exam list for cost visibility, but you should be comfortable with Kubernetes-native reasoning about how overprovisioning, idle nodes, and oversized requests affect cost.
What to know:
- OpenCost consumes Prometheus metrics about node and pod resource usage, allocating costs based on actual consumption, not just requests.
- Over-requesting CPU or memory forces unnecessary node scale-out even when actual usage is low.
- Right-sizing means setting requests close to observed 95th-percentile usage so pods get guaranteed resources without wastage.
- Bin packing (more pods per node via lower requests) reduces node count and cost, but must be balanced against SLOs.
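A couple of PromQL queries illustrate the idea, assuming the standard kube-state-metrics and cAdvisor metric names you get with kube-prometheus-stack:
# Requested CPU per pod (kube-state-metrics)
sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})

# Actual CPU usage over the last 5 minutes (cAdvisor)
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# 95th-percentile memory usage over a day - a starting point for right-sizing requests
quantile_over_time(0.95, container_memory_working_set_bytes{container!=""}[1d])
Comparing the first two queries per pod shows you exactly where requests are far above real usage.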
Practice:
Install OpenCost (it’s typically deployed alongside Prometheus):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace=monitoring \
--create-namespace
helm repo add opencost-charts https://opencost.github.io/opencost-helm-chart
helm install opencost opencost-charts/opencost \
--namespace opencost \
--create-namespace \
--set opencost.prometheus.internal.namespaceName=monitoring \
--set opencost.prometheus.internal.serviceName=prometheus-kube-prometheus-prometheus \
--set opencost.prometheus.internal.port=9090 \
--set opencost.metrics.kubeStateMetrics.emitKsmV1Metrics=false \
--set opencost.metrics.kubeStateMetrics.emitKsmV1MetricsOnly=true
See Managing with Helm for the latest OpenCost install instructions.
Then inspect cost allocations:
kubectl port-forward svc/opencost 9090 -n opencost
Look at the UI to understand which namespaces and workloads are consuming the most resources. Challenge yourself: “Can I reduce this request without harming SLOs?”
Also get familiar with the kubectl cost CLI plugin. It lets you query cost allocations directly from the terminal (e.g., kubectl cost pod --opencost), which is handy when you need quick answers without opening the UI.
Optimizing Multi-Tenancy Resource Usage
Multi-tenancy means multiple teams or workloads sharing a cluster safely. The exam may present this as a platform guardrail problem: “How do you let multiple teams share a cluster without one team starving the others?”
What to know:
- Namespaces are the primary isolation boundary – each team gets their own.
- ResourceQuotas cap total resources a namespace can consume (pods, services, CPU, memory).
- LimitRanges set default/min/max resource values per container.
- NetworkPolicies control which pods can communicate across namespaces.
- Pod Security Standards enforce security posture per namespace (privileged, baseline, restricted).
- RBAC ensures each team can only manage resources in their own namespace.
Practice:
Set up ResourceQuotas, LimitRanges, and NetworkPolicies against a local KIND cluster. The exam expects you to know how to find the right spec fields in each project’s docs — not memorize YAML.
Key behaviors to understand:
- ResourceQuota enforces namespace-level totals. If a Pod's resources would exceed the quota, it is blocked – you'll see FailedCreate or Forbidden events. Use kubectl describe resourcequota <name> -n <namespace> to see usage vs hard limits. Remember: the quota tracks cumulative requests and limits across all pods.
- LimitRange fills in default resource requests/limits when a Pod spec omits them. Watch out: if a pod's requests would exceed the ResourceQuota, the API server rejects it.
- Limits vs Requests: Limits cap what a container can use (OOMKilled/Throttling when hit). Requests are the guaranteed minimum for scheduling. Always set both.
Debugging tip: If pods are Pending after applying LimitRange defaults, the defaults may exceed available cluster resources. Adjust them downward and restart the rollout. Verify with kubectl get resourcequota -n <namespace> and check NetworkPolicies with kubectl get networkpolicy -n <namespace>.
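For reference, a minimal quota-plus-defaults pair might look like this (the namespace and values are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a   # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "4"       # cumulative across all pods in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:             # applied as limits when a container omits them
        cpu: 500m
        memory: 512Mi
      defaultRequest:      # applied as requests when a container omits them
        cpu: 100m
        memory: 128Mi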
GitOps and Continuous Delivery - 25%
This is the largest weighted domain. You need to be comfortable wiring up GitOps workflows, CI/CD pipelines, and progressive delivery strategies on a Kubernetes cluster.
Before diving into GitOps controllers, make sure you’re solid on the two main ways Kubernetes manifests get organized and deployed: Kustomize and Helm. Both Argo CD and Flux consume these natively, so the exam assumes you know how they work.
Kustomize uses a base/overlay pattern to manage environment-specific configs without templating. You keep shared resources in a base directory and adjust them through overlays like staging and production. Each overlay has a kustomization.yaml that references the base and applies patches. Key skills: writing strategic merge patches, adding resources to a kustomization, and using kubectl apply -k or kubectl diff -k to preview changes.
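For example, a minimal base/overlay layout might look like this (paths and the patch contents are illustrative):
# base/kustomization.yaml
resources:
  - deployment.yaml
  - service.yaml

# overlays/production/kustomization.yaml
resources:
  - ../../base
patches:
  - path: replica-patch.yaml   # strategic merge patch applied on top of the base

# overlays/production/replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp   # must match the name in the base
spec:
  replicas: 5
Render an overlay with kubectl kustomize overlays/production to check the output before applying it with kubectl apply -k overlays/production.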
Helm packages Kubernetes manifests into reusable charts with a values.yaml for customization. Know the basics: Chart.yaml for metadata and versioning, helm install vs helm upgrade, and how to deploy a chart into a specific namespace. You won’t need to write complex Go templates, but you should be able to edit a chart’s deployment template and bump a chart version.
Practice:
- Work through the Kustomize documentation to create a base with staging and production overlays
- Read the Helm quickstart guide and practice installing, upgrading, and inspecting releases
Implementing GitOps Workflows for Application and Infrastructure Deployment
What to know:
The exam lists both Argo CD and Flux as in-scope tools. Focus on the core concepts they share:
- Git is the source of truth for desired state
- A deployment controller reconciles the cluster to match Git
- Drift detection identifies when live state diverges from Git
- Applications are version-controlled and auditable
Argo CD concepts:
- Application resources point to Git repos with manifest paths
- Sync policies (manual vs auto), pruning, and self-heal
- Health status (Healthy, Progressing, Missing, Degraded)
- Repo structure: one repo per app or a monorepo with overlays
Flux concepts:
- Sources (GitRepository, Bucket), Kustomization or HelmRelease
- Control loop watches Git for changes
- Reconciles cluster state automatically
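To make that concrete, a minimal GitRepository plus Kustomization pairing might look like this (the repo URL and path are placeholders):
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: webapp
  namespace: flux-system
spec:
  interval: 1m                                      # how often to poll Git
  url: https://github.com/example/webapp-config     # placeholder repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: webapp
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: webapp
  path: ./overlays/production
  prune: true   # delete cluster resources that were removed from Git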
Exam tip: You may be given a choice between Argo CD and Flux for GitOps questions, but you should be comfortable with both. The underlying GitOps principles are the same - once you know one, the other is just a matter of looking up different resource types.
Practice:
Install Argo CD following the official quick-start guide. The kubectl apply command for the install manifest is the same on exam day - you’ll find it in the docs.
The key Argo CD resource is the Application - it points to a Git repo, a manifest path, and a cluster destination. Try creating one from scratch using the docs at Application Specification to look up the spec fields.
Deliberately change the live cluster with kubectl edit and watch Argo CD detect and correct the drift.
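For reference, here's a sketch of an Application with automated sync, pruning, and self-heal (the repo URL and paths are placeholders) - selfHeal is what reverts the kubectl edit drift in the drill above:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: webapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/webapp-config   # placeholder repo
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc   # the in-cluster destination
    namespace: webapp
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual changes to the live cluster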
Building and Configuring CI/CD Pipelines Integrated with Kubernetes
What to know:
Tekton is the Kubernetes-native CI/CD tool. The exam expects familiarity with:
Core CI/CD primitives:
- Task: Reusable unit of work containing one or more steps (containers that run)
- Pipeline: Ordered sequence of Tasks with dependencies
- TaskRun: Instantiation of a single Task
- PipelineRun: Instantiation of a Pipeline with concrete parameter values
- Workspaces: Shared volumes between Tasks (e.g., code checkout to build)
- Params and Results: How to pass data into a Task and how to pass outputs between Tasks
- ServiceAccounts: Authentication for container registry access
- Steps: Individual container invocations within a Task
Tekton Triggers (event-driven pipelines):
For webhook-triggered pipelines, three components work together:
- EventListener: Listens for incoming events (GitHub webhooks, etc.) and routes them to triggers
- TriggerBinding: Extracts data from the event payload
- TriggerTemplate: Defines the PipelineRun to create with parameter substitution
Practice:
Follow the official installation guide - be sure to install the tkn CLI and be familiar with its commands for creating and inspecting resources. Use the Tekton docs to create a Task, Pipeline, TaskRun, and PipelineRun from scratch.
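A minimal Task and TaskRun sketch to get started (the echo step and names are illustrative):
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: hello
spec:
  params:
    - name: message
      type: string
  steps:
    - name: say-hello
      image: alpine:3.20   # placeholder image
      script: |
        echo "$(params.message)"
---
apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
  name: hello-run
spec:
  taskRef:
    name: hello
  params:
    - name: message
      value: "Hello, CNPE!"
Then follow the run with tkn taskrun logs hello-run -f.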
Working with Argo Workflows for Platform Automation
Argo Workflows is the secondary workflow tool on the exam. Its key use case is orchestrating Kubernetes resources - creating namespaces, deploying resources, running Jobs, and cleaning up - not CI/CD pipelines.
Argo Workflows concepts:
- Workflow: Top-level resource defining entrypoint and templates
- WorkflowTemplate: Reusable templates (namespaced or cluster-scoped)
- ClusterWorkflowTemplate: Cluster-scoped reusable templates
- Templates: Individual steps or DAGs within a workflow
- Steps/DAGs: Sequential steps or directed acyclic graphs for complex workflows
- Resource action: Create, delete, get, apply Kubernetes resources directly from workflows
- ServiceAccount: Authenticates workflows to the Kubernetes API for resource creation
Mental model: “Tekton = CI/CD pipelines, Argo Workflows = orchestration of Kubernetes resources.”
Practice:
Install Argo Workflows from the official docs and practice:
- Creating resources from a workflow using the resource action - look at the Kubernetes Resources walkthrough
- Creating WorkflowTemplates and submitting them using the argo CLI
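Here's a sketch of a Workflow that creates a Kubernetes resource via a resource template - the ConfigMap and the ServiceAccount name are illustrative, and the SA needs RBAC to create ConfigMaps:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: create-config-
spec:
  entrypoint: create-configmap
  serviceAccountName: workflow-sa   # hypothetical SA with create rights on ConfigMaps
  templates:
    - name: create-configmap
      resource:
        action: create              # could also be apply, delete, get, patch
        manifest: |
          apiVersion: v1
          kind: ConfigMap
          metadata:
            generateName: demo-config-
          data:
            created-by: argo-workflows
Submit it with argo submit workflow.yaml --watch and confirm the ConfigMap appears.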
Deploying Applications Using Progressive Delivery Strategies
What to know:
Argo Rollouts and Flagger are tools for blue/green and canary deployments on Kubernetes.
Argo Rollouts key concepts:
- Rollout resource replaces Deployment for progressive delivery
- Blue/Green: Route traffic from old to new version all at once after health checks pass
- Canary: Gradually shift traffic to new version with metric-based promotion
- AnalysisTemplates: Define KPI checks (Prometheus queries, HTTP checks)
- AnalysisRuns: Instantiations of analysis templates
- Pause/Resume: Manual gates in the promotion pipeline
- Automatic rollback when metrics indicate failure
Flagger concepts:
- Progressive delivery via service mesh or ingress controller traffic shifting
- Supports canary releases, A/B testing, and blue/green mirroring
- Integrates with Prometheus, InfluxDB, Datadog for metric analysis
- Webhook analysis: Flagger supports pre-rollout and post-rollout webhooks in the analysis spec for running HTTP checks against canary services before promotion
Practice:
Install from the official release page and practice creating a Rollout resource. Key spec fields to look up:
- Setting up canary strategy to control traffic shifting
- Setting up pause steps for manual promotion gates
- Defining AnalysisTemplates for metric-based promotion
Get comfortable with the argo rollouts CLI and web UI to monitor rollout status.
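For orientation, the canary strategy section of a Rollout looks roughly like this (the step weights and image are illustrative):
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: webapp
spec:
  replicas: 5
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: nginx:1.27   # placeholder image; change it to trigger a rollout
  strategy:
    canary:
      steps:
        - setWeight: 20            # shift 20% of traffic to the new version
        - pause: {}                # manual gate - promote via the CLI
        - setWeight: 60
        - pause: {duration: 1m}    # timed gate before full promotion
Watch and promote it with kubectl argo rollouts get rollout webapp --watch and kubectl argo rollouts promote webapp.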
Platform APIs and Self-Service Capabilities - 25%
This domain tests your ability to extend Kubernetes with custom APIs, automate platform operations, and expose self-service capabilities to developers.
Designing and Creating Custom Resource Definitions for Platform Services
What to know:
CRDs extend the Kubernetes API with platform-specific objects. You need to understand the full lifecycle:
- Schema definition with OpenAPI v3 validation
- Namespaced vs cluster-scoped resources
- Versioning and conversion strategies for evolving APIs
- Reading custom resource definitions and deploying custom resources
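As a reference, a minimal CRD with OpenAPI v3 validation might look like this (the group, kind, and fields are hypothetical):
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.platform.example.com   # must be <plural>.<group>
spec:
  group: platform.example.com
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
  versions:
    - name: v1alpha1
      served: true
      storage: true          # exactly one version is the storage version
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: [postgres, mysql]   # validation rejects anything else
                storageGB:
                  type: integer
                  minimum: 1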
Practice:
Go through at least the first few chapters of the Kubebuilder book to understand how to scaffold a new API and controller. Focus on the API definition and how the controller reconciles desired vs actual state. Try updating the API version and see how that affects the Custom Resource Definition manifest.
Implementing Workflows for Self-Service Provisioning Using Platform APIs
What to know:
- Custom controllers: Reconcile loop, watches, finalizers, status updates
- CRD-based provisioning: Define a CRD for a platform service (e.g., a Database resource), then write a controller that creates the underlying Kubernetes resources (StatefulSet, Service) when a Database resource is created.
- Self-service APIs: Expose custom resources that developers can create to request platform services without needing cluster admin access.
Practice:
- Follow the Extend the Kubernetes API with CustomResourceDefinitions and Versions in CustomResourceDefinitions tutorials to create a simple CRD and custom resource.
Using Kubernetes Operators for Platform Automation and Integration
What to know:
Operators encode operational knowledge into automation. The exam expects familiarity with:
- Reconcile loop: The controller continuously compares desired state (spec) to actual state (cluster) and makes changes to align them
- controller-runtime: The Go library Kubebuilder is built on
- Watches: Triggering reconciliation on resource changes
- Finalizers: Cleanup logic before a resource is deleted
- Status updates: Reporting observed state back to the API
Practice:
As mentioned above, go through the Kubebuilder tutorial to build a simple operator. Focus on the reconcile logic and how it manages the lifecycle of custom resources.
Using Automation Frameworks for Self-Service Provisioning
What to know:
- Backstage: Developer portal for self-service. Software Templates scaffold new projects. The Software Catalog manages microservices and infrastructure.
- Crossplane: Infrastructure-as-code for Kubernetes. Crossplane lets you define complex infrastructure as custom Kubernetes resources.
- Terraform/OpenTofu: Traditional IaC tools that can be integrated into Kubernetes workflows.
Core Crossplane concepts:
- Composite Resource Definition (XRD): Defines the schema for composite resources – like a CRD that declares what your platform resource looks like (spec fields, status fields, scope)
- Composite Resource (XR): The actual custom resource created by a user – e.g., an App resource that a developer creates to request infrastructure
- Composition: Defines which Kubernetes/cloud resources to create when an XR is created, and how to wire them together
- Composition Revisions: Versioned snapshots of compositions for managing changes safely
- Composition Functions: Plugins that execute the composition logic. Think of them as configuration-language plugins. Options include:
- Templated YAML (KRO - Kubernetes Resource Orchestrator)
- YAML+CEL
- Python
- KCL
- Environment Configs: In-memory data stores for Compositions to share state
- Mental model: “XRD = schema, XR = instance, Composition = recipe.” When you create an XR, Crossplane consults the Composition to know which resources to create, then reconciles continuously.
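To anchor the mental model, here's a minimal XRD sketch using the apiextensions.crossplane.io/v1 API (the group and fields are hypothetical):
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xapps.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XApp
    plural: xapps
  versions:
    - name: v1alpha1
      served: true
      referenceable: true   # Compositions can target this version
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                region:
                  type: string   # hypothetical field a developer sets on the XR
A Composition then points at this XRD via its compositeTypeRef and lists the resources to create whenever an XApp is created.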
Practice:
- Follow the Get Started guide to create an XRD, Composition, and XR from scratch. This is the best way to understand the flow.
- Follow the Backstage Software Template guide to create a Software Template that scaffolds a microservice.
Terraform/OpenTofu are not on the list of tools for the exam, but you should understand how they can be integrated into Kubernetes workflows.
Observability and Operations - 20%
A platform is only as good as your ability to observe and debug it. This domain tests your skills in monitoring, metrics, and incident diagnosis.
Implementing Monitoring, Alerting, Logging, and Tracing Solutions
What to know:
Prometheus is the core monitoring tool. You should be comfortable with:
- Scrape targets: How Prometheus discovers and scrapes metrics from Kubernetes endpoints
- PromQL: Querying metrics for alerting and dashboarding
- Alert rules: Defining conditions for firing alerts based on PromQL queries
- Alertmanager: Routing alerts to channels, silencing, grouping
If you’re using the kube-prometheus-stack (and you likely will be), you’ll also encounter Prometheus Operator CRDs like ServiceMonitor, PodMonitor, and PrometheusRule. These let you configure scrape targets and alerting rules as Kubernetes resources instead of editing ConfigMaps directly.
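For instance, a ServiceMonitor that scrapes a Service's metrics port might look like this (labels, namespaces, and the port name are illustrative):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp
  namespace: monitoring
  labels:
    release: prometheus    # kube-prometheus-stack selects monitors by label
spec:
  selector:
    matchLabels:
      app: webapp          # matches the target Service's labels
  namespaceSelector:
    matchNames:
      - webapp
  endpoints:
    - port: metrics        # named port on the Service
      interval: 30s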
Grafana is the visualization layer:
- Data sources (Prometheus, Loki, Jaeger)
- Dashboard panels and queries
- Linking panels across dashboards
Jaeger for distributed tracing:
- Traces: End-to-end request flows across services
- Spans: Individual operations within a trace (each has a trace ID and span ID)
- Trace context: the W3C traceparent header carries trace ID and span ID for context propagation
- Service-to-service call chains: Visualizing how requests flow between services in the Jaeger UI (port 16686)
- Identifying latency hotspots: Find which span takes the longest in a trace
- Span hierarchy: Parent-child relationships between spans show call depth
OpenTelemetry for instrumentation and telemetry collection:
- Pipeline architecture: How telemetry flows from instrumentation to backend via the Collector
- Receivers: Accept telemetry via OTLP (gRPC on port 4317, HTTP on port 4318)
- Processors: Transform data (e.g., batch, filter, attribute manipulation)
- Exporters: Route to backends (Jaeger for traces, Prometheus for metrics, etc.)
Typical Collector configuration:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 5s
exporters:
otlp/jaeger:
endpoint: "jaeger:4317"
tls:
insecure: true
otlphttp/prometheus:
endpoint: "http://prometheus:9090/api/v1/otlp"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/jaeger]
metrics:
receivers: [otlp]
processors: [batch]
exporters: [otlphttp/prometheus]
logs:
receivers: [otlp]
processors: [batch]
exporters: [debug]
Practice:
- Install the kube-prometheus-stack for Prometheus and Grafana
- Install Jaeger via the OpenTelemetry Operator or Helm Chart
Measuring and Improving Platform Efficiency Using Deployment Metrics and Performance Indicators
What to know:
The DORA metrics are the industry standard for measuring software delivery performance. You don’t need a tool to calculate them - the exam may test your understanding of how platform decisions connect to these outcomes:
- Deployment Frequency: How often code is deployed to production. GitOps + automation = higher frequency.
- Lead Time for Changes: Time from code commit to production. Shorter is better. CI/CD pipeline speed matters here.
- Change Failure Rate: Percentage of deployments causing incidents. Progressive delivery (canary/blue-green) reduces this.
- Time to Restore Service: How fast you recover from failures. Automated rollback, good monitoring, and clear runbooks reduce this.
Practice:
Track these metrics for your platform. Build one small example of each improvement:
- Automate a manual step to improve deployment frequency
- Add parallel pipeline stages to reduce lead time
- Add a canary rollout to reduce change failure rate
- Add automated rollback to reduce time to restore
Key insight: A good platform makes delivery faster while keeping incidents lower. These metrics are feedback loops, not just report cards.
Diagnosing and Remediating Platform Issue and Incident Scenarios
What to know:
The exam will present symptoms and expect you to narrow down the fault domain. Expect troubleshooting across app, cluster, and platform layers (similar to CKAD/CKA/CKS exams but with more platform tools in the mix):
- Unhealthy pods, misrouted traffic, broken GitOps sync, failed rollouts
- Missing alerts, RBAC denial, policy blocks, image pull failures
Structured approach:
- Inspect status: kubectl get, kubectl describe
- Check events: kubectl get events --sort-by='.lastTimestamp'
- Review logs: kubectl logs <pod> --tail=100
- Query metrics: PromQL for resource usage, error rates
- Verify controller state: Argo CD sync status, Rollout status, policy reports
Practice:
Intentionally break things and fix them:
# Make a pod crash
kubectl set image deployment/webapp webapp=nginx:invalid -n webapp
kubectl describe pod <pod> -n webapp # Look for ErrImagePull / ImagePullBackOff
kubectl logs <pod> -n webapp --previous # Previous container logs
# Make a rollout fail
kubectl rollout pause deployment/webapp
kubectl rollout undo deployment/webapp
# Check RBAC
kubectl auth can-i create deployments --namespace webapp --as=system:serviceaccount:default:my-sa
Security and Policy Enforcement - 15%
This domain covers service-to-service security, RBAC, policy enforcement, and security scanning. You should know how to wire these into a working platform.
Configuring Secure Service-to-Service Communication
What to know:
Istio and Linkerd are the two service meshes you may encounter in the exam. Both provide Gateway API support for north-south traffic and mTLS for east-west traffic. Key concepts:
- mTLS: Mutual TLS for service identity and encryption. Both sides verify each other.
- Authentication policies: Define which services require mTLS and which can accept plaintext
- Authorization policies: Define which services can talk to which (e.g., only allow frontend to call backend)
- Gateway API: For exposing services outside the cluster with TLS termination
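In Istio, for example, strict mTLS plus a simple allow rule look roughly like this (the namespace, labels, and service account are illustrative):
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: webapp
spec:
  mtls:
    mode: STRICT   # reject plaintext traffic to workloads in this namespace
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: webapp
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/webapp/sa/frontend"]   # only the frontend SA may call the backend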
Practice:
- Follow the Istio Gateway API tutorial then the Argo Rollouts integration with Istio docs.
- Follow the Linkerd Getting Started guide then explore the Gateway API support and Traffic Splitting docs.
Applying RBAC and Security Controls Across Platform Resources
What to know:
Kubernetes RBAC is foundational. You should be comfortable with:
- Roles/ClusterRoles: Defines what actions are allowed (verbs: get, list, create, delete, etc.) on which resources
- RoleBindings/ClusterRoleBindings: Grants roles to users, groups, or ServiceAccounts
- ServiceAccounts: Identity for pods and automation. Each namespace gets a default SA.
- Least privilege: Only grant what’s needed. Read-only roles for developers, admin for platform teams.
Practice:
kubectl create role log-reader --resource=pods,pods/log --verb=get,list --namespace webapp
kubectl create rolebinding log-reader-binding --role=log-reader --user=developer --namespace webapp
# Create a ServiceAccount for automation
kubectl create serviceaccount my-operator-sa -n platform
kubectl create clusterrolebinding operator-binding --clusterrole=cluster-admin --serviceaccount=platform:my-operator-sa
Verify access:
kubectl auth can-i create deployments --namespace webapp --as=system:serviceaccount:platform:my-operator-sa
Generating Audit Trails and Enforcing Policy Compliance (SBOM, Compliance Reports, etc.)
What to know:
Kubernetes Pod Security Standards are the built-in mechanism for enforcing security policies on pods. You should know the three levels (Privileged, Baseline, Restricted) and how to apply them via namespace labels or admission controllers.
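Applying a level is just a namespace label - a minimal sketch (the namespace name is hypothetical):
apiVersion: v1
kind: Namespace
metadata:
  name: team-a   # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # also warn on apply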
Trivy handles security scanning and SBOM generation:
- Image scanning: Find vulnerabilities in container images
- SBOM generation: Generate Software Bill of Materials in CycloneDX or SPDX format
- Kubernetes scanning: Scan live Kubernetes resources for vulnerabilities, misconfigurations, and secrets
- Filesystem scanning: Scan local files for vulnerabilities and misconfigurations
SBOM concepts:
- An SBOM (Software Bill of Materials) is an inventory of all software components in a build
- CycloneDX and SPDX are the two standard formats
- Auditors use SBOMs to verify which dependencies are running and whether any have known vulnerabilities
Practice:
- Follow the First steps with Trivy doc to install and scan a container image.
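A few representative commands to drill (image names and paths are placeholders):
# Scan an image for serious vulnerabilities
trivy image --severity HIGH,CRITICAL nginx:1.27

# Generate an SBOM in CycloneDX format
trivy image --format cyclonedx --output sbom.cdx.json nginx:1.27

# Scan local manifests for misconfigurations
trivy config ./manifests/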
Using Policy Engines and Admission Controllers for Governance
What to know:
Kyverno's newer releases introduce dedicated policy types that supersede the legacy ClusterPolicy format. Know the difference between the stable and legacy formats:
Stable policy types (use these on the exam):
- ValidatingPolicy: Validates resources against rules using CEL expressions
- MutatingPolicy: Automatically modify resources on creation
- GeneratingPolicy: Create additional resources when a resource is created
- DeletingPolicy: Automatically delete resources on a schedule (e.g., cleanup completed job pods)
- ImageValidatingPolicy: Verify container images against signatures (Cosign, Notary)
Legacy format (may still appear in existing configs):
- The typed rules format (validate, mutate, generate, delete) in the older ClusterPolicy kind is deprecated. The new format uses separate kinds for each policy type with CEL-based expressions.
OPA/Gatekeeper:
Gatekeeper uses a two-step process: ConstraintTemplate defines the policy logic (Rego) and schema, then Constraint applies it to specific resources.
- ConstraintTemplate: Defines both the Rego enforcement logic and the parameter schema. The schema allows fine-tuning like function arguments.
- Constraint: Instantiates a template with specific parameters and target selectors (namespaces, resource kinds).
- Rego: Declarative policy language for violation logic.
- Libraries: Rego packages can be shared across templates via the libs section in ConstraintTemplate.
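To make the two-step flow concrete, here's a trimmed-down version of the required-labels example from the Gatekeeper library:
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels     # the Constraint kind this template creates
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]   # every Namespace must carry an owner label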
Practice:
- Follow the Kyverno installation guide then go through the Policy types documentation to create a ValidatingPolicy and MutatingPolicy.
- Follow the Gatekeeper installation guide then go through some of the examples found in the Gatekeeper Library.
Integrating Security Scanning and Compliance Checks into Deployment Pipelines
What to know:
The key concept is to move scanning left without breaking delivery:
- Scan images in CI before they’re pushed
- Scan manifests and Kubernetes resources before promotion
- Fail the pipeline on critical findings
- Generate SBOMs as part of the build artifact
Practice:
This practice is a bit open-ended, but try creating a Tekton Task or other workflow that incorporates the tools mentioned above. For example, a Task that scans an image with Trivy and generates an SBOM:
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: security-scan
spec:
params:
- name: image
type: string
steps:
- name: scan-image
image: ghcr.io/aquasecurity/trivy:latest
script: |
trivy image --exit-code 1 --severity HIGH,CRITICAL $(params.image)
- name: generate-sbom
image: ghcr.io/aquasecurity/trivy:latest
script: |
trivy image --format spdx-json --output /tmp/sbom.spdx.json $(params.image)
Conclusion
A lot of material to cover, right? Don't worry about memorizing everything - as mentioned above, you'll have access to documentation during the exam. The key is to be comfortable navigating the docs and understanding the core concepts so you can quickly find what you need.
The CNPE exam is performance-based - you’ll execute real tasks in a Linux desktop environment for 120 minutes. The best preparation is hands-on practice with each tool. When I sit through performance-based exams, I typically spend the first few minutes reading through all the tasks to get a sense of what’s being asked, then I prioritize based on my strengths. If you can answer something quickly, do it and secure those points. For tougher tasks, read the question carefully, then start to systematically work through the problem. If you find yourself stuck in a troubleshooting loop, flag the task and move on - you don’t have time to get bogged down. Go work on another task, then come back with fresh eyes.
Hope this guide helps! The CNPE is challenging but fair - if you’ve built platform engineering systems in real life, the hands-on tasks will feel natural. Take your time, read each task carefully, and use the documentation links provided for you on each task.
Good luck, and let me know here or on LinkedIn if you have any questions or want to share your experience with the exam!
Peace ✌️
