Cases
Real problems, engineered solutions.
Platform engineering, zero-trust security, observability, and self-service infrastructure — from architecture to production.
Enterprise Zero-Trust K8s Lab
Production-grade, self-hosted Kubernetes platform with GitOps, SSO, secrets management, and observability.
SSH Management with Keycloak
Centrally managed SSH access using Keycloak group membership — no LDAP, no FreeIPA, works air-gapped.
Azure Storage Policy Automation
K8s-native Python service reconciling Azure Blob access policies and rotating SAS tokens into Key Vault.
Crossplane Self-Service Portal
Self-service portal abstracting Kubernetes and Crossplane behind a role-based wizard UI.
AI Observability Agents
Non-invasive AI overlay on OTel → ClickHouse → Grafana — anomaly detection, correlation, and LLM-powered incident reasoning.
Enterprise Zero-Trust Kubernetes Lab
A production-grade, self-hosted Kubernetes platform with GitOps, SSO, secrets management, observability, and cloud resource provisioning.
Challenge
Design and operate a realistic, enterprise-grade Kubernetes environment on bare metal that demonstrates the full DevOps/GitOps lifecycle, from cluster bootstrapping to zero-trust access control.
Key Decisions
- Replaced MetalLB with Cilium LB IPAM — eBPF-native load balancing, fewer components.
- Separated Helm chart repos from values repos — one ApplicationSet deploys any chart to any cluster.
- K8s API server integrated with Authentik OIDC — RBAC driven by SSO group membership.
- OAuth2-Proxy as Traefik middleware — adds auth without modifying apps.
- ClickHouse over Loki/Mimir — SQL-native queries over high-cardinality telemetry.
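One common way to get "one ApplicationSet deploys any chart to any cluster" is a matrix generator over a chart list and a cluster list, where each generated Argo CD Application uses multiple sources: the chart comes from the chart repo and the values file comes from the values repo through a `ref`. The Python sketch below only renders the Applications such a matrix would produce, to show their shape. The repo URLs, chart names, versions, and the `<cluster>/<chart>.yaml` values layout are hypothetical, and the lab's actual ApplicationSet may be structured differently.

```python
"""Sketch: the Argo CD Applications a charts x clusters matrix expands to when
Helm charts and per-cluster values live in separate repos (multi-source apps).
All repo URLs, chart names, versions, and the values layout are hypothetical."""
from itertools import product

CHART_REPO = "https://charts.example.internal"           # hypothetical Helm repo
VALUES_REPO = "https://git.example.internal/values.git"  # hypothetical values repo

CHARTS = [
    {"name": "web-api", "version": "1.0.0"},  # hypothetical charts
    {"name": "worker", "version": "2.1.0"},
]
CLUSTERS = [
    {"name": "dev", "server": "https://dev.example.internal:6443"},  # hypothetical
    {"name": "prod", "server": "https://prod.example.internal:6443"},
]


def render_application(chart: dict, cluster: dict) -> dict:
    """One multi-source Application: chart from CHART_REPO, values from VALUES_REPO."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{chart['name']}-{cluster['name']}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "destination": {"server": cluster["server"], "namespace": chart["name"]},
            "sources": [
                {
                    # The chart itself, pinned to a version in the chart repo
                    "repoURL": CHART_REPO,
                    "chart": chart["name"],
                    "targetRevision": chart["version"],
                    # "$values" resolves to the root of the source tagged ref: values
                    "helm": {"valueFiles": [f"$values/{cluster['name']}/{chart['name']}.yaml"]},
                },
                {
                    # The values repo only contributes files, via its ref name
                    "repoURL": VALUES_REPO,
                    "targetRevision": "main",
                    "ref": "values",
                },
            ],
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def expand_matrix(charts: list, clusters: list) -> list:
    """Cartesian product of charts and clusters, one Application per pair."""
    return [render_application(ch, cl) for ch, cl in product(charts, clusters)]


if __name__ == "__main__":
    for app in expand_matrix(CHARTS, CLUSTERS):
        print(app["metadata"]["name"], app["spec"]["destination"]["server"])
```

Adding a chart or a cluster is then a one-line change to a list, while per-cluster tuning stays in the values repo and never touches the chart repo.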
Outcomes
- Fully automated GitOps: a single git push deploys across all clusters.
- Zero standing credentials via Vault + External Secrets Operator.
- Complete cluster rebuild in under 30 minutes.
- Published as open-source training (CC BY-NC).
SSH User Management with Keycloak
A lightweight, LDAP-free solution for centrally managing SSH access using Keycloak group membership — designed for air-gapped environments.
Problem
Traditional SSH access management relies on FreeIPA, AD, or LDAP. In air-gapped environments where Keycloak is the identity provider, that path is closed: Keycloak cannot serve as an LDAP directory. The challenge: centrally manage SSH keys and per-server access without a user directory.
Solution
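Store SSH public keys as Keycloak user attributes and use group membership to grant per-server access. A custom script, configured as sshd's AuthorizedKeysCommand, queries the Keycloak REST API at login time and returns the user's keys only if the user belongs to the server's allowed group.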
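A minimal sketch of such an AuthorizedKeysCommand, assuming a recent Keycloak (no legacy `/auth` path prefix), a confidential client whose service account may view users, and keys stored in a multi-valued user attribute named `ssh_public_key`. The URL, realm, client, attribute, and group names are hypothetical, and only direct group membership is checked.

```python
#!/usr/bin/env python3
"""Hypothetical AuthorizedKeysCommand backed by the Keycloak Admin REST API.

sshd_config wiring (both options and the %u token are standard OpenSSH):
    AuthorizedKeysCommand /usr/local/bin/kc-authorized-keys %u
    AuthorizedKeysCommandUser nobody
"""
import json
import sys
import urllib.parse
import urllib.request
from typing import List, Optional

# All values below are hypothetical placeholders.
KC_URL = "https://keycloak.example.internal"  # recent Keycloak, no /auth prefix
REALM = "infra"
CLIENT_ID = "ssh-keys-reader"                  # confidential client, service account
CLIENT_SECRET = "change-me"                    # load from a root-only file in practice
KEY_ATTRIBUTE = "ssh_public_key"               # multi-valued user attribute
ALLOWED_GROUP = "ssh-web-servers"              # the one per-server setting


def _request_json(url: str, token: Optional[str] = None, data: Optional[bytes] = None):
    req = urllib.request.Request(url, data=data)  # data present => POST
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def fetch_token() -> str:
    """Client-credentials grant against the realm's OIDC token endpoint."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    }).encode()
    url = f"{KC_URL}/realms/{REALM}/protocol/openid-connect/token"
    return _request_json(url, data=body)["access_token"]


def select_keys(user: dict, group_names: List[str], allowed_group: str) -> List[str]:
    """Policy step: keys are returned only for enabled members of the allowed group."""
    if not user.get("enabled", False) or allowed_group not in group_names:
        return []
    return list(user.get("attributes", {}).get(KEY_ATTRIBUTE, []))


def authorized_keys(username: str) -> List[str]:
    token = fetch_token()
    admin = f"{KC_URL}/admin/realms/{REALM}"
    query = urllib.parse.urlencode({"username": username, "exact": "true"})
    users = _request_json(f"{admin}/users?{query}", token)
    if len(users) != 1:
        return []
    # Direct group memberships only; nested groups are not expanded here
    groups = _request_json(f"{admin}/users/{users[0]['id']}/groups", token)
    return select_keys(users[0], [g["name"] for g in groups], ALLOWED_GROUP)


def main(username: str) -> int:
    try:
        keys = authorized_keys(username)
    except Exception:
        return 1  # fail closed: nothing printed, so this lookup authorizes no key
    for key in keys:
        print(key)
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # sshd passes exactly one argument here: the %u username
    sys.exit(main(sys.argv[1]))
```

Keeping the membership decision in a pure function (`select_keys`) makes the access rule testable without a running Keycloak, and any network or auth error results in no keys rather than a partial answer.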
Key Decisions
- Event-driven provisioning via Keycloak webhooks; scheduled sync as fallback.
- ed25519 keys enforced by a regex validator on the key attribute.
- Each server reads its allowed group from a local config, so the script itself stays generic.
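The ed25519 rule and the per-server config might look like the following. `ED25519_PATTERN` is one possible regex for a Keycloak attribute `pattern` validator; the Python function repeats that check and additionally decodes the key blob to confirm the 32-byte key length, which this simpler regex does not verify. The config path, section, and option names are hypothetical.

```python
"""Sketch: (1) an ed25519-only key check and (2) the per-server allowed-group config.
The regex is one possible validator pattern; config names are hypothetical."""
import base64
import configparser
import re
import struct

# An ed25519 key blob is 51 bytes, i.e. exactly 68 base64 chars. The first 20
# chars are always "AAAAC3NzaC1lZDI1NTE5" (length-prefixed "ssh-ed25519").
# An optional comment (e.g. user@host) may follow after a space.
ED25519_PATTERN = r"^ssh-ed25519 AAAAC3NzaC1lZDI1NTE5[A-Za-z0-9+/]{48}( [^\r\n]*)?$"


def is_ed25519_pubkey(line: str) -> bool:
    """Regex check plus a structural decode of the key blob."""
    if not re.fullmatch(ED25519_PATTERN, line):
        return False
    blob = base64.b64decode(line.split()[1], validate=True)
    # Layout: uint32 name length, "ssh-ed25519", uint32 key length, key bytes
    (name_len,) = struct.unpack(">I", blob[:4])
    name = blob[4:4 + name_len]
    (key_len,) = struct.unpack(">I", blob[4 + name_len:8 + name_len])
    return name == b"ssh-ed25519" and key_len == 32 and len(blob) == 8 + name_len + 32


def parse_allowed_group(config_text: str) -> str:
    """The only per-server setting; everything else in the script is shared."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    return cfg.get("access", "allowed_group")


def load_allowed_group(path: str = "/etc/kc-ssh/kc-ssh.conf") -> str:  # hypothetical path
    with open(path, encoding="utf-8") as fh:
        return parse_allowed_group(fh.read())
```

A server's config would then contain just an `[access]` section with an `allowed_group = ...` line, and rolling the script out to a new server means writing that one file.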
Outcomes
- Centralized SSH management without FreeIPA/AD/LDAP.
- Revoking access is instant: removing a user from the group blocks their next SSH login.
- Single script works across all managed servers.
Azure Storage Policy Automation
A Kubernetes-native Python service that reconciles Azure Blob container access policies and rotates SAS tokens into Key Vault.
Problem
Dozens of storage containers required manually managed access policies and SAS tokens. Tokens expired silently, policies drifted, and there was no audit trail.
Key Decisions
- AKS Workload Identity — no secrets in the cluster.
- Metadata-driven: policy_* entries in each container's metadata define the desired state.
- A K8s CronJob handles scheduling, retries, and observability.
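A minimal sketch of the reconcile step, assuming a hypothetical metadata format `policy_<id> = "<permissions>:<ttl_days>"` (the case study names only the `policy_*` prefix). It parses the desired policies, then plans which stored access policies to rewrite (missing, drifted, or close to expiry) and which to drop.

```python
"""Sketch: desired state from container metadata, diffed against the container's
current stored access policies. Assumed (hypothetical) metadata format:
    policy_<id> = "<permissions>:<ttl_days>", e.g. policy_reader = "rl:30"
Datetimes are expected to be timezone-aware (UTC)."""
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

POLICY_PREFIX = "policy_"
MAX_STORED_POLICIES = 5  # Azure limit: at most five stored access policies per container


@dataclass(frozen=True)
class DesiredPolicy:
    permission: str
    ttl: timedelta


@dataclass(frozen=True)
class CurrentPolicy:
    permission: str
    expiry: datetime


def parse_desired(metadata: Dict[str, str]) -> Dict[str, DesiredPolicy]:
    """Turn policy_* metadata entries into desired stored access policies."""
    desired = {}
    for key, value in metadata.items():
        if not key.startswith(POLICY_PREFIX):
            continue  # unrelated metadata (owner, cost center, ...) is ignored
        permission, _, days = value.partition(":")
        desired[key[len(POLICY_PREFIX):]] = DesiredPolicy(permission, timedelta(days=int(days)))
    if len(desired) > MAX_STORED_POLICIES:
        raise ValueError(f"{len(desired)} policies requested; limit is {MAX_STORED_POLICIES}")
    return desired


def plan(desired: Dict[str, DesiredPolicy], current: Dict[str, CurrentPolicy],
         now: datetime, renew_before: timedelta = timedelta(days=7)) -> Dict[str, List[str]]:
    """Policies to (re)write and policies to drop.

    A policy is rewritten if it is missing, its permissions drifted, or it
    expires within renew_before, so tokens bound to it do not lapse silently.
    """
    rewrite = sorted(
        pid for pid, want in desired.items()
        if pid not in current
        or current[pid].permission != want.permission
        or current[pid].expiry - now < renew_before
    )
    drop = sorted(pid for pid in current if pid not in desired)
    return {"rewrite": rewrite, "drop": drop}


def new_expiry(policy: DesiredPolicy, now: datetime) -> datetime:
    """Expiry to write for a rewritten policy."""
    return now + policy.ttl
```

Applying the plan is left out. The Set Container ACL operation replaces a container's whole policy set, so the writer (for example `ContainerClient.set_container_access_policy` in azure-storage-blob) should send every desired policy in one call before new SAS tokens are issued and stored in Key Vault.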
Outcomes
- Eliminated manual SAS token management subscription-wide.
- Policy drift auto-corrected on every run.
- All SAS tokens stored in Key Vault and consumed by workloads via ExternalSecrets.
Crossplane Self-Service Infrastructure Portal
A Python-based self-service portal abstracting Kubernetes and Crossplane behind a role-based wizard UI.
Problem
Developers must understand XRDs, compositions, and K8s manifests to provision cloud resources. Every request flows through the platform team.
Solution
The UI pushes Crossplane custom resource manifests to a central job queue. Per-project Applier Agents pick jobs from the queue and apply them at a controlled rate. A State Poller reads XR status into a cache.
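The queue-and-agent flow can be sketched in-process as follows. `JobQueue` is a stand-in for the central queue (the case study does not name the broker), each agent is scoped to one project, and pacing is a simple minimum interval between applies. All names are hypothetical, and `apply_fn` is where a real agent would submit the manifest to the Kubernetes API.

```python
"""Sketch: per-project job queue plus a rate-limited Applier Agent.
Job, JobQueue, ApplierAgent, and apply_fn are hypothetical names."""
import queue
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Job:
    project: str
    manifest: dict  # a Crossplane custom resource produced by the wizard


class JobQueue:
    """Central queue, partitioned by project so each agent sees only its own jobs."""

    def __init__(self) -> None:
        self._queues: Dict[str, "queue.Queue[Job]"] = {}

    def _for(self, project: str) -> "queue.Queue[Job]":
        if project not in self._queues:
            self._queues[project] = queue.Queue()
        return self._queues[project]

    def push(self, job: Job) -> None:
        self._for(job.project).put(job)

    def pop(self, project: str) -> Optional[Job]:
        try:
            return self._for(project).get_nowait()
        except queue.Empty:
            return None


class ApplierAgent:
    """Drains one project's jobs, keeping applies at least 1/max_per_second apart."""

    def __init__(self, project: str, jobs: JobQueue, apply_fn: Callable[[dict], None],
                 max_per_second: float, sleep=time.sleep, clock=time.monotonic) -> None:
        self.project, self.jobs, self.apply_fn = project, jobs, apply_fn
        self.min_interval = 1.0 / max_per_second
        self.sleep, self.clock = sleep, clock
        self._last_apply: Optional[float] = None

    def run_once(self) -> bool:
        job = self.jobs.pop(self.project)
        if job is None:
            return False
        if self._last_apply is not None:
            wait = self.min_interval - (self.clock() - self._last_apply)
            if wait > 0:
                self.sleep(wait)  # smooth bursts instead of hammering the API server
        self.apply_fn(job.manifest)
        self._last_apply = self.clock()
        return True

    def drain(self) -> int:
        applied = 0
        while self.run_once():
            applied += 1
        return applied
```

Injecting `sleep` and `clock` keeps the pacing testable; in production the defaults (`time.sleep`, `time.monotonic`) apply.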
Key Decisions
- Queue-based decoupling protects the K8s API server from request bursts.
- Per-project Applier Agents run on the control plane by default, or on an external cluster when compliance requires it.
- Cache-first reads with event-triggered refresh for near-real-time status.
- Provider credentials encrypted at rest, injected server-side — never exposed in UI.
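Cache-first reads with event-triggered refresh might look like this: reads are served from the cache while fresh, a status-change event either pushes the new status or invalidates the entry, and a maximum age bounds staleness if an event is missed. `StatusCache` and the `fetch` callable (which in the portal would read the XR's status from the cluster) are hypothetical.

```python
"""Sketch: cache-first XR status reads with event-triggered refresh.
StatusCache and fetch are hypothetical names."""
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class StatusCache:
    def __init__(self, fetch: Callable[[str], dict], max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # slow path: ask the cluster for the XR status
        self._max_age = max_age    # upper bound on staleness if an event is missed
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self._lock = threading.Lock()

    def get(self, name: str) -> dict:
        """Serve from cache while fresh; fetch only on a miss or stale entry."""
        with self._lock:
            entry = self._entries.get(name)
        if entry is not None and self._clock() - entry[0] < self._max_age:
            return entry[1]
        status = self._fetch(name)
        with self._lock:
            self._entries[name] = (self._clock(), status)
        return status

    def on_event(self, name: str, status: Optional[dict] = None) -> None:
        """Event hook: store a pushed status, or drop the entry to force a refetch."""
        with self._lock:
            if status is None:
                self._entries.pop(name, None)
            else:
                self._entries[name] = (self._clock(), status)
```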
Outcomes
- Developers provision infra without K8s/Crossplane knowledge.
- Platform team bottleneck eliminated.
- Flexible: regulated teams run their own agent, which connects to the central queue.
AI Observability Agents
A non-invasive AI overlay on OTel → ClickHouse → Grafana — anomaly detection, correlation, and LLM-powered incident reasoning with zero pipeline changes.
Problem
Observability stacks generate enormous signal volumes but require manual correlation. Alert fatigue is high, root cause analysis is slow, and there is no business-context layer.
Key Decisions
- Read-only integration: zero changes to OTel, ClickHouse, or Grafana.
- LLM only for reasoning over structured events — never raw data or anomaly math.
- Each component runs as an independent K8s Deployment, communicating over an event bus (NATS/Kafka).
- Pluggable dispatcher — alerting backends swapped without touching detection.
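The split between anomaly math, structured events, and pluggable dispatch can be sketched as below. A rolling z-score stands in for whatever detector the real agents use (the case study does not specify one), the LLM-facing payload is the serialized event rather than raw telemetry, and any object with a `send` method can serve as a dispatcher backend. All class and function names are hypothetical.

```python
"""Sketch: statistical anomaly detection emitting structured events, plus a
pluggable dispatcher. The rolling z-score detector is a stand-in technique."""
import json
import statistics
from collections import deque
from dataclasses import asdict, dataclass
from typing import Deque, Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class AnomalyEvent:
    service: str
    metric: str
    value: float
    baseline: float
    zscore: float


class RollingZScore:
    """Flags values at least `threshold` population std devs from the window mean."""

    def __init__(self, service: str, metric: str, window: int = 60,
                 threshold: float = 3.0, min_points: int = 10) -> None:
        self.service, self.metric = service, metric
        self.threshold, self.min_points = threshold, min_points
        self.history: Deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> Optional[AnomalyEvent]:
        event = None
        if len(self.history) >= self.min_points:
            data = list(self.history)
            mean, stdev = statistics.fmean(data), statistics.pstdev(data)
            if stdev > 0:
                z = (value - mean) / stdev
                if abs(z) >= self.threshold:
                    event = AnomalyEvent(self.service, self.metric, value, mean, z)
        self.history.append(value)  # anomalies also enter the baseline window
        return event


class Dispatcher(Protocol):
    """Any alerting backend: implement send() and plug it in."""
    def send(self, event: AnomalyEvent) -> None: ...


class ListDispatcher:
    """Backend that just records events (useful for tests or replay)."""
    def __init__(self) -> None:
        self.sent: List[AnomalyEvent] = []

    def send(self, event: AnomalyEvent) -> None:
        self.sent.append(event)


def llm_context(event: AnomalyEvent) -> str:
    """What the reasoning step receives: the structured event, never the raw series."""
    return json.dumps(asdict(event))


def run(detector: RollingZScore, values: Iterable[float], dispatcher: Dispatcher) -> None:
    for value in values:
        event = detector.observe(value)
        if event is not None:
            dispatcher.send(event)
```

Swapping alerting backends means passing a different `Dispatcher`; `RollingZScore` and `run` are unchanged.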
Outcomes
- Incidents enriched with root cause and business impact before paging.
- Cross-service correlation reduces alert noise.
- SLO/SLA-aware severity classification.
- Feedback loop for continuous accuracy improvement.
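One way to make severity SLO-aware is error-budget burn rate: burn = observed error ratio / (1 - SLO target), checked over a long and a short window so escalation requires a problem that is both significant and still ongoing. The case study does not say which method it uses, so this is an assumed approach; the thresholds are illustrative, and 14.4 and 6 echo the example page thresholds in the alerting chapter of Google's SRE Workbook.

```python
"""Sketch: SLO-aware severity from multiwindow error-budget burn rates.
Threshold values and severity names are illustrative assumptions."""
from typing import List, Tuple

# Highest severity first; both windows must meet the threshold.
SEVERITY_THRESHOLDS: List[Tuple[str, float]] = [
    ("critical", 14.4),
    ("high", 6.0),
    ("warning", 1.0),
]


def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    budget = 1.0 - slo
    if budget <= 0:
        raise ValueError("SLO must be below 1.0 to leave an error budget")
    return error_ratio / budget


def classify(long_window_errors: float, short_window_errors: float, slo: float) -> str:
    """Map long- and short-window error ratios to a severity label."""
    long_br = burn_rate(long_window_errors, slo)
    short_br = burn_rate(short_window_errors, slo)
    for severity, threshold in SEVERITY_THRESHOLDS:
        if long_br >= threshold and short_br >= threshold:
            return severity
    return "info"
```

Requiring the short window as well keeps an incident that has already recovered from paging at full severity on the strength of its long-window average alone.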